4 Requirements for Self-Service Big Data Analytics
New-generation big data analytics solutions have come a long way in the last five years. Enterprises are deploying them either to complement or to incrementally replace their traditional data warehouse analytics platforms.
Big data analytics projects are increasingly driven by business users who want faster decisions that create business value for their enterprises in a dynamic, highly competitive and customer-centric world. I like to call these business users ‘Decision Scientists’. They apply domain knowledge and experience to make sense of insights discovered by data scientists and machine learning algorithms, so that they can make the right decisions for a given context and business objective. They are not merely decision takers: they like to dig deeper into the data and insights and explore them before making decisions.
The current trend in the industry is to add a data discovery framework on top of big data analytics platforms, so that even Decision Scientists can experiment with data, or sometimes with insights, to arrive at better and deeper intelligence while significantly reducing the time it takes to reach decisions. Data Scientists and Analysts would still be required to create algorithms, analytics models and event parsers. However, these would be packaged for data discovery, with configurable input data sources, event parsing logic, KPIs and model parameters. Above all, data and IT governance should be simplified.
These new capabilities require the following key changes in the functionality of contemporary big data analytics platforms:
1. Big data platforms have to evolve into self-service tools: A big data platform should deliver agility, faster time to decisions and increased productivity for Decision Scientists, and to achieve that it has to be self-service. Apart from being robust, the platform should abstract all technology layers away from the business user. Data models, the technology stack and analytics algorithms should be hidden behind services or configurable modules that business users can consume. For instance, all data integration workflows and analytics algorithms should be packaged into customizable models. Decision Scientists should be able to customize the inputs to these models and improve performance and results by fine-tuning model parameters. With little external help, they should also be able to configure new data sources, browse a catalog of integrated data rather than the underlying data model, experiment with different packaged analytics algorithms and visualizations for data discovery, and then create dashboards and visual reports.
2. Data governance needs more focus than day-to-day operations: Managing data manually is difficult and does not scale, especially for big data sets. Decisions such as what data has to be retained and for how long, what data has to be shelved, what level of access users in the chain from data to decisions are allowed, and how data usage is accounted for will continue to be based on manually configured rules, but the bulk of the processes that apply those rules should be automated. Data quality control and correction is another important aspect that should be automated. Full automation might not always be possible, since a Data Steward will occasionally have to certify data quality or take a decision or action when an alarm is raised. However, the process of collecting information at checkpoints in process flows, reporting it, and detecting and alerting on deviations from data quality SLAs or process guidelines should be automated.
3. All other IT processes should be automated: All day-to-day operational processes that follow a predefined standard operating guideline are prime candidates for automation. Any necessary follow-up action should either be fully automated for self-healing or, where manual intervention is inevitable, be assisted by automated sub-processes or backed by enough information to enable a quick root cause analysis (RCA) and correction. This calls for an operations management framework that collects data from checkpoints, analyzes it to detect any deviation from service SLAs or quality targets, and takes or facilitates corrective action.
4. Simplified workflows for configuration, reports and analytics: An intuitive and powerful front end is an essential component of a self-service platform. A user should be able to perform all configuration, analytics, reporting and follow-up actions from an app or a GUI, including on a handheld device. The front end should also support collaboration among Decision Scientists, so that ideas can be exchanged and analytics results shared with peers for review and discussion. Because different business users require different views and reports, the front end should offer customizable dashboards and a wide choice of visualizations. It should also feature workbenches that help Decision Scientists design data ingestion, data preparation, analytics and experimentation, visualization and further actions, all within a configurable workflow.
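Requirements 2 and 3 rest on the same mechanism. Readings collected at checkpoints are compared against manually configured SLA thresholds. Each deviation raises an alert, which an automated fix either resolves or escalates to a person, such as a Data Steward for data quality or operations staff for service SLAs. The following is a minimal, hypothetical Python sketch of that loop. The class names, checkpoint and metric names, thresholds and the remediation hook are all illustrative assumptions and do not describe any particular product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical key identifying a metric at a checkpoint: (checkpoint, metric).
Key = Tuple[str, str]


@dataclass(frozen=True)
class Reading:
    """A measurement collected at a checkpoint in a process flow."""
    checkpoint: str  # e.g. "ingest.orders_feed" (illustrative name)
    metric: str      # e.g. "null_ratio" or "latency_s" (illustrative names)
    value: float


@dataclass(frozen=True)
class SlaRule:
    """A manually configured limit; evaluating it is what gets automated."""
    checkpoint: str
    metric: str
    max_value: float
    owner: str       # who is alerted on escalation, e.g. "data_steward"


@dataclass(frozen=True)
class Alert:
    """A detected deviation, carrying enough context to speed up RCA."""
    checkpoint: str
    metric: str
    value: float
    limit: float
    owner: str


def detect(readings: List[Reading], rules: List[SlaRule]) -> List[Alert]:
    """Compare each reading with its rule and return one alert per breach."""
    by_key: Dict[Key, SlaRule] = {(r.checkpoint, r.metric): r for r in rules}
    alerts = []
    for reading in readings:
        rule = by_key.get((reading.checkpoint, reading.metric))
        # Readings without a configured rule are ignored in this sketch.
        if rule is not None and reading.value > rule.max_value:
            alerts.append(Alert(reading.checkpoint, reading.metric,
                                reading.value, rule.max_value, rule.owner))
    return alerts


def handle(alerts: List[Alert],
           remediations: Dict[Key, Callable[[Alert], bool]]) -> List[Alert]:
    """Try an automated fix first; escalate whatever is not self-healed."""
    escalated = []
    for alert in alerts:
        fix = remediations.get((alert.checkpoint, alert.metric))
        if fix is not None and fix(alert):
            continue  # self-healed, no human action needed
        escalated.append(alert)
    return escalated


if __name__ == "__main__":
    rules = [
        SlaRule("ingest.orders_feed", "null_ratio", 0.02, "data_steward"),
        SlaRule("etl.daily_load", "latency_s", 3600, "operations"),
    ]
    readings = [
        Reading("ingest.orders_feed", "null_ratio", 0.05),
        Reading("etl.daily_load", "latency_s", 5400),
        Reading("etl.daily_load", "rows_loaded", 1_000_000),  # no rule
    ]
    # Hypothetical remediation: pretend a job restart brings latency back.
    remediations = {("etl.daily_load", "latency_s"): lambda alert: True}

    for a in handle(detect(readings, rules), remediations):
        print(f"[{a.owner}] {a.checkpoint}.{a.metric}={a.value} exceeds {a.limit}")
```

Run as a script, this prints `[data_steward] ingest.orders_feed.null_ratio=0.05 exceeds 0.02`. The latency breach is assumed to self-heal, so only the data quality breach reaches a person. Keeping the thresholds as data (the `SlaRule` list) rather than code mirrors the article's point: the rules stay manually configured, while detection and escalation are automated.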
A good self-service platform should provide built-in data integration workflows for big and fast data acquisition, packaged analytics models that offer insights, KPIs and recommendations on the fly, and built-in internal and external monetization workflows that can consume these insights to derive optimal economic value.
Chitharanj Kachappilly is an architect for R&D at Flytxt.