5 Things to Think Through in 2021 and Beyond in Modern Data Analytics


In 2021, cloud capability is a given. The prevalent question is how to leverage it successfully to maximize business benefits. In this new decade, enterprises need to refactor applications for true elasticity, scaling independently at the storage, compute, and network levels. More than ever before, however, enterprises also need to create and adopt the right frameworks for abstraction, future-proofing, and orchestration so they can use best-of-breed technologies.

Here are the top five constructs and approaches to consider in 2021 to take advantage of ML, DataOps, and a metadata-driven agility approach, respond to changing business demands, and ultimately achieve better business outcomes.

DataMLOps: Why put ML in DataOps? Any system truly leveraging ML requires a specific approach to keep it tuned and producing the desired results. It involves training data sets, recalibrating learned weights, using human-in-the-loop review to learn new things, and so on.

Hence, DataMLOps goes beyond just adding a data scientist to the mix in your DataOps cycle of Build and Operate. It means embedding ML more deeply in data management and analytics, along with the capabilities to maintain it effectively.
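
To make this concrete, here is a minimal sketch (an illustration, not any specific product's implementation) of a DataMLOps-style maintenance step: the model is recalibrated on the latest labelled data, and low-confidence predictions are routed to a human-in-the-loop review queue rather than published automatically. The threshold and function names are assumptions.

```python
# Sketch only: recalibrate on fresh labelled data, then route uncertain
# predictions to human review instead of auto-publishing them.
import numpy as np
from sklearn.linear_model import LogisticRegression

def maintenance_cycle(X_train, y_train, X_new, review_threshold=0.6):
    """Retrain the model, then split new predictions into auto-approved
    and human-in-the-loop review sets based on prediction confidence."""
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    proba = model.predict_proba(X_new)
    confidence = proba.max(axis=1)

    needs_review = confidence < review_threshold   # human-in-the-loop queue
    return model, np.flatnonzero(needs_review), np.flatnonzero(~needs_review)
```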

ML Workbench: ML algorithms have reached a level of maturity that allows them to be containerized and used by non-data scientists and business users in an operational setting. AutoML with top algorithms such as feedforward neural networks, TabNet for deep learning, time-series models, XGBoost, or autoencoders for anomaly detection can all be applied to the data. The machine can help you choose the algorithm that gives the best result for the question at hand.
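
As an illustration of letting the machine pick the algorithm, the hedged sketch below scores a few candidate models with cross-validation and keeps the best one. The candidate list and metric are assumptions (scikit-learn's GradientBoostingClassifier stands in for XGBoost), not the catalogue of any particular workbench.

```python
# Sketch: score candidate algorithms with cross-validation and keep the winner.
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

def pick_best_model(X, y):
    candidates = {
        "gradient_boosting": GradientBoostingClassifier(),
        "feedforward_nn": MLPClassifier(max_iter=500),
        "logistic_regression": LogisticRegression(max_iter=1000),
    }
    # Mean cross-validated accuracy per candidate
    scores = {name: cross_val_score(model, X, y, cv=5).mean()
              for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    return best, candidates[best].fit(X, y), scores
```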

Containerization gives ML applications longevity, because changes in the environment (versions of the underlying ML software) are the leading reason ML solutions stop working within months of going live. Without containerization, constant upkeep of the solution is necessary to keep pace with changes in the environment.

Consider a construct like an ML workbench, which provides these algorithms out of the box, easy ways to version the algorithms, data, and outputs, and containerization of the overall solution for longevity, enabling quicker and sound adoption of ML within the enterprise.
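
One way to picture this versioning and containerization, shown below purely as an assumption-laden sketch, is a run manifest that ties together the algorithm, a hash of the input data, the outputs, and the library versions that the container image would pin.

```python
# Sketch: record algorithm, data, outputs, and environment versions together.
import hashlib, json, sys
from datetime import datetime, timezone
import sklearn

def record_run(model_name, data_bytes, predictions, out_path="run_manifest.json"):
    manifest = {
        "model": model_name,
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),  # version the input data
        "n_predictions": len(predictions),
        "python": sys.version.split()[0],
        "scikit_learn": sklearn.__version__,   # versions the container image would pin
        "run_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(out_path, "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest
```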

Co-building the engine: If data is the new oil, why is everyone building their own engine, especially when the engine is no longer the differentiator? Data and the last layers of analytics leading to insights are the real differentiators now.

A data and analytics engine (data pipelines, data models, and an analytics workbench) is roughly 50% components common across industries, 35% components specific to your industry, and only the last 15% specific to your organization. Suppose an engine can be pre-built to cover the first 85% of the requirement, leaving only the last 15% to be configured.

In such a case, there is no reason for an organization to build it from scratch and reinvent the wheel. Instead, they would be wise to take advantage of the lower build cost, the more robust engine, and the lower ongoing maintenance costs, and spend their time using the engine with their own oil (data): training ML models on it and creating rule-based/heuristic models for guided analytics and insights.

In short, for complex analytics needs where an out-of-the-box solution is not an option, this type of domain-aware framework (the engine) is a good choice. The approach gives the framework vendor an incentive to keep improving its IP and results in lower costs for the sponsoring customer. It also enables organizations to codify their core business and quality rules and move away from re-developing them in difficult-to-parse programs, as sketched below.
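
For example, the sketch below (with hypothetical rule names and fields) shows business and data-quality rules codified as small, named, declarative checks rather than buried in procedural code.

```python
# Sketch: declarative, named business/quality rules applied to incoming rows.
RULES = {
    "age_in_range": lambda row: 0 <= row.get("age", -1) <= 120,
    "visit_after_enrollment": lambda row: row.get("visit_date", "") >= row.get("enroll_date", ""),
}

def apply_rules(rows):
    """Return, per rule, the indices of rows that violate it."""
    violations = {name: [] for name in RULES}
    for i, row in enumerate(rows):
        for name, check in RULES.items():
            if not check(row):
                violations[name].append(i)
    return violations
```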

Intent-driven Governance: A set of data may be acceptable to use for one purpose but not for another. Therefore, by understanding the intent, we can better govern the usage of the data. This approach also leads to a need for “bringing the code to the data” rather than shipping the data out.

Once data goes outside governance boundaries, it is impossible to track the intent. Instead, if every query or question asked of the data is captured and the intent behind it is derived, we can detect and control improper intent both at a point in time and over a period (longitudinal data). This strategy has an added benefit: visibility into the life cycle of new KPIs used by the business.
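
A minimal sketch of this idea, with illustrative policy contents and names, is shown below: every query arrives with a declared intent, is logged for longitudinal analysis, and only runs against data whose policy permits that purpose, so the code comes to the data rather than the data going out.

```python
# Sketch: capture declared intent for every query and enforce per-dataset policy.
from datetime import datetime, timezone

DATASET_POLICY = {
    "patient_visits": {"allowed_intents": {"safety_monitoring", "operational_reporting"}},
}
QUERY_LOG = []  # captured queries become longitudinal intent data

def governed_query(dataset, intent, run_query):
    QUERY_LOG.append({"dataset": dataset, "intent": intent,
                      "at": datetime.now(timezone.utc).isoformat()})
    allowed = DATASET_POLICY.get(dataset, {}).get("allowed_intents", set())
    if intent not in allowed:
        raise PermissionError(f"Intent '{intent}' is not permitted on '{dataset}'")
    return run_query()  # the code runs where the data lives; raw data is not shipped out
```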

If a company creates a new calculation and keeps modifying it within one to three weeks, it can be identified as exploratory. If the business uses it as-is for a month or more, it can be categorized as stable. If they stop using the calculation/KPI, it can be termed sunsetting. This gives visibility into how the business explores data, which new KPIs are being experimented with, and where each one is in its life cycle.
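
The classification described above can be derived directly from the captured usage history. The sketch below assumes simple date thresholds matching the rules of thumb in this section.

```python
# Sketch: classify a KPI's life-cycle stage from its captured usage history.
from datetime import date, timedelta

def kpi_stage(last_modified: date, last_used: date, today: date | None = None) -> str:
    today = today or date.today()
    if today - last_used > timedelta(days=30):
        return "sunsetting"   # the business has stopped using the KPI
    if today - last_modified >= timedelta(days=30):
        return "stable"       # used as-is for a month or more
    return "exploratory"      # still being modified within weeks of creation
```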

Abstraction and Orchestration: Abstraction allows fast-moving technologies to be upgraded or replaced easily by ensuring that we integrate with functionality, not deep into the technology stack, consuming each technology through a well-defined interface. Good orchestration is also required, because this approach, along with multi-cloud, leads to a collection of best-of-breed technologies that need extensive coordination among them.
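
A simple way to picture this abstraction is a well-defined interface that pipelines depend on, so the underlying technology can be swapped without touching the pipelines. The class and method names below are assumptions for illustration.

```python
# Sketch: pipelines depend on the interface, not on a specific storage technology.
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    @abstractmethod
    def read(self, key: str) -> bytes: ...
    @abstractmethod
    def write(self, key: str, data: bytes) -> None: ...

class InMemoryStore(ObjectStore):
    """One concrete backend for illustration; an S3- or GCS-backed class with
    the same interface could replace it without changing any pipeline code."""
    def __init__(self):
        self._data = {}
    def read(self, key):
        return self._data[key]
    def write(self, key, data):
        self._data[key] = data
```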

As we step into 2021, we need to envision and execute a data and analytics strategy that mirrors the underlying environment’s extensibility and elasticity, adapts readily to changes in technology, and supports more sophisticated governance.

Rajeev Dadia

With 24 years of experience in leadership roles at Saama Analytics and Silicon Graphics (SGI), Rajeev currently leads the Client Care and Delivery organizations, while simultaneously driving the company’s technology roadmap.