Why mixing DevOps and BI might be a bad idea
Sure, having a customer- or project-focused team that quickly builds, deploys and iterates on new functionality is indeed appealing. Should anything go wrong, just iterate again! Or to put it in the words of a customer of mine: "the team should eat their own sh*t!". Very much to the point.
Agile methodologies and mindsets are cornerstones of any modern BI initiative. This is nothing new (if you think it is, then I suggest a chat with your supplier...), but analytics has some traits that one needs to be very aware of when hitting the DevOps road. At Enfo, we know the importance of a trusted analytics platform and how to achieve it with DevOps.
First, an analytics solution is not a fixed target. An integration, for example, exists mainly to provide predictable data exchange over time. An analytics solution, on the other hand, is built to change as the business does; its very nature is dynamic and ever-changing.
Secondly, analytics is rarely about functionality. It might come as a surprise that new dashboards, graphs and whatnot are second-rate fare, but there you have it. People tend to stick to their 3-5 favorite graphs, and that's that. There is a reason Excel is so prevalent...
Thirdly, analytics insights, whether AI built into your website or a KPI in a nice dashboard, are only as good or useful as the data you put into them. Or more accurately, as the trust that the consumers of your analytics insights put in your data. Tricky, isn't it?
Trust often has a really, REALLY big chunk of data quality in it. However, data quality is in turn a moving target, because it is always relative to the purpose the data is used for. Financial statements? You'd better make sure that the sixth decimal is freaking accurate! Market trend analysis? Data Quality Schmata Quality... who cares... give me ballpark figures already!
And here is the thing. The part of DevOps that aims at automating test and deployment is essentially removing the human factor for the benefit of speed and time-to-market. This works excellently in most cases! If anything is not 100% accurate, just iterate and improve it. Still not good enough? Do it again! Functionality is fixed, i.e. easy to pinpoint, and hence quite easy to automate tests for.
But Trust. Trust is earned. Trust is earned through accuracy and precision over months and years, and is just as easily lost over days and weeks. Having a DevOps process in an Analytics environment could, in a worst-case scenario, break the trust in your entire Analytics platform, resulting in major upheaval and rip-and-replace projects. Two flawed releases in a row will have the consumers of your analytics insights frowning, not really sure what to feel about the third release... And even if you have redeemed yourself with four or five subsequent successful releases, the next flawed release will without doubt bring the earlier two to mind. Why? Because that's how the human mind operates. Your DevOps just became a DevOooops.
For this reason a new acronym has entered our sphere of concern: DataOps! This is in some part a re-branding of the old term Data Management, but it has some merits of its own. What DataOps does is automate the data analytics pipeline, using statistical process control for constant monitoring and control of the result. Sounds nifty, doesn't it?! Drilling down into the details, it currently boils down to this, in oil industry terms:
We built a pipeline from Alaska to New York; it transfers 1 Gazillion barrels of oil every day. Our automated monitoring shows that we have a 1:1 ratio between what enters the pipeline in Alaska and what is delivered out of it in New York. Success!
Sounds great! What could possibly go wrong? Well, what if the customer says "I didn't expect oil, I wanted Coca-Cola..." (based on my own experience, this is not at all far-fetched...)
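To make the pipeline monitoring a little more concrete, here is a minimal Python sketch of a statistical process control check on that in/out ratio. The function names, the three-sigma limits and the sample ratios are all made up for illustration; a real DataOps tool will have its own way of doing this.

```python
from statistics import mean, stdev


def control_limits(history, sigmas=3.0):
    """Return (lower, upper) control limits derived from past observations."""
    centre = mean(history)
    spread = stdev(history)
    return centre - sigmas * spread, centre + sigmas * spread


def in_control(value, history, sigmas=3.0):
    """True if today's value falls inside the control limits."""
    lower, upper = control_limits(history, sigmas)
    return lower <= value <= upper


# Hypothetical daily ratios of barrels delivered to barrels received
# (1.0 means nothing was lost along the way).
past_ratios = [1.000, 0.999, 1.001, 1.000, 0.998, 1.002, 1.000]

print(in_control(1.001, past_ratios))  # a normal day
print(in_control(0.950, past_ratios))  # 5% went missing somewhere
```

Note what the check measures: how much flows through the pipe, not what is in it. It would report a perfectly healthy pipeline whether the barrels contain oil or Coca-Cola.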
When we talk about analytics, we also talk about data quality in relation to the intended use and expectations of the data. Since these are human factors, they are extremely hard to automate. You cannot simply factor out the humans involved in the process when pushing for a fully automated DataOps process.
DataOps began with the data lifecycle issues that sprang from Big Data projects, where there was a need to quantify the accuracy of the data movements in the pipeline. Since the use of data in a Data Lake is largely unknown, any purpose-related data quality test of the actual content pretty much becomes guesswork. Evaluating the pipeline was thus good enough.
In an Analytics environment, pipeline status is far from good enough. But we still want the speed and time-to-market goodies that DevOps gives. To address this, automated data quality tests need to be built into the analytics platform in collaboration with the data consumers. This establishes a common understanding of what level of automation is possible and needed, as well as what degree of data quality is acceptable.
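As a rough illustration of what such a consumer-agreed test could look like, here is a small Python sketch. The Expectation class, the two purposes and their tolerances are assumptions invented for this example; the real numbers have to come out of the conversation with the people who use the data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Expectation:
    """A data quality level agreed with the consumers for one purpose."""
    purpose: str
    rel_tolerance: float  # largest acceptable relative deviation


def meets_expectation(reported, reference, expectation):
    """Compare a reported figure with its reference within the agreed tolerance."""
    return math.isclose(
        reported, reference, rel_tol=expectation.rel_tolerance, abs_tol=0.0
    )


# Hypothetical agreements: finance wants near-exact figures,
# market analysts are happy with ballpark numbers.
FINANCIAL_STATEMENT = Expectation("financial statement", 1e-9)
MARKET_TREND = Expectation("market trend analysis", 0.05)

# Made-up figures: what the dashboard shows vs. what the ledger says.
reported_revenue = 1_004_200.0
ledger_revenue = 1_000_000.0

for expectation in (FINANCIAL_STATEMENT, MARKET_TREND):
    ok = meets_expectation(reported_revenue, ledger_revenue, expectation)
    print(f"{expectation.purpose}: {'pass' if ok else 'fail'}")
```

The same figure fails for one purpose and passes for the other. The test itself is trivial to automate; deciding on the tolerance is the part that needs the humans.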
If you want to earn the trust of your data consumers, this is the way to work with DataOps.
Have any thoughts or questions? Contact me!
Magnus Hagdahl, Principal Consultant Enfo