Data quality might be the egg that precedes the chicken
When googling "top 10 reasons your EDW project fails", you will find the usual suspects: lack of communication, bad management, too much or too little management, unclear goals and not trusting the result. Indeed, even a "Top 5 reasons" or "Top 3…" search would include "not trusting the result". With some bias, even a "Top 1" would give you the same result. That is quite interesting!
This means that no matter how much effort is spent on mitigating risk and improving control, funding and communication, at the end of the day the Analytics project might still fail because "nah, well, I don’t trust it". That is pretty much the same as spending a full day cooking a 5-course meal for your family and having your kids ask for ketchup for their Risotto Milanese or say "I don't want rice. I want pasta". Thanks…
With this knowledge, many EDW projects tend to make things even worse by refusing to use any data that is not 100% correct. Well, show me a corporation or a business process without humans in it, and you have a chance to hit your 100%... Not only will the project never finish and be massively over budget, it will likely also drain the last patience from the intended end users.
So, stuck choosing between a project that fails and one that never finishes, what to do?
The initial question is not about the project goals or the management of it at all. It is about how to relate to the fact that humans make errors and how that impacts business processes and decisions. Humans are often wicked smart, inventive and flexible, but with that fantastic upside also come human data entry errors.
I once had a bank customer call me, very upset that our BI system showed that more than 700 MSEK had been loaned to a customer.
"It is impossible, we only deal with small loans without collateral."
"Well, that might be true… but your credit risk assessor put the customer's social security number in the loan amount field…"
In my experience, no matter how rigid a system of data quality control is, it will - unintentionally - be outsmarted by the average person using the system.
Rigid control systems also tend to drive cost and complexity, not a good thing either. But we must do something, right?
Yes, we must. Change your perspective. Accept that humans will cause data quality issues and will continue to do so as long as the task is not automated. While accepting that fact of life, implement a continuous process for monitoring and correcting data quality. Instead of panicking that "data quality is not perfect NOW", take a deep breath and be assured that "data quality will be even better tomorrow!".
Yay! Positive vibes! Everyone loves that, don't they?!
Not only will a can-do attitude come of this, your Analytics project will actually be THE catalyst to push data quality to new highs.
When integrating several sources that all have some degree of data quality issues, the total pile of issues can grow dramatically, because the EDW project aims for one integrated version of the truth. Small issues in a single system, ruled too minor to correct, could amount to total show-stoppers in a fully integrated scenario. Since data quality is critical to the success of an Analytics project, attention to detail has to be second to none. That means finding most issues in all systems, which in turn means a very, very long to-do list. If this massive number of issues were to be handled as a one-off correction, it takes very little imagination to picture yourself looking up at a daunting mountain of problems.
Since we changed perspective, we just feed the issues we find back into the data quality process! In this effort, the Analytics project will serve another purpose: providing motivation. The Analytics project will not only identify every tiny problem, it will also provide context for those issues and show their impact on business processes, helping people understand WHY it is important to handle things differently.
Instead of promising that the project will deliver 100% accurate data, educate all members and participants that data quality will improve over time and that we are improving it every day. This is something that people understand and usually accept, at least after a few iterations of the data quality process, once the most critical issues have been dealt with. As we learn more about the flaws in our systems and processes, finding errors and correcting them will be sped up and - when possible - automated.
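For those who like to see things in code, here is a minimal sketch of what one run of such a data quality process could look like. It is only an illustration under my own assumptions: the field names, the rules and the loan ceiling are all made up, and a real setup would read from the actual source systems. The point is that suspicious records are flagged together with their business impact, not rejected, and that the issues are counted per rule so the trend can be followed from one run to the next.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical ceiling for a small unsecured loan, in SEK.
# The number is for illustration only; choose one that fits the business.
MAX_PLAUSIBLE_LOAN_SEK = 500_000


@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the record looks fine
    impact: str                    # why the business should care


# Illustrative rules; the field names are assumptions, not a real schema.
RULES = [
    Rule(
        name="loan_amount_plausible",
        check=lambda r: 0 < r["loan_amount"] <= MAX_PLAUSIBLE_LOAN_SEK,
        impact="Inflates reported credit exposure",
    ),
    Rule(
        name="customer_id_present",
        check=lambda r: bool(r.get("customer_id")),
        impact="Loan cannot be tied to a customer across systems",
    ),
]


def find_issues(records):
    """Flag suspicious records instead of rejecting the whole load."""
    issues = []
    for record in records:
        for rule in RULES:
            if not rule.check(record):
                issues.append(
                    {
                        "loan_id": record["loan_id"],
                        "rule": rule.name,
                        "impact": rule.impact,
                    }
                )
    return issues


def summarize(issues):
    """Count issues per rule so the trend can be followed run by run."""
    counts = {}
    for issue in issues:
        counts[issue["rule"]] = counts.get(issue["rule"], 0) + 1
    return counts


# Made-up sample data: the second amount looks like an identity number
# typed into the loan amount field, the third row lacks a customer id.
loans = [
    {"loan_id": 1, "customer_id": "C-100", "loan_amount": 25_000},
    {"loan_id": 2, "customer_id": "C-101", "loan_amount": 700_101_012},
    {"loan_id": 3, "customer_id": "", "loan_amount": 40_000},
]

issues = find_issues(loans)
for issue in issues:
    print(issue)  # this list goes back to the people who own the data
print(summarize(issues))
```

Run something like this on every load, keep the counts, and you have a simple way of showing that things really are getting better over time.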
Feel good about tomorrow, things will be even better!
If you have any questions about how to manage your EDW/Analytics project or how to set up a continuous data quality process, please feel free to contact me!
Magnus Hagdahl works as a principal consultant at Enfo Analytics