Where is your data?
In a recent report1), Gartner predicted that most data analytics undertaken in 2020 would still require dispersed data sources to be connected. As a result, leading companies would need to double their investments in metadata management by 2020. However, this concern is far from new.
Modern data management requires more than just the existence of data – the data must also be searchable and available whenever it is needed. The EU’s General Data Protection Regulation (GDPR) raised public awareness of the requirement for companies to know what data they have and, more precisely, where that data is. The rapid transition to cloud-based systems only serves to highlight this requirement.
Throughout the history of data management, solutions have been developed under many names and with different approaches: data model, business metadata, metadata repository, data inventory, data dictionary, data directory, and data catalog, as well as data virtualization and API management. These technologies have slightly different focuses, but they all strive to solve the same problem: finding data, managing it, and making it available. I am using English terms here because there is no established Finnish terminology for this.
Data directories have also developed considerably in the field of open data, where several public administrative bodies publish collections of data. In my opinion, they may even be a few steps ahead of the corporate world in terms of the technologies used for sharing data. A good example is Helsinki Region Infoshare, which offers open data from the Finnish capital at hri.fi.
What functionality can be required of a modern data directory intended for corporate use, and what are modern products capable of? Typical features include:
1. Collecting metadata from different data sources. Data directory services include interfaces with varying degrees of automation. These make it possible to read the metadata in different systems directly and to create new metadata by reviewing the data content, in other words by profiling the data. Profiling can reveal attributes such as value ranges or value sets, dependencies between data sets, and the qualitative properties of data.
2. Managing content descriptions. Descriptions of the contents of data are managed and maintained in data directories. Content descriptions are essential for purposes such as providing search functionality.
3. Commenting, evaluation, and approval. Users can comment on data and discuss it. Data content descriptions can be improved through crowdsourcing: the descriptions are built and refined through the combined efforts of different parties. When final changes are made to data, there is often an approval process.
4. Traceability of data flows. Whenever information is moved between systems, it is important to be able to determine the flow of information and the transformations that occur along the way: where the information comes from and what has been done to it. Traceability is essential in assessing the importance, reliability, and suitability of the information for its intended purpose.
5. Descriptions of access interfaces. Data directories maintain references for how the data can be accessed.
6. Search functions and the ability to find data. Data directories help users to find data by offering user interfaces with features such as a range of groupings, search functions, and menu structures.
7. Displaying data. In addition to content descriptions and metadata, data directories can include functionality enabling users to directly view data or samples.
8. Data protection. Whenever data is made easier to find or more accessible, care must be taken to ensure that access rights and data protection are in good order. This will prevent data from falling into the wrong hands. The mere existence of data may be confidential, so it may also be necessary to restrict access to metadata and content descriptions.
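Several of the features listed above (profiling, traceability of data flows, and search) can be illustrated with a minimal sketch. It does not represent any particular product: the names `ColumnProfile`, `CatalogEntry`, `profile_column`, `upstream`, and `search`, as well as the data set names, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnProfile:
    """Metadata derived by inspecting the values of one column."""
    name: str
    row_count: int
    null_count: int
    distinct_count: int
    minimum: object
    maximum: object


def profile_column(name, values):
    """Profile a column: count rows, missing values, distinct values, and the value range."""
    present = [v for v in values if v is not None]
    return ColumnProfile(
        name=name,
        row_count=len(values),
        null_count=len(values) - len(present),
        distinct_count=len(set(present)),
        minimum=min(present) if present else None,
        maximum=max(present) if present else None,
    )


@dataclass
class CatalogEntry:
    """One data set in a hypothetical data directory."""
    name: str
    description: str
    sources: list = field(default_factory=list)   # names of upstream data sets
    profiles: list = field(default_factory=list)  # ColumnProfile objects


def upstream(catalog, name):
    """Return every data set that `name` is derived from, directly or indirectly."""
    seen = []
    stack = list(catalog[name].sources)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(catalog[current].sources)
    return seen


def search(catalog, term):
    """Find data sets whose name or description mentions the search term."""
    term = term.lower()
    return sorted(
        entry.name
        for entry in catalog.values()
        if term in entry.name.lower() or term in entry.description.lower()
    )


# Hypothetical example content (all names are assumptions for illustration).
catalog = {
    "crm.customers": CatalogEntry("crm.customers", "Customer master data from the CRM system"),
    "erp.orders": CatalogEntry("erp.orders", "Order rows from the ERP system"),
    "dw.sales": CatalogEntry(
        "dw.sales", "Sales facts per customer", sources=["crm.customers", "erp.orders"]
    ),
    "report.revenue": CatalogEntry(
        "report.revenue", "Monthly revenue report", sources=["dw.sales"]
    ),
}

# Profiling creates new metadata from the data content itself.
amount_profile = profile_column("amount", [120, 80, None, 120])
catalog["erp.orders"].profiles.append(amount_profile)

# Traceability: which data sets does the revenue report ultimately depend on?
revenue_sources = upstream(catalog, "report.revenue")

# Search: which data sets mention customers?
customer_hits = search(catalog, "customer")
```

In this sketch the lineage is stored as plain references between entries, which is enough to answer where a report's information comes from. A real data directory would also record the transformations applied along the way and restrict who may see each entry.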
Corporate data directories are here to stay. At Enfo, data modeling constitutes a key part of every data management development project. Data modeling generates valuable content for the data directory, whether the directory already exists or will be built in the future.
Based on our experience, well-executed data management requires a holistic approach that can only be achieved through long-term work in collaboration with the customer's businesses. A functioning whole is built on a sustainable information architecture in which every architectural component performs a defined function. Within this whole, the data directory supports strategic knowledge management by directly serving areas such as data integration, master data management, and information ownership.
1) Gartner Magic Quadrant for Metadata Management Solutions, published August 9, 2018.
Mika Naatula works as Senior Vice President, Business Solutions, Information Management at Enfo.
If you are interested in data and information management, come along to a joint breakfast event entitled “Where is your data?”, which will be held by Enfo and Informatica in Helsinki on November 20, 2018.