Where is your data?
Modern data and analytics initiatives place strategic demands on metadata management solutions. These requirements are far from new, yet they are more topical than ever.
Modern data management requires more than just the existence of data – the data must also be searchable and available whenever it is needed. The EU’s General Data Protection Regulation (GDPR) raised public awareness of the requirement for companies to know what data they have and, more precisely, where that data is. The rapid transition to cloud-based systems only serves to highlight this requirement.
The majority of data analytics undertaken today still requires dispersed data sources to be connected. As a result, leading companies need to invest in managing their data. This concern, however, is far from new. Throughout the history of data management, solutions have been developed under several names and with several approaches: data models, business metadata, metadata repositories, data inventories, data dictionaries, data directories, data catalogs, and even data virtualization and API management. All of these technologies strive to solve the same problem of finding data, managing it, and making it available, although each has a slightly different focus.
Data directories have also undergone major development in the field of open data, where several public administrative bodies offer collections of data for general use. In my opinion, they may even be a few steps ahead of the corporate world in the technologies used for sharing data. A good example is Helsinki Region Infoshare, which offers open data from the Finnish capital area at hri.fi.
What functionality can be required of a modern data directory intended for corporate use, and what are modern products capable of? Typical features include:
1. Collecting metadata from diverse data sources. Data directory services include interfaces with varying degrees of automation, enabling the metadata on different systems to be read directly and new metadata to be created by reviewing the data content – in other words, by profiling the data. Profiling can clarify attributes such as value ranges or value groups, dependencies between data sets, and data quality properties.
2. Managing content descriptions. Descriptions of the contents of data are managed and maintained in data directories. Content descriptions are essential for purposes such as providing search functionality.
3. Commenting, evaluation, and approval. Comments can be added to data, and data can be discussed. Content descriptions can be improved through crowdsourcing, so that the descriptions are built and refined by the combined efforts of different parties. Finalizing changes to data often involves an approval process.
4. Traceability of data flows. Whenever information is moved between systems, it is important to be able to determine the flow of information: where the information comes from and how it has been transformed along the way. Traceability is essential for assessing the importance and reliability of the information, and its suitability for its intended purpose.
5. Data access layer documentation. Data directories maintain references for how the data can be accessed.
6. Search functions and the ability to find data. Data directories help users to find data by offering user interfaces with features such as groupings, search functions, and menu structures.
7. Displaying data. In addition to content descriptions and metadata, data directories can include functionality enabling users to directly view the actual data or samples of data.
8. Data protection. Whenever data is made easier to find or more accessible, care must be taken to ensure that access rights and data protection are in good order. This will prevent data from falling into the wrong hands. The mere existence of data may be confidential, so it may also be necessary to restrict access to metadata and content descriptions.
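The profiling mentioned in item 1 can be illustrated with a minimal sketch. The column names and sample records below are invented for illustration; real data directory products automate this kind of analysis directly against live sources.

```python
from collections import Counter

def profile_column(values):
    """Derive simple profiling metadata for one column: null rate,
    distinct count, most common values, and (for numeric columns) the value range."""
    non_null = [v for v in values if v is not None]
    numeric = [v for v in non_null if isinstance(v, (int, float))]
    profile = {
        "null_rate": round(1 - len(non_null) / len(values), 3) if values else 0.0,
        "distinct_count": len(set(non_null)),
        "top_values": Counter(non_null).most_common(3),
    }
    if numeric and len(numeric) == len(non_null):
        profile["min"], profile["max"] = min(numeric), max(numeric)
    return profile

# Hypothetical sample records, as a data directory might read them from a source table.
rows = [
    {"customer_id": 1, "country": "FI", "age": 34},
    {"customer_id": 2, "country": "SE", "age": None},
    {"customer_id": 3, "country": "FI", "age": 51},
]
metadata = {col: profile_column([r[col] for r in rows]) for col in rows[0]}
```

Even this toy version surfaces the kinds of attributes mentioned above: value ranges, value groups, and a basic data quality signal in the null rate.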
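The traceability described in item 4 boils down to walking a graph of recorded data flows backwards. The dataset names and transformations below are hypothetical; a real data directory would populate such a graph automatically from integration tooling.

```python
# Hypothetical data flows: each derived dataset maps to its sources and the
# transformation applied, as a data directory might record them.
flows = {
    "sales_report": {"sources": ["sales_clean"], "transform": "aggregate by month"},
    "sales_clean": {"sources": ["crm_export", "web_orders"], "transform": "deduplicate"},
}

def trace_upstream(dataset, flows):
    """Walk the flow graph backwards to list every origin of a dataset."""
    entry = flows.get(dataset)
    if entry is None:  # no recorded flow: the dataset is an original source
        return [dataset]
    origins = []
    for source in entry["sources"]:
        origins.extend(trace_upstream(source, flows))
    return origins
```

Asking where `sales_report` ultimately comes from returns its original sources, `crm_export` and `web_orders`, which is exactly the question lineage features answer when assessing the reliability of a figure.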
Corporate data directories are here to stay. At Enfo, data modeling constitutes a crucial part of every data management development project. Data modeling creates valuable content for the data directory, whether that directory already exists or is yet to be built.
In our experience, well-executed data management requires a holistic approach that can only be achieved through long-term work in collaboration with the customer's businesses. A functional whole is built on a sustainable information architecture in which every component performs a defined role. Within such an architecture, the data directory provides strong support for strategic knowledge management by directly serving needs such as data integration, master data management, and information ownership.
Mika Naatula works as a CTO in Information Management at Enfo