If your organization or team relies on dashboards and reports built from large volumes of aggregated data, formatted so that any decision-maker can read them, you have almost certainly been asked some recurring questions. These include ‘Where is this data coming from?’ or ‘Can I trust this metric or KPI?’, especially when the report, metric, or KPI was not built by the person reading it. Organizations and teams that develop data analytics tools, such as business intelligence dashboards, usually answer these questions with a data catalog. Usually, but not always.
What Is a Data Catalog?
A data catalog is an inventory of an organization’s data assets. It is designed to help professionals quickly find the most appropriate data for any analytical or business purpose. That inventory can range from a simple spreadsheet to a complex and sophisticated relational database. Sometimes, it is also presented as an interactive schema diagram. Fortunately, there are many tools available to build and maintain data catalogs.
A data catalog uses metadata — data that describes or summarizes other data — to create an informative and searchable inventory of data assets in an organization. These assets can include (but are not limited to):
- Structured (tabular) data
- Unstructured data, including documents, web pages, email, social media content, mobile data, images, audio, and video
- Reports and query results
- Data visualizations and dashboards
- Machine learning models
- Connections between databases
Sometimes, there are so many data assets that a single full inventory is not practical. In that case, organizations keep smaller inventories by team, department, or section.
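To make the idea of a metadata-based, searchable inventory more concrete, here is a minimal sketch in Python. It is not how any particular catalog tool works; the `CatalogEntry` fields, the `search` helper, and the sample assets are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data asset described by its metadata (all fields are illustrative)."""
    name: str
    asset_type: str   # e.g. "table", "dashboard", "ml_model"
    owner: str        # team responsible for the asset
    source: str       # where the data comes from
    description: str = ""
    tags: list[str] = field(default_factory=list)


def search(catalog: list[CatalogEntry], term: str) -> list[CatalogEntry]:
    """Return entries whose name, description, or tags contain the term (case-insensitive)."""
    needle = term.lower()
    return [
        entry for entry in catalog
        if needle in entry.name.lower()
        or needle in entry.description.lower()
        or any(needle in tag.lower() for tag in entry.tags)
    ]


# A tiny, made-up inventory covering three of the asset types listed above.
catalog = [
    CatalogEntry("sales_orders", "table", "finance-team", "ERP export",
                 "One row per customer order", ["sales", "revenue"]),
    CatalogEntry("monthly_revenue_dashboard", "dashboard", "bi-team", "sales_orders",
                 "Monthly revenue KPI by region", ["revenue", "kpi"]),
    CatalogEntry("churn_model", "ml_model", "data-science", "crm_customers",
                 "Predicts customer churn", ["customers"]),
]

for entry in search(catalog, "revenue"):
    print(f"{entry.name} ({entry.asset_type}), owner: {entry.owner}, source: {entry.source}")
```

Even a structure this small lets a report consumer look up where the data behind a revenue KPI comes from and who owns it. The same fields map directly onto columns in a spreadsheet or a relational table.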
Benefits of a Data Catalog
Unfortunately, it is common to find no data inventory at all, because the data analyst or business intelligence developer already knows exactly what data went into the report, dashboard, or analysis.
The problem comes when the consumer of that dashboard or report is not the person who developed it. This is when the questions mentioned at the beginning start to appear.
Building a data catalog takes some time, though not much compared to the time invested in developing an analytical solution. In return, it benefits the whole organization, from the developer and the business analyst to the decision-makers and the data consumers.
Let’s outline some of the many benefits of having and maintaining a data catalog.
Sets Up the Foundation of Data Governance
The simple exercise of reviewing data, understanding how it’s obtained and stored, and determining its overall health helps developers set up basic ‘a priori’ rules. These rules, in turn, help maintain the quality of the data consumed by the organization.
While doing this, developers can work with the business to improve how data is entered into source systems, or suggest ways to improve data quality at the earliest stages.
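As a rough illustration of what such basic rules could look like, the sketch below checks a few rows against named rules. The rule names, the `check_rows` helper, and the sample order records are assumptions made for this example. Real rules would come out of the review of the actual data.

```python
def check_rows(rows, rules):
    """Apply each named rule to every row; return (row_index, rule_name) for each failure."""
    failures = []
    for index, row in enumerate(rows):
        for name, rule in rules.items():
            if not rule(row):
                failures.append((index, name))
    return failures


# Hypothetical rules for an order record; each one returns True when the row passes.
rules = {
    "order_id present": lambda r: bool(r.get("order_id")),
    "amount is non-negative": lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    "country is a 2-letter code": lambda r: isinstance(r.get("country"), str) and len(r["country"]) == 2,
}

# Made-up sample data: the second and third rows break the rules on purpose.
rows = [
    {"order_id": "A-1", "amount": 120.0, "country": "US"},
    {"order_id": "", "amount": 35.5, "country": "DE"},
    {"order_id": "A-3", "amount": -10, "country": "France"},
]

for index, rule_name in check_rows(rows, rules):
    print(f"row {index} failed: {rule_name}")
```

Keeping the rules as named entries means the same list can be recorded in the catalog next to the asset it applies to, so consumers can see which checks the data has passed.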
Improves Team Management
Data catalogs can improve team management. For instance, data teams can focus on what type of data is relevant for analysis and what data to use to develop products and applications.
This enables data team managers to better plan how to scale storage and improve governance across the organization. It also helps managers identify which developers are best suited to build data products with specific types of data.
Increases Trust From Decision Makers
If decision-makers and data consumers know that a data catalog has been built, they have fewer questions about whether to trust metrics or reports.
There is a side benefit as well: managers and decision-makers can become more engaged in training. They can also learn more about data governance and understand the importance of data quality.