What is it?
Any company that adopts data engineering should also have a data governance strategy, which includes the people, processes, and technologies needed to guarantee that data is comprehensible, complete, accurate, secure, and discoverable. In other words, data governance focuses on quality, security, and availability.
The topics of Data Governance
According to Rosangela Marquesone, in her book about Big Data, the main topics of a data governance strategy are:
- Data Architecture
The chosen architecture defines how data is stored across the organization. The underlying infrastructure should be kept up to date, well maintained, and sufficient for the organization's needs.
- Audit
An effective data governance strategy should make it possible to track when data is created and modified and how it is being used. This shows where the data has an impact on the organization and which teams are using it.
- Metadata Management
The governance team uses metadata to contextualize and standardize all data. Metadata can be technical (such as storage location, format, and schema) or business-oriented (such as descriptions, owners, and definitions).
- Master Data Management (MDM)
In Big Data, unstructured data is captured and stored in its original format, which can make it difficult for analysts and developers to use. The MDM initiative tries to create structured databases from the original data, making the data easier to use and more accessible.
- Data Modelling
An organization's different types of data require different types of databases and data modelling strategies. These should still share common ground, such as a set of standard attributes.
- Data Quality
While master data should always be as accurate as possible, quality strategies should be applied to all of an organization's data. Profiling, cleaning, filtering, and grouping data should be part of the culture across the whole organization, not only of the team behind MDM.
- Security
Data should be secured physically, technically, and administratively. Risk management covering the extraction, storage, access, processing, and analysis of data should apply to all data.
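The Audit topic above asks for a record of when data is created and modified and which teams use it. Below is a minimal, in-memory sketch of that idea under stated assumptions: the `AuditLog` and `AuditEvent` classes, the action names, and the dataset and team names are all hypothetical. A real deployment would persist events in durable, append-only storage rather than in a Python list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    """One audit entry: which team did what to which dataset, and when (hypothetical schema)."""
    dataset: str
    action: str  # assumed vocabulary: "create", "modify", "read"
    team: str
    timestamp: datetime


class AuditLog:
    """Append-only log: events are recorded but never edited or deleted."""

    def __init__(self):
        self._events = []

    def record(self, dataset: str, action: str, team: str) -> AuditEvent:
        event = AuditEvent(dataset, action, team, datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def history(self, dataset: str) -> list:
        """All events for one dataset, in the order they were recorded."""
        return [e for e in self._events if e.dataset == dataset]

    def teams_using(self, dataset: str) -> set:
        """Teams that have read the dataset, i.e. where it has an impact."""
        return {e.team for e in self._events if e.dataset == dataset and e.action == "read"}


# Hypothetical usage: dataset and team names are illustrative only.
log = AuditLog()
log.record("sales.orders", "create", "data-engineering")
log.record("sales.orders", "modify", "data-engineering")
log.record("sales.orders", "read", "finance")
log.record("sales.orders", "read", "marketing")

print([e.action for e in log.history("sales.orders")])  # ['create', 'modify', 'read', 'read']
print(sorted(log.teams_using("sales.orders")))  # ['finance', 'marketing']
```

Freezing `AuditEvent` keeps individual entries immutable once written, which matches the intent of an audit trail.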
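To make the Metadata Management topic more concrete, here is a minimal sketch of a catalog entry that keeps technical and business-oriented metadata side by side. The class names, fields, dataset name, and storage path are illustrative assumptions, not the schema of any particular catalog tool.

```python
from dataclasses import dataclass


@dataclass
class TechnicalMetadata:
    """How the data is stored (hypothetical fields)."""
    location: str
    file_format: str
    schema: dict  # column name -> storage type


@dataclass
class BusinessMetadata:
    """What the data means to the business (hypothetical fields)."""
    description: str
    owner: str
    glossary: dict  # column name -> business definition


@dataclass
class CatalogEntry:
    name: str
    technical: TechnicalMetadata
    business: BusinessMetadata

    def undocumented_columns(self) -> list:
        """Columns present in the schema but lacking a business definition."""
        return [c for c in self.technical.schema if c not in self.business.glossary]


entry = CatalogEntry(
    name="sales.orders",  # hypothetical dataset
    technical=TechnicalMetadata(
        location="s3://example-bucket/sales/orders/",  # hypothetical path
        file_format="parquet",
        schema={"order_id": "string", "amount": "decimal(10,2)", "created_at": "timestamp"},
    ),
    business=BusinessMetadata(
        description="One row per confirmed customer order.",
        owner="sales-analytics",
        glossary={"order_id": "Unique order identifier", "amount": "Order total"},
    ),
)

print(entry.undocumented_columns())  # ['created_at']
```

Keeping both kinds of metadata in one entry lets the governance team spot gaps, such as a column that exists in storage but has no agreed business definition.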
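The MDM topic describes turning raw data into structured records that are easier to use. The following is a minimal sketch, assuming raw customer records arrive as `key=value` strings from different sources and that two records describe the same customer when their lowercased email matches. Both the input format and the matching rule are assumptions chosen for illustration; real MDM tooling uses much richer matching and survivorship rules.

```python
# Raw records kept in their original format, as captured from hypothetical source systems.
RAW_RECORDS = [
    "name=Ana Souza; email=ANA@example.com",
    "email=ana@example.com ; city=Recife",
    "name=Bruno Lima; email=bruno@example.com; city=Natal",
]


def parse(raw: str) -> dict:
    """Turn a 'key=value; key=value' string into a dict (assumed input format)."""
    record = {}
    for part in raw.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            record[key.strip().lower()] = value.strip()
    if "email" in record:
        record["email"] = record["email"].lower()  # normalize the matching key
    return record


def build_master(raw_records) -> dict:
    """Merge parsed records sharing an email into one structured master record each."""
    master = {}
    for raw in raw_records:
        record = parse(raw)
        key = record.get("email")
        if key is None:
            continue  # no matching key; such records would need manual review
        merged = master.setdefault(key, {})
        for field_name, value in record.items():
            if value and field_name not in merged:
                merged[field_name] = value  # first non-empty value seen wins
    return master


master = build_master(RAW_RECORDS)
print(len(master))  # 2
print(master["ana@example.com"]["city"])  # Recife
```

The "first non-empty value wins" rule is the simplest possible survivorship policy; it is used here only to keep the sketch short.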
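The Data Quality topic lists profiling, cleaning, filtering, and grouping. A minimal standard-library sketch of those four steps on a few hypothetical order rows might look like this. The column names and the rules (lowercase regions, numeric amounts, unique `order_id`) are assumptions for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical raw rows with typical problems: messy text, bad numbers, duplicates, nulls.
rows = [
    {"order_id": "1", "region": " north ", "amount": "100.0"},
    {"order_id": "2", "region": "North", "amount": "abc"},
    {"order_id": "3", "region": "south", "amount": "50.5"},
    {"order_id": "3", "region": "south", "amount": "50.5"},
    {"order_id": "4", "region": None, "amount": "20"},
]


def profile(rows) -> dict:
    """Profiling: count missing (None or empty) values per column."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or value == "":
                missing[column] += 1
    return dict(missing)


def clean(row) -> dict:
    """Cleaning: normalize region text and parse amount; unparseable amounts become None."""
    region = row["region"].strip().lower() if row["region"] else None
    try:
        amount = float(row["amount"])
    except (TypeError, ValueError):
        amount = None
    return {"order_id": row["order_id"], "region": region, "amount": amount}


def filter_valid(rows) -> list:
    """Filtering: drop rows with missing fields and repeated order_ids."""
    seen, valid = set(), []
    for row in rows:
        if row["region"] is None or row["amount"] is None or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        valid.append(row)
    return valid


def group_totals(rows) -> dict:
    """Grouping: total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


print(profile(rows))  # {'region': 1}
cleaned = filter_valid([clean(r) for r in rows])
print(group_totals(cleaned))  # {'north': 100.0, 'south': 50.5}
```

In practice these steps would usually run in a dataframe library or SQL engine; plain Python is used here only to keep the example self-contained.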