Week 26/34: Data Governance for Data Engineers
What Data Engineers need to know about data governance and where to apply it
As data engineers, we might join a company where data governance is already in place: everything is well-documented, there’s a data catalogue, every data source has a clear owner, and metadata is easy to find. We can trace the full lineage of a dataset, from source to pipeline to dashboards and AI models, and quality checks are embedded throughout the process. Access to data is governed by clear roles and policies, with structured procedures for requesting access or making changes.
That’s the ideal scenario. But in reality, it’s rarely like this.
In most cases, especially in less mature data organisations, many of these governance practices might not exist at all. Documentation is limited, ownership is unclear, and metadata is incomplete or outdated. And while there may be formal data governance roles in theory, not every company has the resources or maturity to establish them in practice.
This is where we, as data engineers, step in.
Even if it’s not in our job title, data engineers are on the frontlines of how data is collected, transformed, and served, which means we share responsibility for ensuring it is reliable, secure, and trustworthy. Governance doesn’t just come from the top; it starts with us. Every data engineer should understand the core principles of data governance and actively apply them in their day-to-day work. That might mean flagging inconsistencies, advocating for documentation, suggesting improvements, or implementing practices that support transparency and accountability.
In data engineering interviews, it’s valuable to show that we understand data governance: what it is, how our work as data engineers aligns with it, and how we can help strengthen it within the organisation.
In this post, we cover:
What is data governance?
4 Key Principles of Data Governance
7 Key concepts in data governance every Data Engineer should know
Interview questions
Missed the previous posts of this interview preparation series? Catch up here: Data Engineering Interview Preparation Series [1].
What is Data Governance?
Data governance is a core data management function focused on ensuring the quality, integrity, security, and usability of an organisation’s data throughout its entire lifecycle, from the moment data is created or collected to when it is archived or deleted. (As defined in Data Governance: The Definitive Guide: People, Processes, and Tools to Operationalise Data Trustworthiness [2] by Evren Eryurek, Uri Gilad, Valliappa Lakshmanan, Anita Kibunguchy-Grant, and Jessi Ashdown.)
Effective data governance ensures that data is:
Accessible to the right people at the right time.
Usable in a way that supports business outcomes like analysis, insights, and decision-making.
Compliant with industry, government, and company-specific regulations.
This includes making sure that data:
Is accurate, up to date, and consistent across systems.
Can be trusted for reporting, analysis, and operational use.
Can be accessed or modified only by authorised users.
Has all changes and access logged and traceable.
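To make the first two points concrete, here is a minimal sketch of a completeness and freshness check that could run inside a pipeline. The function name `check_quality`, the field names, and the seven-day threshold are all assumptions for illustration, not a prescribed standard.

```python
from datetime import datetime, timedelta, timezone


def check_quality(rows, required_fields, timestamp_field, max_age, now=None):
    """Return a list of human-readable issues found in `rows`."""
    now = now or datetime.now(timezone.utc)
    issues = []
    for i, row in enumerate(rows):
        # Completeness: required fields must be present and non-empty.
        for field in required_fields:
            if row.get(field) in (None, ""):
                issues.append(f"row {i}: missing {field}")
        # Freshness: the record must have been updated recently enough.
        updated = row.get(timestamp_field)
        if updated is not None and now - updated > max_age:
            issues.append(f"row {i}: stale")
    return issues


# Hypothetical sample data; `now` is fixed so the result is reproducible.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
rows = [
    {"customer_id": "c1", "email": "a@example.com",
     "updated_at": datetime(2024, 1, 9, tzinfo=timezone.utc)},
    {"customer_id": "c2", "email": "",
     "updated_at": datetime(2023, 12, 1, tzinfo=timezone.utc)},
]
issues = check_quality(rows, ["customer_id", "email"], "updated_at",
                       timedelta(days=7), now=now)
print(issues)  # ['row 1: missing email', 'row 1: stale']
```

In practice, a check like this would usually fail the pipeline run or raise an alert rather than just print, so that untrustworthy data never reaches a report unnoticed.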
Ultimately, the goal of data governance is to build trust in data. Trustworthy data allows teams to make better decisions, assess risk, and track business performance using reliable metrics like KPIs.
Note: Data governance isn’t just for highly regulated industries like banking or healthcare; it matters for any organisation that wants to make confident, informed decisions with data. While setting it up can feel overwhelming, starting small and improving gradually is often the most sustainable path.
4 Key Principles of Data Governance
Effective data governance is built on four core principles: Transparency and Discoverability, Accountability, Standardisation, and Security. While these principles are universal, how they're implemented can vary greatly depending on the size, maturity, and data culture of an organisation.
For data engineers, understanding these principles isn't just about compliance; it's about building trustworthy systems. We’re not only designing pipelines and storage systems; we’re shaping how data is accessed, protected, and understood. Whether we're choosing tools, defining schemas, or implementing access controls, our decisions directly impact how well data is governed.
1. Transparency and discoverability
Transparency in data governance means that everyone, inside and outside the organisation, should understand how data is governed and why. Clear, open communication about data policies builds trust, reduces confusion, and helps gain support from both technical and non-technical stakeholders.
For example, an organisation may enforce a policy that prevents sensitive data from being displayed on its website. Transparency means not only enforcing that rule, but clearly explaining its purpose (e.g., to protect user privacy) and how compliance is monitored.
Closely linked to transparency is data discoverability, the ability for users to easily find, understand, and access the data they need. This requires:
Access to technical metadata and data lineage.
A well-maintained business glossary.
Accurate, complete, and consistently structured datasets.
Together, transparency and discoverability ensure that data isn’t just well-governed, but also usable and trusted.
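As a rough illustration of what technical metadata and lineage can look like in code, the sketch below models a tiny in-memory catalogue. The `CatalogueEntry` class, the `trace_lineage` helper, and the dataset names are hypothetical; a real organisation would typically rely on a dedicated catalogue tool instead.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    name: str
    description: str
    owner: str
    columns: dict                                 # column name -> business definition
    upstream: list = field(default_factory=list)  # names of direct source datasets


def trace_lineage(catalogue, name):
    """Return every upstream dataset of `name`, nearest sources first."""
    seen = []
    queue = list(catalogue[name].upstream)
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        # Sources outside the catalogue are listed but cannot be traced further.
        if current in catalogue:
            queue.extend(catalogue[current].upstream)
    return seen


# Hypothetical datasets for illustration.
catalogue = {
    "raw_orders": CatalogueEntry(
        "raw_orders", "Orders as received from the shop", "ingestion-team",
        {"order_id": "Unique order identifier"}),
    "clean_orders": CatalogueEntry(
        "clean_orders", "Deduplicated, validated orders", "data-platform",
        {"order_id": "Unique order identifier"}, upstream=["raw_orders"]),
    "sales_dashboard": CatalogueEntry(
        "sales_dashboard", "Daily sales KPIs", "analytics",
        {"revenue": "Sum of paid order totals"},
        upstream=["clean_orders", "dim_customers"]),
}

print(trace_lineage(catalogue, "sales_dashboard"))
# ['clean_orders', 'dim_customers', 'raw_orders']
```

Here `dim_customers` appears in the lineage but has no catalogue entry, which is exactly the kind of discoverability gap a data engineer can flag.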
2. Accountability
Accountability in data governance means that everyone who interacts with data understands their responsibilities and is held to them. Clear, agreed-upon roles help ensure that data is handled correctly, consistently, and ethically.
For example, if an employee is required to report personal stock holdings by a certain date, failure to do so could have serious consequences, such as disciplinary action or even job loss. The same principle applies to data: accountability ensures that expectations are clear and that there are consequences for not meeting them.
When organisations treat data as a product, accountability becomes even more critical. Just like with any product, there must be defined owners who are responsible for the quality, reliability, usage, and lifecycle of the data.
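One small, practical way to support accountability is to check that every dataset has a named owner. The sketch below assumes ownership is stored as simple metadata per dataset; the `find_unowned` function and the dataset names are made up for illustration.

```python
def find_unowned(datasets):
    """Return the names of datasets with no accountable owner recorded."""
    return sorted(
        name
        for name, meta in datasets.items()
        if not (meta.get("owner") or "").strip()
    )


# Hypothetical metadata; an empty or missing owner counts as unowned.
datasets = {
    "orders": {"owner": "sales-data-team"},
    "web_events": {"owner": ""},
    "legacy_export": {},
}
unowned = find_unowned(datasets)
print(unowned)  # ['legacy_export', 'web_events']
```

A check like this could run in CI or on a schedule, turning "ownership is unclear" into a concrete list that someone can act on.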
3. Standardisation
Standardisation in data governance ensures that data is consistently labelled, described, and categorised across the organisation. This consistency is essential for improving data quality, enabling collaboration, and making data easier to use across systems and teams [5].
For example, imagine an enterprise managing product inventory data across multiple systems: the ERP system records product categories as numerical codes (e.g. 1001, 1002), the e-commerce platform uses descriptive labels like "Electronics" or "Apparel", and the analytics platform stores them under custom tags such as "ELEC" or "APP". Without standardisation, joining or analysing product data across these systems becomes error-prone and labour-intensive.
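The product category example can be sketched as a single mapping layer that translates each system's values into one shared taxonomy. The canonical names and the pairing of codes to categories (for instance, that 1001 means electronics) are assumptions made for illustration.

```python
# Hypothetical per-system mappings into one canonical taxonomy.
# Which code or tag corresponds to which category is assumed here.
MAPPINGS = {
    "erp": {1001: "electronics", 1002: "apparel"},
    "ecommerce": {"Electronics": "electronics", "Apparel": "apparel"},
    "analytics": {"ELEC": "electronics", "APP": "apparel"},
}


def standardise_category(system, raw_value):
    """Translate a system-specific category value into the canonical name."""
    try:
        return MAPPINGS[system][raw_value]
    except KeyError:
        # Fail loudly so unmapped values get fixed instead of silently dropped.
        raise ValueError(
            f"unmapped category {raw_value!r} from system {system!r}"
        ) from None


print(standardise_category("erp", 1001))                  # electronics
print(standardise_category("ecommerce", "Electronics"))   # electronics
print(standardise_category("analytics", "ELEC"))          # electronics
```

Once all three systems resolve to the same value, joins and aggregations can key on the canonical category instead of on system-specific labels.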
By enforcing naming conventions, consistent data types, and aligned taxonomies, standardisation:
Improves searchability and discoverability.
Reduces duplication and confusion.
Makes data integration across tools and platforms more seamless.
Enhances communication between teams.
Ultimately, standardised data is more reliable, more usable, and more valuable because it behaves predictably and supports better decisions.
4. Security
Security in data governance means protecting data from breaches, misuse, and loss. It includes managing access controls, ensuring auditability, complying with regulations (like GDPR), and handling sensitive data such as PII responsibly.
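As a simplified sketch of how access control, auditability, and PII handling fit together, the example below checks a role before every read, records each attempt, and masks sensitive fields by default. The roles, permissions, and field names are hypothetical, and the unsalted hash is for illustration only; a production system would normally use keyed hashing or a tokenisation service.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical roles and permissions.
ROLE_PERMISSIONS = {
    "analyst": {"read_masked"},
    "data_engineer": {"read_masked", "read_raw"},
}
PII_FIELDS = {"email", "phone"}
audit_log = []


def mask(value):
    """Replace a PII value with a short token (unsalted hash, illustration only)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def read_record(user, role, record, raw=False):
    """Return the record, masked unless raw access is requested and permitted."""
    needed = "read_raw" if raw else "read_masked"
    allowed = needed in ROLE_PERMISSIONS.get(role, set())
    # Auditability: every attempt is logged, whether or not it succeeds.
    audit_log.append({
        "user": user,
        "role": role,
        "action": needed,
        "allowed": allowed,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    if not allowed:
        raise PermissionError(f"role {role!r} may not {needed}")
    if raw:
        return dict(record)
    return {k: mask(v) if k in PII_FIELDS else v for k, v in record.items()}


record = {"id": 1, "email": "a@example.com"}
masked = read_record("ana", "analyst", record)
print(masked["id"], masked["email"] != record["email"])  # 1 True
```

Logging denied attempts as well as successful ones matters: the audit trail should show who tried to reach raw PII, not only who succeeded.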
In the book Fundamentals of Data Engineering [6] by Joe Reis and Matt Housley, security is described as one of the key undercurrents of the data engineering lifecycle, a principle that must be considered at every stage, from ingestion to serving.
In this post [7], we explored the foundational security principles every data engineer should understand, focusing on:
The Importance of Security in Data Engineering
The human factor in security
Foundational Principles and Best Practices for Securing Data