Pipeline To Insights

Week 23/34: Data Contracts for Data Engineering Interviews

From theory to practice: how data contracts transform trust, ownership, and observability across the data engineering lifecycle

Erfan Hesami
Jun 02, 2025
∙ Paid

In our previous posts, we explored the fundamentals of data quality, including validation, observability, and cataloguing. We covered topics such as data quality dimensions, common data quality issues, how to build a data quality framework, and the roles of metadata and data catalogues.

If you missed the series, we recommend starting [here]1 to build a strong foundation.

Note: One critical element ties all of this together: communication. It is the invisible glue that lets every other component work cohesively to address data quality challenges.

In this post, we’ll walk you through data contracts, an important component that can help improve and maintain data quality.

Data contracts are agreements that clearly say what data should look like and how it should be shared between teams or systems. They help make sure everyone understands the data's format, structure, and meaning, so there are no surprises when the data is used.
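To make the definition concrete, here is a minimal sketch of a data contract expressed as code. It assumes plain Python and an illustrative `orders` dataset. The class names (`FieldSpec`, `DataContract`), the `checkout-team` owner, and the fields are all hypothetical. This is not the API of any particular data contract tool. It only illustrates that a contract pins down ownership, structure, and meaning, and that it can be checked automatically.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldSpec:
    """One column in the contract: its expected type and whether it must be present."""
    name: str
    dtype: type
    required: bool = True
    description: str = ""


@dataclass(frozen=True)
class DataContract:
    """Hypothetical minimal contract: who owns the dataset and what each record must look like."""
    dataset: str
    owner: str
    version: str
    fields: tuple[FieldSpec, ...]

    def validate(self, record: dict[str, Any]) -> list[str]:
        """Return human-readable violations; an empty list means the record conforms."""
        errors = []
        expected = {f.name for f in self.fields}
        for spec in self.fields:
            value = record.get(spec.name)
            if value is None:
                # Missing or null: only a problem if the contract says the field is required.
                if spec.required:
                    errors.append(f"missing required field '{spec.name}'")
                continue
            if not isinstance(value, spec.dtype):
                errors.append(
                    f"field '{spec.name}' expected {spec.dtype.__name__}, got {type(value).__name__}"
                )
        # Fields the producer sends that nobody agreed on are also flagged.
        for extra in sorted(set(record) - expected):
            errors.append(f"unexpected field '{extra}'")
        return errors


# Hypothetical contract for an illustrative orders dataset.
orders_contract = DataContract(
    dataset="orders",
    owner="checkout-team",
    version="1.0.0",
    fields=(
        FieldSpec("order_id", int, description="Unique order identifier"),
        FieldSpec("amount", float, description="Order total"),
        FieldSpec("coupon_code", str, required=False),
    ),
)

print(orders_contract.validate({"order_id": 1, "amount": 9.5}))
# -> []
print(orders_contract.validate({"order_id": "1", "amount": 9.5, "note": "x"}))
# -> ["field 'order_id' expected int, got str", "unexpected field 'note'"]
```

The type check is deliberately strict. For example, an integer `amount` such as `9` would be rejected. Real tools usually let you configure coercion and evolution rules instead.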

Before we dive in, it’s worth acknowledging two pioneers who have helped shape the data contract space: Andrew Jones2 and Chad Sanderson3.

We’ve often wondered how their approaches differ, so after some research we summarised the key differences in this post. If you’re eager to go deeper, check out their books:

  • Driving Data Quality with Data Contracts: A practical guide to building a reliable, trusted, and effective data platform4 by Andrew Jones.

  • Data Contracts: Developing Production-Grade Pipelines at Scale5 (to be published on 30 November 2025) by Chad Sanderson and Mark Freeman.

Whether you're already using data contracts or just hearing about them, having practical experience with this concept can set you apart, especially in interviews or when working on high-stakes data projects.

In this post, we'll explore:

  • What are Data Contracts?

  • Why Data Contracts matter.

  • The core components of a Data Contract.

  • Problems Data Contracts solve.

  • Where to apply Data Contracts across the Data Engineering lifecycle.

  • How to get started with Data Contracts in your organisation.

  • Tools and techniques for implementing Data Contracts.

  • Andrew vs. Chad: A comparison of two thought leaders' approaches.

  • Interview questions and a practical implementation using dlt (Data Load Tool).

We’re now nearly 70% of the way through the Data Engineering Interview Preparation series, the series we’re proudest of so far! If you don’t want to miss any posts, check them out here6 and don’t forget to subscribe to stay updated.

Here is the full plan:



What Are Data Contracts?

Let’s say you’re running a restaurant:

  • The kitchen (data producers) prepares meals (data).

  • The waitstaff (data pipelines) delivers meals to customers.

  • The customers (data consumers: analysts, ML engineers, and so on) expect their orders to be correct and timely.

Now imagine the kitchen suddenly changes its dishes, swapping ingredients or altering the presentation, without informing the waitstaff. What happens? Chaos: wrong orders, disappointed customers, and lost revenue.
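In data terms, the kitchen's unannounced change is a breaking schema change. Below is a minimal sketch of the check a contract makes possible. It assumes schemas are plain dicts that map column names to type names. The function `breaking_changes` and the order columns are hypothetical. The sketch treats a removed or retyped column as breaking and an added column as safe, which is a common convention rather than a universal rule.

```python
def breaking_changes(contract: dict[str, str], proposed: dict[str, str]) -> list[str]:
    """Compare a producer's proposed schema against the agreed contract.

    Removing a column or changing its type breaks consumers; columns that only
    appear in `proposed` are treated as non-breaking additions and ignored.
    """
    problems = []
    for column, dtype in contract.items():
        if column not in proposed:
            problems.append(f"removed column '{column}'")
        elif proposed[column] != dtype:
            problems.append(f"column '{column}' changed type {dtype} -> {proposed[column]}")
    return problems


# The agreed "menu" between producer and consumers (hypothetical).
agreed = {"order_id": "int", "amount": "float", "currency": "str"}
# The "kitchen" renames currency and turns amount into a string without telling anyone.
proposed = {"order_id": "int", "amount": "str", "currency_code": "str"}

issues = breaking_changes(agreed, proposed)
if issues:
    # In a CI check, a non-empty list would typically fail the producer's build;
    # here we just print it.
    print("Blocked:", issues)
```

A rename shows up as a removal (`currency`) plus an ignored addition (`currency_code`). That is exactly the kind of silent change the contract is meant to surface before consumers see it.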

This post is for paid subscribers

© 2025 Erfan Hesami