Pipeline To Insights
Implementing Data Quality Framework with dbt

How to Identify, Address, and Implement Data Quality Solution using dbt

Erfan Hesami
Feb 06, 2025 ∙ Paid

Imagine you’ve just joined a company and encountered data quality challenges. Processes need to be reworked, insights shared with stakeholders fall short of expectations, data delays slow down pipelines, and even seemingly valuable source data is too flawed to generate real impact.

What would you do? Where would you start?🤔💭

You’re not alone! Many of us have faced similar challenges throughout our careers. In this post, we’ll share a step-by-step guide to help you take the initiative and address these issues effectively.

Before diving into this post, we highly recommend checking out our previous post to familiarise yourself with the fundamentals. We covered:

  • What is Data Quality?

  • An introduction to data quality dimensions

  • Examples for each dimension, along with insights from our careers

Building Trust in Data: The Fundamentals of Data Quality — Pipeline to Insights, Jan 21

Understanding the definition of data quality, its dimensions, and metrics is one aspect of the journey. However, the real challenge lies in connecting these concepts and applying them to extract meaningful insights. In this post, we aim to bridge that gap by presenting a practical scenario that a data engineer, analytics engineer, or a similar professional might encounter in a company. We will then outline a structured roadmap to effectively address the issue, demonstrating how data quality principles translate into real-world impact.



Scenario

You’ve just joined a company as a Data Engineer or a similar role, and your first task is to create a data ingestion pipeline for your AI Engineer.

After gathering requirements and exploring the company's SQL Server data, you identify 8 tables to prototype the pipeline. The goal is to enable the AI Engineer to perform analytics and build models. However, after further analysis and discussions with stakeholders, you discover a major roadblock: the source tables lack the necessary data quality to meet business requirements and support the AI Engineer's needs.

Now, it’s time to take action!

In this post, we’ll walk through a clear, step-by-step approach to building a data quality framework that addresses these challenges and ensures your data is reliable, accurate, and ready for AI-driven insights.

Step 1

The first step is to understand the fundamentals of data quality, including its definition, key dimensions, and common measurement techniques.

We covered this in our previous post, linked above.

Step 2

The second step is to list the data quality issues you encountered while working with the tables intended for the pipeline.

Step 3

The third step is to talk with downstream stakeholders (in this case, AI Engineers) to understand what better data quality means to them. It is also worth talking with upstream stakeholders (software engineers, third parties, and so on) to understand the systems this data is generated from. Remember, as we mentioned in the previous post, data quality issues often arise because the source system was designed for specific operational or transactional purposes.

Step 4

Based on the information and requirements gathered in the previous three steps, document all findings and create a comprehensive list of tables along with their associated data quality issues.

Next, use your understanding of data quality dimensions to categorise each issue according to its corresponding dimension. Aligning issues with their definitions will help quantify data quality and provide insights into the necessary quality checks to implement.
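As an illustration, the findings from this step might be captured in a simple issue log that maps each problem to a dimension and a candidate check. Everything below (table names, issues, percentages) is hypothetical, just to show the shape such a document could take:

```yaml
# Hypothetical data quality issue log produced in Step 4.
# Table and column names are invented examples, not from a real source system.
- table: customers
  issue: "email column is NULL for ~12% of rows"
  dimension: completeness
  proposed_check: not-null check on email, with an agreed threshold

- table: orders
  issue: "duplicate order_id values after a failed backfill"
  dimension: uniqueness
  proposed_check: uniqueness check on order_id

- table: orders
  issue: "order_status contains values outside the documented set"
  dimension: validity
  proposed_check: accepted-values check on order_status

- table: payments
  issue: "payments reference order_ids missing from orders"
  dimension: consistency
  proposed_check: referential-integrity check from payments.order_id to orders.order_id
```

A log like this makes Step 5 almost mechanical: each row already names the check a tool needs to implement.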

Step 5

Choose a tool to implement your quality checks based on the information you gathered in Step 4.
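For instance, with dbt, many common checks can be declared with the built-in generic tests (`unique`, `not_null`, `accepted_values`, `relationships`) in a model's YAML file. A minimal sketch, with hypothetical model and column names (`stg_orders`, `stg_customers`):

```yaml
# models/staging/schema.yml — a sketch using dbt's built-in generic tests.
# Model and column names here are hypothetical examples.
version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique        # uniqueness dimension
          - not_null      # completeness dimension
      - name: order_status
        tests:
          - accepted_values:   # validity dimension
              values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id
        tests:
          - relationships:     # consistency / referential integrity
              to: ref('stg_customers')
              field: customer_id
```

Running `dbt test` then executes each declared test against the warehouse and reports pass/fail results per test.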


Solution

This post is for paid subscribers
