Pipeline To Insights

Week 25/34: System Design for Data Engineering Interviews

What system design means for data engineers and how to prepare for interviews

Erfan Hesami
Jun 30, 2025

As data engineers, we are responsible for designing end-to-end data products, from sourcing and transforming data to maintaining and orchestrating workflows using specific tools. We make key decisions about storage solutions, data visualisation platforms, and how to architect the system to ensure scalability, maintainability, and data quality compliance.

At a high level, we're building a data system composed of interconnected components, each serving a distinct function. We must also anticipate future growth in data volume, user demand, and business needs, and design the system to scale smoothly without requiring a complete overhaul. In essence, this is system design for data engineering.
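To make "anticipating future growth" concrete, here is a minimal back-of-envelope sketch. The function names, the starting volume, the growth rate, and the capacity limit are all hypothetical and chosen only for illustration; real planning would use measured numbers and a growth model that fits the business.

```python
def projected_daily_volume(current_gb: float, monthly_growth: float, months: int) -> float:
    """Daily ingest volume (GB) after `months` of compound monthly growth."""
    return current_gb * (1 + monthly_growth) ** months


def months_until_limit(current_gb: float, monthly_growth: float, limit_gb: float) -> int:
    """Whole months until daily volume first exceeds `limit_gb`."""
    if monthly_growth <= 0:
        raise ValueError("monthly_growth must be positive")
    months = 0
    volume = current_gb
    while volume <= limit_gb:
        volume *= 1 + monthly_growth
        months += 1
    return months


if __name__ == "__main__":
    # Hypothetical inputs: 100 GB/day today, growing 10% per month,
    # on a setup we assume copes with up to 200 GB/day.
    print(projected_daily_volume(100, 0.10, 12))
    print(months_until_limit(100, 0.10, 200))
```

Even a rough estimate like this tells us whether a design has years of headroom or only a few months, which changes how much scalability work is worth doing up front.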

System design is the process of planning how different parts of a system work together to achieve goals like performance, scalability, and maintainability.

In software engineering, this means creating a blueprint for how software components, data, and user interfaces interact to build a system that is reliable and efficient.

In data engineering, we apply the same ideas: choosing the right tools, designing scalable databases, picking the right servers to host open-source tools, and building robust ETL/ELT pipelines. The goal is to make sure the data system is efficient, reliable, and ready for future growth.

What both have in common is the goal of designing systems that are:

  • Reliable: they work as expected.

  • Scalable: they can handle more users and data over time.

  • Maintainable: they are easy to update and manage.

  • Efficient: they deliver data fast enough to support business decisions.

When it comes to interviews, especially for data engineering roles, the type of system design questions we get often depends on how junior or senior the role is. One reason interviewers ask system design questions is to assess our level of seniority and how well we understand the bigger picture.

One of the best ways to learn about system design is by studying how different teams and companies design systems that achieve their goals. It helps train our minds to think beyond individual tools and consider the overall architecture. Also, remember, there’s no one-size-fits-all solution; effective design always depends on the system’s specific purpose and constraints.

For learning system design (especially on the software side), we recommend:

  • ByteByteGo Newsletter and YouTube Channel.

  • The System Design Newsletter by Neo Kim.

  • The Tech Stack series from Junaid Effendi's newsletter.

For data system design specifically, the following books are great:

  • Fundamentals of Data Engineering by Joe Reis & Matt Housley

  • Designing Data-Intensive Applications by Martin Kleppmann

These resources can help us connect the dots, from building blocks to real-world architectural decisions.

In this post, we’ll go through:

  • What does system design mean for data engineers?

  • 9 Principles for Designing Reliable Data Systems.

  • Key concepts data engineers need to understand in data system design.

  • How to Prepare for Data Engineering System Design Interviews.

Everything we’ve covered so far in the data engineering interview series forms the foundation for understanding how to design, evaluate, and implement different components in a real-world data system.

You can access the full series here.



What does system design mean for data engineers?

Let’s start with the basics.

System design is like designing a house

Imagine we're architects, and someone asks us to design a house.

We don’t just start laying bricks. We first ask questions:

  • Who will live in it? A family? One person? Do they need a backyard?
    (Understand the requirements of the system)

  • How many rooms? Kitchen, bedrooms, bathrooms?
    (Figure out the components we need: storage, pipelines, orchestration, and so on)

  • Where will the doors and windows go? How will people move through the house?
    (Define how data will flow through the system: ingestion, transformation, serving)

  • What materials will we use? Brick, wood, concrete?
    (Choose our tools: dlt, dbt, Airflow, Snowflake, etc.)

  • Where do we put plumbing and electricity?
    (Think about monitoring, fault tolerance, and reliability)

  • How do we make it scalable? What if more people move in later?
    (Design for future data growth, increased traffic, and scale)
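To ground the analogy, the questions above can be sketched as a toy pipeline in plain Python. This is only an illustration under stated assumptions: the order records, the function names (`ingest`, `transform`, `serve`, `with_retries`), and the retry count are hypothetical, and in a real system tools such as dlt, dbt, and Airflow would take over these roles.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


def ingest():
    """Ingestion: stand-in for reading from a source system (hypothetical data)."""
    return [
        {"order_id": 1, "amount": "19.90", "country": "de"},
        {"order_id": 2, "amount": "5.00", "country": "NZ"},
        {"order_id": 3, "amount": None, "country": "nz"},  # deliberately bad row
    ]


def transform(rows):
    """Transformation: drop rows with no amount, fix types and casing."""
    clean = []
    for row in rows:
        if row["amount"] is None:
            # Monitoring: record what was dropped instead of failing silently.
            log.warning("dropping order %s: missing amount", row["order_id"])
            continue
        clean.append(
            {
                "order_id": row["order_id"],
                "amount": float(row["amount"]),
                "country": row["country"].upper(),
            }
        )
    return clean


def serve(rows):
    """Serving: aggregate revenue per country for downstream consumers."""
    totals = {}
    for row in rows:
        totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
    return totals


def with_retries(step, attempts=3):
    """Fault tolerance: re-run a failing step, re-raising after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception:
            log.warning("attempt %d/%d failed", attempt, attempts)
            if attempt == attempts:
                raise


def run_pipeline():
    """Data flow: ingestion -> transformation -> serving."""
    raw = with_retries(ingest)
    return serve(transform(raw))


if __name__ == "__main__":
    print(run_pipeline())
```

Each function maps to one "room" of the house: swapping a component (say, a different source in `ingest`) should not force changes to the others, which is the property we want the real architecture to have.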

Just as a house must be safe, reliable, and suited to the needs of the people who use it, a data system must also be:

© 2025 Erfan Hesami