Week 4/33: Data Modelling for Data Engineering Interviews (Part #1)
Week 4 of 33-Week Data Engineering Interview Guide
Data modelling is essential in Data Engineering as it defines how data is organised and connected within a system. It's a common focus in interviews, reflecting its importance in creating efficient, scalable, and reliable pipelines and databases. Strong data modelling skills showcase your ability to solve real-world data challenges.
Additionally, there are roles specifically called Data Modelers that intersect with Data Engineering. As a Data Engineer, you may come across these specialised roles, which focus more deeply on designing and optimising data models. These roles highlight the importance of data modelling in the broader data engineering landscape.
For a Data Engineering interview, the question patterns usually follow three common types:
The interviewer presents a scenario and asks you to design an ER diagram, Dimensional Model, or similar, based on the given requirements.
They provide an existing ER diagram and ask you to normalise or denormalise it.
They ask general questions to assess your overall understanding of data modelling concepts and best practices.
In this post, we’ll start with the fundamentals and then work through real examples, helping you become familiar with the different types of data modelling questions.
What we will cover :
What is data modelling?
What is a data model?
Types of data models
OLTP vs. OLAP
ER Diagrams and ER Models
Step-by-step guide to solving Amazon’s data modelling interview question
We will also provide real interview questions from Microsoft, Uber and Meta.
Don’t miss our Data Engineering Interview Preparation series, check out the full posts [here2] and subscribe to stay updated with new posts and tips.
By the end of the series, you'll have a solid understanding of data modelling basics and be confident handling interview questions.
What is Data Modelling?
Definition
Data modelling is the process of creating a visual representation of data and its relationships within a system.
Example:
Think of a library. To make it easy to find books, we organise them by subject—historical books in one section, and novels in another. This structure helps us quickly locate what we need. Similarly, data modelling organises large amounts of data into a clear structure so it can be managed and accessed efficiently.
Purpose:
Data modelling creates a "blueprint" or "map" that shows how data is categorised, stored, and related. It helps ensure the design of a database supports business needs.
Data modelling is more than just creating structures or diagrams; it requires a deep understanding of the business, its processes, and its goals.
When done right, data modelling gives a clear, comprehensive view of the organisation. It helps uncover important insights and relationships that might be overlooked. It also enables data teams to build better systems, products, and solutions that are stable, reliable, adaptable, and aligned with business needs.
What is a Data Model?
, co-author of Fundamental of Data Engineering book, defined data model: A structured representation that organizes and standardizes data to enable and guide human and machine behaviour, inform decision-making, and facilitate actions.
Three Types of Data Models
When it comes to data modelling, there are three key stages, or levels of abstraction, in data model development.
Conceptual Data Model: A non-technical, high-level description of relevant entities and relationships.
Purpose: Focuses on high-level data structure and relationships without technical details.
Key Features: Identifies entities (e.g., Region, Time, Sales) and their relationships.
Use Case: Ideal for initial planning; flexible and simple to create, even on paper.
Abstraction Level: Highly abstract, with no details on database structure.
Logical Data Model: Includes more details, such as attributes we want to track for each entity. Still business-friendly.
Purpose: Adds more detail to the conceptual model, serving as a bridge to the physical model.
Key Features: Includes attributes (e.g., Sales Amount, City, Channel), primary keys, and foreign keys.
Use Case: Provides a detailed blueprint for the database design.
Abstraction Level: Moderately abstract.