Sitemap - 2025 - Pipeline To Insights

Getting Started with OpenMetadata: An Open-Source Data Catalogue Solution

A Data Engineer’s Guide to Vector Databases (Part 2): Full-Text, Semantic, Hybrid Search and Reranking with pgvector

A Data Engineer’s Guide to Vector Databases (Part 1): Core Concepts Before Building AI-Powered Applications

Storage Fundamentals for Data Engineers

Pandas vs. Polars vs. DuckDB vs. PySpark: Benchmarking Libraries with Real Experiments

How to Gather Requirements Effectively as a Data Engineer

How to Succeed in Data Engineering Interviews

Week 28/31: Behavioural Interview Questions for Data Engineers

Infrastructure as Code for Data Engineers

Week 27/31: Data Visualisation for Data Engineers

Survival Tips for Data Engineers in the Age of Generative AI

Week 26/31: Machine Learning for Data Engineers

Centralised Orchestration in Dagster Using Code Locations

Week 26/31: Data Governance for Data Engineers

Week 25/34: System Design for Data Engineering Interviews

Docker for Data Engineers

Week 24/31: DevOps and DataOps Practices for Data Engineering Interviews

7 Key Factors Every Data Engineer Should Consider When Choosing Tools

Week 23/31: Data Contracts for Data Engineering Interviews

Stop Being the Invisible Data Engineer: 8 Strategies for Career Success

Week 23/31: Real-Time Processing for Data Engineering Interviews

How to Choose Between Batch and Stream Processing?

Semantic Models: Data Modelling for the Modern Data Stack

Week 22/34: Batch Processing for Data Engineering Interviews

Proactive Mindset for Data Engineers

Week 21/34: Open Table Formats for Data Engineering Interviews

Week 20/34: Data Storage Paradigms for Data Engineering Interviews

dbt in Action #4: Snapshots and Slowly Changing Dimensions

Week 19/31: Cloud Computing for Data Engineering Interviews

What is Data Observability and How Does It Support Data Quality

Week 18/34: Data Pipelines and Workflow Orchestration for Data Engineering Interviews (Part #3)

Pipeline Design and Implementation for Small-Scale Data Pipelines

Week 17/31: Data Pipelines and Workflow Orchestration for Data Engineering Interviews (Part #2)

dbt in Action #3: Analyses, Materialisations and Incremental Models

What is Data Architecture and Why Data Engineers Should Consider It

Week 16/34: Data Pipelines and Workflow Orchestration for Data Engineering Interviews (Part #1)

Common Data Engineering Mistakes and How to Avoid Them

Week 15/34: Data Transformation with dbt for Data Engineering Interviews

Pandas vs. Polars: Benchmarking Dataframe Libraries with Real Experiments

Week 14/34: Data Engineering with Databricks for Data Engineering Interviews

Data Compression in SQL

Week 13/34: Spark Fundamentals for Data Engineers

dbt in Action #2: Seeds, Tests and Macros

Week 12/34: Data Warehousing with Snowflake for Data Engineering Interviews

Metadata: What It Is and Why Do We Need It?

Week 11/34: ETL and ELT Processes for Data Engineering Interviews #2

How to Transition from Data Analytics to Data Engineering

Week 10/33: ETL and ELT Processes for Data Engineering Interviews #1

Data Serialisation: Choosing the Best Format for Performance and Efficiency

dbt in Action #1: Fundamentals

Why Data Quality Is the Key to AI Success

Week 9/33: Data Structures and Algorithms for Data Engineering Interviews

Implementing Data Quality Framework with dbt

Week 8/31: Programming for Data Engineering Interviews

Why I Moved from Data Science to Data Engineering

Zero-ETL: What It Is and What It Isn't

Week 7/31: NoSQL and Vector Databases for Data Engineering Interviews

Building Trust in Data: The Fundamentals of Data Quality

Week 6/31: Data Modelling for Data Engineering Interviews (Part #3)

Week #9: 100 Days of SQL Optimisation

Week 5/31: Data Modelling for Data Engineering Interviews (Part #2)

Data Modelling Fundamentals: Normalisation, 3NF and Dimensional Modelling

Week #8: 100 Days of SQL Optimisation

Week 4/31: Data Modelling for Data Engineering Interviews (Part #1)

11 Storage Formats for Data Engineers