Week 26/31: Machine Learning for Data Engineers
How Data and ML Engineers can work together to build robust ML Systems.
I'd like to open this post with a quote from Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications1 by Chip Huyen:
A vast majority of ML models today learn from data, so developing ML models starts with engineering data (Chapter 2).
The final stage of the data engineering lifecycle, as outlined in the Fundamentals of Data Engineering book2 by Joe Reis and Matt Housley, involves serving data for downstream use cases.
This goes beyond simply making data accessible; it’s about empowering stakeholders to extract meaningful business value, drive informed decisions, and deliver impactful outcomes. Among these stakeholders are machine learning engineers, who rely on high-quality, well-prepared data. Data engineers play a critical role by collaborating with data scientists and ML engineers to acquire, transform, and deliver the datasets needed for model training and evaluation. (Interested in learning more about the Data Engineering Lifecycle? Check out this series3.)
Just as data engineers have their own lifecycle, machine learning engineers follow a distinct but related one. The two workflows overlap, but ML engineering focuses on tasks such as training models, storing features, tracking model performance, and updating models as needed. It's a separate process, but it works closely with data engineering.
Data engineers design and build scalable systems that make the data ecosystem more robust, reliable, and efficient. Their work focuses on creating infrastructure that saves time, prevents future issues, and supports long-term growth. When collaborating with ML engineers, they add value in three main ways: preparing high-quality data, automating parts of the ML workflow, and streamlining operations so that models run smoothly and reliably in production.
When data and ML engineers collaborate, they can achieve better results, especially in the areas where their work overlaps. Depending on the organisation, roles may differ: in some cases, ML engineers handle the entire ML data lifecycle themselves; in others, data engineers take on responsibilities like data preparation and even feature engineering to support ML efforts.
When it comes to data engineering interviews, it’s helpful to show that we understand the basics of machine learning and how our work can support ML teams. We might also see job descriptions where ML knowledge is desirable. Being able to speak confidently about cross-functional collaboration and shared responsibilities can help us stand out from other candidates.
In this post, we cover:
What Data Engineers need to know about Machine Learning (ML).
How Data Engineers support every stage of the ML lifecycle.
The key differences between ML and data engineering pipelines.
Core principles for building reliable ML pipelines.
How data and ML engineers can work better together.
Don’t miss our Data Engineering Interview Preparation series: check out the full posts [here2] and subscribe to stay updated with new posts and tips.
Here is the full plan: