Security Fundamentals for Data Engineers
The Role of Security in the Data Engineering Lifecycle
In earlier posts, we discussed the Data Engineering Lifecycle as outlined by Joe Reis and Matt Housley in their book Fundamentals of Data Engineering. If you're interested in learning more about the lifecycle, please check our previous post.
In this series, we’ll focus on the undercurrents: the core principles that underpin successful data engineering projects and provide a strong foundation for every stage of the lifecycle.
The Importance of Security in Data Engineering
Moving and storing data can introduce security risks, particularly when handling sensitive information such as personal details and business data.
As a Data Engineer, it is your responsibility to protect this data and keep it safe throughout its lifecycle. This means applying strong security measures such as encryption, access controls, and monitoring.
Using the right security practices and tools is essential in building secure systems.
Security isn’t optional; it’s a core responsibility. Ignoring it can lead to serious consequences, which is why it must always be a top priority for data engineers.
The Human Factor in Security
Security is not just about systems and protocols; it’s about people. It begins and ends with individuals like you and everyone in your organisation. The greatest vulnerability in security often lies in human error or oversight.
Adopting a defensive mindset is key to staying secure. Be cautious with credentials and sensitive data, think carefully before sharing confidential information, and always assume you could be a target, as threats are constant.
Staying vigilant in both online and offline actions and making security a habit helps protect against risks and safeguard valuable information.
Foundational Principles and Best Practices for Securing Data
1. The Principle of Least Privilege
Only grant access to the essential data and resources needed for the task.
Access should be granted only for the necessary duration.
Avoid giving unnecessary admin or superuser permissions.
Don’t use root or administrator privileges unless absolutely necessary.
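The points above can be sketched in a few lines of Python. This is a minimal illustration, not a real IAM system: the `Grant` class, the `is_allowed` function, and the principal and resource names are all hypothetical. It shows access that is denied by default, scoped to a single resource and action, and limited in time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    """Hypothetical access grant: one principal, one resource, one action, with an expiry."""
    principal: str
    resource: str
    action: str  # e.g. "read" or "write"
    expires_at: datetime


def is_allowed(grants, principal, resource, action, now=None):
    """Deny by default; allow only if an unexpired grant matches exactly."""
    now = now or datetime.now(timezone.utc)
    return any(
        g.principal == principal
        and g.resource == resource
        and g.action == action
        and now < g.expires_at
        for g in grants
    )


# Assumed example: an ETL job may read one table for one hour, nothing more.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
grants = [
    Grant("etl_job", "sales.orders", "read", expires_at=now + timedelta(hours=1)),
]

print(is_allowed(grants, "etl_job", "sales.orders", "read", now=now))   # matching, unexpired grant
print(is_allowed(grants, "etl_job", "sales.orders", "write", now=now))  # no write grant exists
```

In practice you would rely on your platform's own access control features rather than hand-rolled checks; the sketch only makes the idea concrete.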
2. Data Sensitivity
The most effective way to protect sensitive data is to avoid ingesting it into your system unless absolutely necessary. Storing data without a clear purpose increases the risk of accidental leaks, so it’s best to eliminate unnecessary sensitive information altogether.
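One way to apply this at ingestion time is an allowlist: keep only the fields you have a stated purpose for and drop everything else before it is stored. The sketch below assumes records arrive as Python dictionaries; the field names and the `minimise` helper are hypothetical.

```python
# Hypothetical allowlist: only fields with a clear purpose are kept.
ALLOWED_FIELDS = {"order_id", "amount", "country"}


def minimise(record: dict) -> dict:
    """Keep only allowlisted fields so unlisted (possibly sensitive) ones never enter the pipeline."""
    return {key: value for key, value in record.items() if key in ALLOWED_FIELDS}


# Assumed example record from an upstream source.
raw = {
    "order_id": 42,
    "amount": 19.99,
    "country": "NL",
    "email": "jane@example.com",  # sensitive, not needed downstream
    "phone": "+31 000 000 000",   # sensitive, not needed downstream
}

clean = minimise(raw)
print(clean)
```

An allowlist is used here rather than a blocklist because new, unexpected sensitive fields are then dropped by default instead of slipping through.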
3. Security in the Cloud
In today's cloud-centric world, security involves new dimensions that require a solid understanding of:
Identity and Access Management (IAM): IAM manages who can access what within an organisation. It ensures that only authorised users have the right permissions to view or use specific data and systems.
Encryption Methods: Encryption transforms data into an unreadable form that only someone holding the correct key can turn back into the original. It protects data both when it’s stored (at rest) and when it’s being sent (in transit).
Networking Protocols: Networking protocols are rules for how data is sent and received over a network. They ensure that devices can communicate effectively and securely.
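As a small illustration of protecting data while it is being sent, the sketch below builds a TLS configuration with Python's standard `ssl` module. The helper name `strict_tls_context` and the choice of TLS 1.2 as the minimum version are assumptions for the example, not requirements from this post.

```python
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context that verifies certificates and rejects old protocol versions."""
    # create_default_context() enables certificate and hostname verification by default.
    context = ssl.create_default_context()
    # Assumed policy for this sketch: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = strict_tls_context()
print(context.verify_mode, context.check_hostname, context.minimum_version)
```

A context like this can be passed to standard library clients, for example `urllib.request.urlopen(url, context=context)`, so that connections fail instead of silently falling back to an unverified or outdated protocol.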
4. Security Theatre
"Security theatre" refers to when an organisation looks like it’s prioritising security by following checklists or meeting compliance standards but doesn’t truly commit to making security part of its culture. It’s more about appearances than real action.
This happens when the focus is on ticking boxes instead of encouraging everyone in the organisation to take security seriously. Real security comes from everyone working together to protect data and making security a daily habit.
Security theatre isn’t just superficial; it’s risky. By focusing on looking secure instead of being secure, organisations leave themselves open to serious vulnerabilities.
Conclusion
Security should be a habit in both thought and action: treat data as you would your wallet or smartphone. While you may not manage your company’s security, understanding basic practices and staying security-conscious helps minimise the risk of data breaches.
Use principles, protocols, tools, and technology to secure data.
A strong security culture comes from shared responsibility across the organisation.
Everyone should understand their role in protecting data.
Building a security mindset is just as important as using the right tools.
By following these practices, you can significantly reduce security risks.
If you're interested in diving deeper into this topic, consider exploring the book Fundamentals of Data Engineering by industry experts Joe Reis and Matt Housley. Additionally, Joe recently released a Data Engineering course on DeepLearning.AI, which you might find incredibly valuable.
You might also enjoy our previous posts about the Data Engineering Lifecycle.
Join the Conversation!
What are your thoughts on the role of security in data engineering? Have you implemented any of these principles in your work? We'd love to hear your experiences and insights. 🙂