Pipeline To Insights


The Data Engineer’s GitHub Portfolio (2026 Edition)

If you want the interview, your GitHub must prove you can design systems, handle infrastructure, and make engineering trade-offs

Pipeline to Insights and Yordan Ivanov
Feb 24, 2026
Cross-posted by Pipeline To Insights
"Many of you reach out asking how to build a portfolio that would maximize your chance of getting hired. This article I wrote for Pipeline To Insights answers 99% of those questions."
- Yordan Ivanov

I recently read “The Certifications Scam” by Yordan Ivanov. It clearly explains when certifications help and when real, hands-on work matters more.

That made me think about a common question: what does a good GitHub portfolio actually look like? What kind of projects show real skills? How do you go beyond tutorials? And how should you explain what you’ve built?

Yordan answered these questions in a follow-up post, and I found it very useful. Hope you enjoy it too.

About Yordan:

He is a data engineering leader and writer behind Data Gibberish. His writing focuses on helping data engineers make their work visible, connect technical results to business impact, and get proper recognition for what they build. He covers real-world topics like career growth, leadership without a title, decision-making, and turning day-to-day engineering work into outcomes that matter.


Getting ready for data engineering interviews in 2026? Read another great post by Yordan here:

How to Succeed in Data Engineering Interviews, by Yordan Ivanov and Erfan Hesami (August 31, 2025)



The job market has stopped rewarding attendance. We have shifted from a supply-constrained industry to a quality-constrained one, where a CV claiming you finished a bootcamp is no longer a golden ticket.

As a hiring manager, if I see an interesting CV, I immediately check GitHub because it is the only place where I can verify your technical taste.

Your CV is the pitch, but your GitHub is the due diligence. I use it as a “proof of work” filter to see whether you can actually build or merely know the keywords. Most senior engineers over-invest in their resume while letting their public presence signal a lack of organisation or curiosity.

In a market where everyone looks the same on paper, a well-organised GitHub provides a massive benefit. It shows you are responsible for outcomes, not just tasks.

In this article, I will show you how to move your GitHub from code storage to a curated gallery that proves you can design systems and handle the “engineering” part of data engineering.


What Most Portfolios Get Wrong

Most GitHub profiles are digital junk drawers. When I open a profile and see 50 repositories named test-1, learning-spark, or my-first-pipeline, I don’t see a “builder”. I see a lack of organisation. You are burying your actual skills under a mountain of noise, making it impossible for a busy manager to find the signal in the 60 seconds they give your profile.

Another category of problematic profiles is those with long gaps of inactivity followed by a single “final project” that looks exactly like a course template. This signals that you only code when someone gives you a syllabus. In 2026, I am looking for engineers who solve problems because they can’t help themselves, not because they are chasing a certificate.

I know you might argue that your best work is in private corporate repos. But having zero public presence is a massive risk. You need a “technical window” that shows how you think, how you structure logic, and how you handle trade-offs. Without it, you are asking me to hire you based on faith, and in a quality-constrained market, faith doesn’t scale.

These mistakes harm you because they suggest you can follow instructions but cannot solve unique problems. If you cannot curate your own work, I have no reason to believe you can manage the complexity of a production data ecosystem.

So, here’s how to structure your GitHub portfolio instead.


How To Structure Your Homepage

Your profile is an executive summary. Use a clear photo and direct links to your LinkedIn and Substack. I’m not saying you need to become an “influencer”, but you need to make it easy for a peer to verify your background without hunting for it.

The Profile README needs to act as your technical thesis. Do not list every tool you have ever touched. State your philosophy plainly, such as “I build data platforms that prioritise reliability over hype”, or whatever statement resonates with you. If you need a template, see my GitHub Profile README Guide.

Then, use four pinned repositories to demonstrate technical taste. These should show:

  • End-to-end Logic: Your most complex pipeline that shows how you handle data from ingestion to consumption.

  • Environment Management: Data engineering is engineering. Show your Terraform or Kubernetes configurations to prove you can manage the “plumbing”.

  • Development Efficiency: Your dotfiles. For senior roles, showing your CLI or environment setup proves you have mastered your tools.

  • Community Alignment: A fork of a tool like dbt or Airflow where you have actually pushed code. This signals you can work within a complex, shared codebase.

Authority comes from structure. By pinning these four, you constrain the conversation to your strengths and show a reviewer exactly what you consider “good” engineering.


What Repositories You Actually Need

To move past the noise, your work must demonstrate three specific levels of technical taste.

First:

You need Deep Expertise. This is about showing how you handled a specialised problem, like building custom Spark UDFs for complex business logic or designing a dbt modelling layer that scales. Seniority is about the ability to navigate the edges of a tool where the defaults fail.
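To make the Spark UDF point concrete, here is a minimal, hypothetical sketch of the pattern I like to see: the business rule lives in a pure Python function that is unit-testable without a cluster, and only then gets wrapped as a UDF. The tier thresholds and column names below are invented for illustration.

```python
# Hypothetical business rule: classify an account tier.
# The thresholds are invented; the point is that the logic is a pure
# function you can test without a Spark session.
def account_tier(lifetime_value: float, months_active: int) -> str:
    """Return the account tier for a customer row."""
    if lifetime_value >= 10_000 and months_active >= 12:
        return "enterprise"
    if lifetime_value >= 1_000:
        return "growth"
    return "starter"

# Wrapping it as a Spark UDF (pyspark assumed installed) would look like:
# from pyspark.sql.functions import udf
# from pyspark.sql.types import StringType
# tier_udf = udf(account_tier, StringType())
# df = df.withColumn("tier", tier_udf("ltv", "months_active"))
```

Keeping the rule separate from the UDF wrapper is itself a signal of taste: it shows a reviewer you design for testability, not just for the tool.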

Second:

You must treat Infrastructure as Engineering. Most data professionals treat the environment as someone else’s problem. You prove you are different by including Terraform or Kubernetes configurations. If you cannot automate your environment with Docker or CI/CD, you are a liability to a modern team. Consistency is your next best friend; it holds its own against raw technical talent.

Third:

You need to show System Integration. I want to see a project that connects an API to S3, Snowflake, and dbt. This proves you understand the entire data lifecycle. You are responsible for the outcome of the entire pipeline, not just a single script.
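As a rough sketch of that shape, the first hop of such a pipeline might look like the snippet below. The API URL, bucket name, and key layout are placeholders, and boto3 is assumed to be installed with AWS credentials configured; this is an illustration of the pattern, not a production implementation.

```python
import json
import urllib.request

def build_raw_key(run_date: str) -> str:
    """Partition raw landing files by load date so downstream layers
    (for example, Snowflake external stages feeding dbt models) can
    pick up each new batch incrementally."""
    return f"raw/events/load_date={run_date}/batch.json"

def ingest_to_s3(api_url: str, bucket: str, run_date: str) -> str:
    """Fetch one batch from a source API and land it, untouched, in S3."""
    import boto3  # third-party dependency, assumed installed

    with urllib.request.urlopen(api_url, timeout=30) as resp:
        payload = json.load(resp)

    key = build_raw_key(run_date)
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(payload).encode("utf-8"),
    )
    return key
```

Landing the payload untouched is the design choice worth defending in your README: transformations then live in dbt, where they are versioned and testable.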

But technical skill is the baseline. I am also looking for what actually sparks your interest. If your profile is filled with NYC Taxi datasets or Titanic survival rates, I assume you can only follow a syllabus.

This is tutorial slop. It proves you can copy code, but it does not prove you can solve a unique problem.

I want to see projects that show a builder mindset. Real engineering happens when you face data that was not cleaned for a competition. I once saw a Reddit post from somebody who built a pipeline to track their own bathroom visits. It sounds ridiculous, but it showed they knew how to collect, store, and visualise data they owned. They dealt with their own “production” issues.

I built a pipeline for a Snowflake badge that pulled public info about ham radio repeaters. This is where you prove you can find a source, handle its specific weirdness, and build a system for it.

Your hobbies are the best source for this. Whether you track gaming stats, movie history, or fitness data, building a system for a reason is a green flag. It proves curiosity drives your work, not just the search for a certificate. When you build something you actually care about, you face real constraints. You have to decide on ingestion frequency, handle missing fields, and explain why the data matters.

If you are struggling for ideas, stop looking at generic tutorials. Focus on how an approach works in a real organisation. Check this playlist for more ideas: Level Up Data Engineering Playlist [1].


Anatomy Of A Professional Repository

A repository is not just a folder for your scripts. It is a sales page for your engineering standards. If I have to hunt through your file structure to understand what the project does, I will simply close the tab. You need to make the signal undeniable.


The README must lead with an architecture diagram. Use Mermaid.js [2] or Excalidraw [3] to show the data flow from source to destination. If I cannot see the system flow in five seconds, I assume you do not understand the system yourself. No diagram means no project.

Follow the diagram with the “Why”. I do not care that you used Spark, but I care about the business problem you solved.

State the impact plainly: “Reduced compute costs by 30%” or “Automated a manual 4-hour reporting process”. Include details on scale, such as the number of rows processed and the frequency of the runs. This moves the conversation from mechanics to outcomes.

Engineering standards are where most portfolios fail. Your code must be clean, modular, and follow PEP 8 [4] or similar standards. Avoid “spaghetti SQL” and massive, monolithic Python scripts. I look for a /tests folder in every repo. Including unit or integration tests proves you build for production. It shows you know how to break your own code before a user does.
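As an illustration of what a /tests folder can hold, here is a hedged sketch of a unit test for a small, pure transform. The dedupe_latest function and its row schema are hypothetical; the point is that pure logic, kept separate from I/O, can be tested anywhere.

```python
# Hypothetical contents of tests/test_transforms.py.
def dedupe_latest(rows: list[dict]) -> list[dict]:
    """Keep only the most recent row per user_id (assumed schema:
    each row has user_id and an ISO-8601 updated_at string)."""
    latest: dict[str, dict] = {}
    for row in sorted(rows, key=lambda r: r["updated_at"]):
        latest[row["user_id"]] = row
    return list(latest.values())

def test_dedupe_latest_keeps_newest_row():
    rows = [
        {"user_id": "a", "updated_at": "2026-01-01", "plan": "free"},
        {"user_id": "a", "updated_at": "2026-02-01", "plan": "paid"},
    ]
    result = dedupe_latest(rows)
    assert len(result) == 1
    assert result[0]["plan"] == "paid"
```

A reviewer who sees even one test like this knows you think about correctness before a user finds the bug for you.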

Finally, check your commit hygiene. I do not want to see twenty commits titled “fix”, “update”, or “test”. Write descriptive commit messages like “Improve join logic for user_dim” or “Add Slack alerting for pipeline failures”. This signals that you can work on a professional team where version history actually matters. If your commit history is a mess, I assume your production environment is too.


The Hidden Signals Of Seniority

I look for the scripts you wrote for yourself, not the ones you wrote for a grade. Hobby projects are green flags in a high-stakes market. A repository that automates your coffee machine or tracks a niche hobby shows more initiative than a generic ETL tutorial. These repositories prove you solve problems because you are a builder, not because you are chasing a certificate.

“Joke” repositories are actually evidence of technical curiosity. They show you can learn adjacent technologies like APIs, Rust, or Go without a syllabus. This curiosity is what separates a builder from a task-taker. Engineering is often a solitary endeavour, but success in data depends on this drive to explore how things work.

There is a paradox in leadership roles. If you are a lead data engineer, I do not expect a portfolio of tutorials. I look for leadership through craft.

Your GitHub should show your “technical taste” through your dotfiles or contributions to open-source projects. These signals prove you still understand the tools your team uses every day.

A visible builder identity provides a massive benefit when everyone looks the same on paper. It shows you have mastered your environment and value efficiency. In the 2026 market, I hire people who adapt. Your public work is the only way to prove you are one of them.


Final Thoughts

Landing a job in 2026 is a multi-component sport. You cannot expect above-average results with average effort. The market has shifted from being supply-constrained to quality-constrained.

Landing a job is a data problem. If you are not getting calls, your CV is the problem. If you are not getting technical follow-ups after the interview, your GitHub is the problem. Fix the inputs to change the output.

The people I hire treat getting hired as a sport that requires training. They optimise their technical depth and personal branding at the same time. Success is about the combination of depth and a visible builder identity.

The market rewards those who learn from failure and treat their career as a product. Stop hiding behind a corporate title and start showing your taste. Visibility is the only way to prove you can handle the engineering part of the job.

The visible builder always wins.

[1] https://www.datagibberish.com/t/playlist-level-up-data-engineering
[2] https://mermaid.js.org/
[3] https://excalidraw.com/
[4] https://peps.python.org/pep-0008/

A guest post by
Yordan Ivanov
I share everything I learned becoming a Head of Data Engineering but nobody taught me. Playbooks, scripts, and templates on stakeholder management, career growth, and team leadership.


© 2026 Erfan Hesami