Centralised Orchestration in Dagster Using Code Locations
How to manage independent projects under one Dagster instance
In previous posts of our Data Engineering Interview Preparation Series1, we explored data orchestration and workflows and implemented hands-on projects using Dagster2. If you missed them, you can catch up here:
Now, we’re going to take things a step further and dive deeper into Dagster by looking at the code locations concept.
Let’s start with this scenario:
Imagine we're managing two independent pipelines:
They could be separate projects.
Or owned by different teams.
Or built with different environments or dependencies.
And we're responsible for orchestrating these workflows using Dagster. What would we do? Do we create separate Dagster servers? Or could we keep them separate but still manage them from a central location with a single Dagster server?
That’s where code locations come in. They allow us to:
Isolate codebases.
Manage them independently.
Still operate everything under one Dagster web server.
The code and project files are available in the repository here3.
Our example project includes two dlt4 pipelines handling two distinct data sources:
Bluesky Social Media Posts: Collecting public posts around fuel prices.
7-Eleven Fuel Price API: Gathering near real-time fuel price data from Australian gas stations.
Both pipelines load data into a PostgreSQL database and are scheduled to run at different frequencies.
If you are not familiar with dlt, a Python data ingestion library, please check out the introduction post here:
But before jumping into implementation, let’s quickly define a few important Dagster concepts that you'll encounter throughout the setup.
Assets
Assets in Dagster represent the data objects that our pipelines create, update, or manage. Think of an asset as any piece of data that our code produces and saves somewhere, whether that's a CSV file on disk, a database table, or a trained ML model. This asset-centric approach allows us to focus on what we're building, not just how we're building it.
Common examples of assets include:
A database table or view
A file (on a local machine or in AWS S3)
A machine learning model
To define an asset in Dagster, we use the @asset decorator from the dagster module.
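Here's a minimal sketch of a plain Dagster asset (the asset name and returned data are illustrative, not taken from the project):

```python
from dagster import asset


@asset
def fuel_price_snapshot():
    # Illustrative only: a real asset would fetch data from an API
    # and persist it somewhere (a file, a database table, etc.).
    return [{"station": "example", "price_aud": 1.89}]
```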
Since we use dlt with Dagster, we use the @dlt_assets decorator. It converts our dlt data pipeline into Dagster assets, making it easy to manage and orchestrate within the Dagster framework. If you'd like to learn more about how Dagster integrates with dlt, check out the official documentation.
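As a rough sketch of that pattern (the source function, pipeline name, and destination below are placeholders, and the import path may differ between Dagster versions, e.g. dagster_embedded_elt.dlt in older releases):

```python
import dlt
from dagster import AssetExecutionContext
from dagster_dlt import DagsterDltResource, dlt_assets

from .sources import bluesky_posts_source  # hypothetical dlt source


@dlt_assets(
    dlt_source=bluesky_posts_source(),
    dlt_pipeline=dlt.pipeline(
        pipeline_name="bluesky_posts",
        destination="postgres",
        dataset_name="bluesky",
    ),
)
def bluesky_dlt_assets(context: AssetExecutionContext, dlt: DagsterDltResource):
    # The DagsterDltResource runs the dlt pipeline and yields one
    # materialisation per dlt resource in the source.
    yield from dlt.run(context=context)
```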
Definitions
The definitions.py file serves as the central manifest of our Dagster project. It contains a Definitions object that brings together everything Dagster needs to understand, operate, and orchestrate our data pipeline, including assets, jobs, schedules, sensors, and resources.
In short, the Definitions object tells Dagster:
“Here’s everything that belongs to this project and how it all fits together.”
Example:
Let’s say we’re building a pipeline to process customer data. Our assets might include:
Raw customer records
Cleaned customer profiles
A customer insights report
Our definitions.py would specify:
What each of those assets is
How they depend on one another
When and how they should be executed (via jobs, schedules, or sensors)
By bundling all this into a single Definitions object, Dagster gets a complete, connected view of our pipeline: what to run, when to run it, and how it all flows together.
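A minimal sketch of what that could look like (the asset logic, job, and cron expression are illustrative, not taken from the repository):

```python
from dagster import Definitions, ScheduleDefinition, asset, define_asset_job


@asset
def raw_customer_records():
    # Illustrative: pull raw records from a source system
    return [{"id": 1, "name": "Ada"}, {"id": 2, "name": None}]


@asset
def cleaned_customer_profiles(raw_customer_records):
    # Depends on raw_customer_records; Dagster infers this from the argument name
    return [r for r in raw_customer_records if r["name"]]


@asset
def customer_insights_report(cleaned_customer_profiles):
    return f"{len(cleaned_customer_profiles)} customers profiled"


customer_job = define_asset_job(
    name="customer_job",
    selection=["raw_customer_records", "cleaned_customer_profiles", "customer_insights_report"],
)

defs = Definitions(
    assets=[raw_customer_records, cleaned_customer_profiles, customer_insights_report],
    jobs=[customer_job],
    schedules=[ScheduleDefinition(job=customer_job, cron_schedule="0 6 * * *")],
)
```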
Schedules
Schedules are like timers for our data pipeline; they automatically trigger jobs to run at specific times or intervals. Think of them as setting an alarm clock for our workflows.
With schedules, we can, for example:
Run a daily sales report every morning at 6 AM.
Update the customer database every hour.
Train a machine learning model every Sunday at midnight.
They help us automate recurring tasks so our pipelines stay fresh without manual intervention.
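For example, the daily 6 AM sales report could be wired up roughly like this (the job and asset names are illustrative):

```python
from dagster import ScheduleDefinition, define_asset_job

# Illustrative job that materialises a hypothetical daily_sales_report asset
daily_sales_report_job = define_asset_job(
    name="daily_sales_report_job",
    selection=["daily_sales_report"],
)

daily_sales_schedule = ScheduleDefinition(
    job=daily_sales_report_job,
    cron_schedule="0 6 * * *",  # every day at 6:00 AM
)
```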
Jobs
Jobs are collections of assets or operations (ops) that execute together as a single unit of work. Think of a job as a "batch of work"; it defines what should run, in what order, and how.
Jobs bring together the logic of our pipeline into a defined execution plan.
Examples:
A job might:
Fetch data from an API, clean the data, and load it into the database.
Or:
Train a machine learning model, validate the results, and deploy the model.
Jobs are essential for organising reusable workflows and can be triggered manually, on a schedule, or in response to events (via sensors5).
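As a minimal op-based sketch, the fetch, clean, and load steps from the first example could be grouped into one job like this (op names and logic are illustrative):

```python
from dagster import job, op


@op
def fetch_data():
    # Illustrative: call an API and return the raw payload
    return [{"price": 1.89}, {"price": None}]


@op
def clean_data(raw):
    # Drop rows without a price
    return [row for row in raw if row["price"] is not None]


@op
def load_to_database(cleaned):
    # Illustrative: write the cleaned rows to a database table
    print(f"loaded {len(cleaned)} rows")


@job
def fetch_clean_load_job():
    # The job wires the ops together and defines the execution order
    load_to_database(clean_data(fetch_data()))
```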
Code locations
Code locations in Dagster let us split our project code into independent units (like folders or repositories). Each unit has its own environment, dependencies, and workflows, but they are all still managed from a single Dagster deployment.
Hands-on example:
In this example, we have two independent Dagster projects:
bluesky_project: Ingests data from the Bluesky API and is scheduled to run weekly.
seven_eleven_project: Ingests data from the 7-Eleven API and is scheduled to run daily.
Each project contains its own:
Pipeline script
Assets
Jobs
Schedules
Definitions
We’ve placed these in separate folders inside a central repository named company_orchestrator/.
You can find the repository for this implementation here6.
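For instance, the bluesky_project's own definitions.py might look roughly like this (the job, schedule, and cron expression are illustrative; only the weekly cadence comes from the description above):

```python
# bluesky_project/definitions.py, as a rough sketch
from dagster import AssetSelection, Definitions, ScheduleDefinition, define_asset_job
from dagster_dlt import DagsterDltResource

from .assets import bluesky_dlt_assets  # the @dlt_assets definition sketched earlier

# One job that materialises every asset in this code location
bluesky_job = define_asset_job(name="bluesky_job", selection=AssetSelection.all())

defs = Definitions(
    assets=[bluesky_dlt_assets],
    jobs=[bluesky_job],
    schedules=[
        # Weekly cadence, e.g. every Monday at midnight
        ScheduleDefinition(job=bluesky_job, cron_schedule="0 0 * * 1"),
    ],
    resources={"dlt": DagsterDltResource()},
)
```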
How does Dagster know where our projects are?
A workspace.yaml7 file is used to configure code locations in Dagster. It tells Dagster where to find our code and how to load it.
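A sketch of what that configuration could look like, assuming each project exposes its Definitions in a definitions module (the working directories are illustrative):

```yaml
# company_orchestrator/workspace.yaml
load_from:
  - python_module:
      module_name: bluesky_project.definitions
      working_directory: bluesky_project
  - python_module:
      module_name: seven_eleven_project.definitions
      working_directory: seven_eleven_project
```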
We also need a central definitions.py at the root of the repository (company_orchestrator/) to merge the Definitions from both projects.
This root-level definitions.py combines the Dagster definitions from both the bluesky_project and seven_eleven_project modules, allowing us to orchestrate and manage assets, schedules, and resources from both projects in a single Dagster instance.
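A minimal sketch of that central file, assuming each project exposes a defs object and that your Dagster version provides Definitions.merge (available in recent releases):

```python
# company_orchestrator/definitions.py, as a rough sketch
from dagster import Definitions

from bluesky_project.definitions import defs as bluesky_defs
from seven_eleven_project.definitions import defs as seven_eleven_defs

# Combine the assets, jobs, schedules, sensors, and resources
# from both projects into one manifest.
defs = Definitions.merge(bluesky_defs, seven_eleven_defs)
```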
Key Benefits:
Centralised orchestration: We can run, schedule, and monitor all assets and jobs from both projects in one Dagster UI.
Modular development: Each project can be developed and tested independently, but deployed together.
Scalability: Easy to add more projects or data sources in the future by merging their definitions.
Project Flow Example:
To run the project, we can simply run:
dagster dev -w workspace.yaml
Command Breakdown
dagster dev: Launches Dagster's development server (Dagit UI + daemon services8)
-w workspace.yaml: Specifies the workspace configuration file to use
Example Flow in our Project
Dagster reads workspace.yaml and finds bluesky_project.definitions and seven_eleven_project.definitions.
Dagster loads the Definitions objects from those modules.
The Dagster UI shows us all the assets, jobs, and schedules defined in those modules.
Conclusion
In this post, we learned how to use code locations in Dagster to manage multiple independent pipelines, each with its own assets, jobs, and schedules, while still controlling everything from a single Dagster instance.
By using workspace.yaml and a central definitions.py, we can keep projects modular, easy to maintain, and ready to scale.
This setup is ideal for teams working on different data workflows but wanting a unified orchestration system.
We Value Your Feedback
If you have any feedback, suggestions, or additional topics you’d like us to cover, please share them with us. We’d love to hear from you!
Enjoy Pipeline to Insights? Please share it with others! Refer:
3 friends and get a 1-month free subscription.
10 friends and get 3 months free.
25 friends and enjoy a 6-month complimentary subscription.
Our way of saying thanks for helping grow the Pipeline to Insights community!
1. https://pipeline2insights.substack.com/t/interview-preperation
2. https://dagster.io/
3. https://github.com/pipelinetoinsights/company_dagster_orchestrator
4. https://dlthub.com/docs/intro
5. https://docs.dagster.io/guides/automate/sensors
6. https://github.com/pipelinetoinsights/company_dagster_orchestrator
7. https://docs.dagster.io/deployment/code-locations/workspace-yaml
8. https://docs.dagster.io/deployment/execution/dagster-daemon