dbt in Action #3: Analyses, Materialisations and Incremental Models
Practicing dbt analyses, materialisations and incremental models with hands-on examples
This is the third post in the dbt in Action series. Here's a quick look at what we’ve covered and what’s coming next:
Post 1: Get started with dbt: project setup and build your first transformations using staging, intermediate, and marts layers.
Post 2: A deep dive into dbt seeds, tests, and macros to enhance data validation and flexibility, with a hands-on example.
Post 3 (You're here!): Explore dbt analyses, understand materialisations, and master incremental models for handling historical and high-volume data, with a hands-on example.
Post 4: Using dbt snapshots to track slowly changing dimensions, with a hands-on example.
Post 5: Understanding the dbt Semantic Layer and introducing MetricFlow, with a hands-on example.
Note: You can find the full dbt implementation of the hands-on example here: [GitHub Repository]1
Missed earlier posts in the series? You can find them here2.
If you’re passionate about dbt and want practical insights delivered straight to your inbox, hit that subscribe button so you won’t miss a thing!
Analyses
Analyses in dbt serve a different purpose than models or tests. They're SQL files stored in the analyses folder of your dbt project, allowing them to be version-controlled alongside your other dbt objects while not being directly part of your data transformation pipeline.
Key Characteristics
Version controlled: Stored with your models, tests, and macros.
Jinja support: You can use Jinja templating in analysis files.
Compile-only: View compiled SQL with dbt compile or "Compile SQL" in dbt Cloud.
Not materialised: Unlike models, analyses don't create objects in your data warehouse.
Common Use Cases
Analyses serve several important purposes:
One-off queries: SQL files for ad-hoc queries against your data warehouse or scripts for user creation
Training queries: Demonstration files to show how dbt transforms Jinja and SQL into pure SQL or practice files for team members
Auditing/refactoring: Comparison queries to validate refactored models against existing ones before promotion to production.
Analyses provide a flexible way to include SQL that benefits from dbt's templating capabilities without needing to be materialised as tables or views in your data warehouse.
Example
Let’s say your team recently refactored a customer_orders model. You want to compare the new version (customer_orders_refactored) with the current production model to make sure the numbers match before deploying it. You can create a SQL file in the analyses/ folder like this:
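(A minimal sketch; the file name and the use of except are illustrative, and BigQuery would spell it except distinct.)

-- analyses/compare_customer_orders.sql (illustrative sketch)
-- Rows in the current model that are missing from the refactored one;
-- an empty result suggests the two models produce the same output.
select * from {{ ref('customer_orders') }}
except
select * from {{ ref('customer_orders_refactored') }}

Swapping the two ref() calls checks the opposite direction (rows that exist only in the refactored model).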
Then you run:
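dbt compile

(With recent dbt versions, dbt compile --select compare_customer_orders should compile just that one file, using the illustrative name above.)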
This lets you view the final SQL and test it manually without turning it into a table or view in your data warehouse.
Materialisations
Materialisations in dbt define how your SQL models are turned into objects in your data warehouse. Remember, a model in dbt is just a SQL SELECT statement saved in a .sql file. dbt takes care of wrapping that statement in the necessary DDL/DML and executing it appropriately based on the materialisation type you choose.
Let’s break down the five key types of materialisations:
1. Table
Your model is materialised as a physical table.
Each time you run dbt run, dbt will drop and recreate the entire table.
Useful for stable datasets where you want fast query performance.
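As a quick illustration (the model and column names below are made up), the materialisation can be set directly in the model file:

-- models/marts/customer_summary.sql (illustrative)
{{ config(materialized='table') }}

select
    customer_id,
    count(*) as order_count
from {{ ref('stg_orders') }}
group by customer_id

A view works the same way with materialized='view', and defaults can also be set per folder in dbt_project.yml, as shown later in the hands-on example.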
2. View
Your model is materialised as a SQL view in your data warehouse.
Views are virtual: they don’t store data but run the query every time you access them.
Great during development for faster build times and smaller storage footprint.
3. Ephemeral
No database object is created at all.
Instead, dbt inlines your model as a Common Table Expression (CTE) in any downstream models.
Ideal for lightweight transformations that don’t need to persist.
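For instance, an ephemeral model is written like any other model (names below are illustrative); anything that references it simply gets the query inlined as a CTE:

-- models/intermediate/int_completed_orders.sql (illustrative)
{{ config(materialized='ephemeral') }}

select *
from {{ ref('stg_orders') }}
where status = 'completed'

-- A downstream model that selects from ref('int_completed_orders') has this
-- query injected as a CTE instead of reading a table or view.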
4. Incremental
dbt builds the model once and only appends new or updated records in future runs.
You define how dbt identifies new data, typically with a unique_key and/or a where clause.
Perfect for large datasets where full refreshes would be inefficient.
5. Materialised View
This materialisation allows the creation and maintenance of materialised views in the target database.
Materialised views combine the benefits of both views and tables, serving similar purposes as incremental models.
Advice: Consider using materialised views when incremental models would work, but you want the data platform to manage incremental logic and refresh processes.
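Where the adapter supports it, this is again just a config change (model name illustrative; adapter support varies, and Snowflake, for example, uses dynamic tables instead):

-- models/marts/mart_daily_revenue.sql (illustrative)
{{ config(materialized='materialized_view') }}

select
    order_date,
    sum(order_total) as daily_revenue
from {{ ref('stg_orders') }}
group by order_date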
For more information about Materialisations, especially Materialised Views, check out the dbt documentation here.
Incremental Models
When handling large datasets that change frequently, rebuilding entire tables with each update can waste valuable computing resources and time. dbt's incremental models provide an elegant solution to this problem by selectively processing only new or modified records rather than the complete dataset.
What Are Incremental Models?
An incremental model in dbt:
Builds the full dataset on the first run.
On subsequent runs, it only adds (or updates) rows based on your defined logic.
This dramatically improves performance, especially with large, append-only or slowly changing datasets.
How Do Incremental Models Work?
dbt builds the initial table using your SELECT query.
On future runs, dbt:
Compares the current state of the source data.
Identifies new or changed records.
Inserts (or optionally updates/deletes) only those records.
Common Use Cases:
Daily transaction logs.
IoT sensor data.
Event streams.
Append-only data lakes.
Best Practices for Incremental Models:
Define a unique key (unique_key) to identify new records.
Use a where clause to limit the scope of comparison.
Run incrementals on a schedule that balances freshness vs. performance.
Combine this with is_incremental() logic to control behaviour within SQL.
Example
The example demonstrates a typical dbt incremental model for processing order data:
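A sketch of what that model can look like (the selected columns and file path are illustrative; the full version is in the GitHub repository):

-- models/marts/sales/orders_incremental.sql (illustrative sketch)
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='merge'
) }}

select
    order_id,
    customer_id,
    order_date,
    status,
    updated_at
from {{ source('sales', 'orders') }}

{% if is_incremental() %}
  -- Only process rows that changed since the last run of this model
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}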
Configuration Block: The config() macro sets up the model with:
materialized='incremental': Tells dbt this is an incremental model
unique_key='order_id': Identifies records by their order ID
incremental_strategy='merge': Uses a merge operation to update existing records
dbt offers five built-in core incremental strategies: append, merge, delete+insert, insert_overwrite, and microbatch (availability depends on your adapter).
For more information about incremental strategies, supported adapters, and custom strategies, please refer to the documentation here:
Base Query: The SELECT statement pulls data from the orders table in the sales source.
Incremental Logic: The is_incremental() conditional:
Only executes on subsequent runs (not the first run)
Adds a WHERE clause that filters for only records with an updated_at timestamp newer than the most recent one in the existing table
This ensures that only new or updated orders are processed
When this model runs initially, it builds the complete orders table. On future runs, it only processes orders that have been added or modified since the last run, making it significantly more efficient for large datasets.
Should I use an incremental model in dbt?
Good candidates:
Immutable event streams: tall and skinny tables (lots of rows, few columns), and data is append-only (no updates).
If updates do happen, there’s a reliable updated_at field to track changes.
Not-so-great candidates:
Your data is small: incremental models are more useful with large datasets.
Your data structure changes often: e.g., new columns or renamed columns.
Your data is updated in unpredictable ways, making it hard to track what has changed.
Your transformation depends on comparing rows – for example, calculating values based on other rows in the same table or view.
Hands-On Example
For this hands-on example, we will create a dbt project using staging, intermediate, and marts layers. The project will leverage dbt analyses, views, tables, and an incremental model.
Scenario
We work for an e-commerce company that wants to track customer behaviour and product performance. The goals include:
Understanding customer lifetime value and ordering behaviour
Analysing product-level sales and revenue
Creating a scalable pipeline using views, tables, and incremental logic
Ensuring data quality through schema tests
Exploring ad hoc insights via analysis queries
The raw data comes from four sources:
Customers Table (raw.raw_customers): Customer details
Orders Table (raw.raw_orders): One record per order
Order Items Table (raw.raw_order_items): Products included in each order
Products Table (raw.raw_products): Metadata about the products
Data Transformation Layers
To convert raw CSV data into insights, we follow a structured layered approach:
Staging Layer (stg_*)
Cleans and standardises raw source data
Applies naming conventions, basic formatting, and date casting
Enriches with reference data if needed
Models:
stg_customers.sql
stg_orders.sql
stg_order_items.sql
stg_products.sql
Intermediate Layer (int_*)
Applies business logic like joins, aggregations, and calculations
Provides reusable and modular building blocks for analytics
Uses CTE-based structuring for clarity
Models:
int_customer_orders.sql: Enriches orders with customer info
int_order_item_details.sql: Enriches order items with product info and computes item-level totals
Marts Layer (mart_*)
Final layer used for dashboards and analytics tools
Fact and summary models using either table or incremental strategies
Optimised for performance and end-user consumption
Models:
mart_customer_lifetime_value.sql (table): Summarises total spend, order count, and first/last order dates for each customer (sketched below)
mart_product_sales_summary.sql (incremental): Tracks quantity and revenue per product using incremental logic with a merge strategy
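A sketch of the customer lifetime value mart (column names such as order_total are assumptions; see the repository for the actual model):

-- models/marts/customer/mart_customer_lifetime_value.sql (illustrative sketch)
{{ config(materialized='table') }}

select
    customer_id,
    count(distinct order_id) as order_count,
    sum(order_total)         as lifetime_value,
    min(order_date)          as first_order_date,
    max(order_date)          as last_order_date
from {{ ref('int_customer_orders') }}
group by customer_id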
Analyses (/analyses)
training_query.sql
churn_investigation_oneoff.sql
Data Model Breakdown
You can find all the code and setup instructions in the GitHub repository:
The materialisation configurations are defined in dbt_project.yml as:
models:
  dbt_post_3:
    staging:
      +materialized: view
    intermediate:
      +materialized: view
    marts:
      customer:
        mart_customer_lifetime_value:
          +materialized: table
      sales:
        mart_product_sales_summary:
          +materialized: incremental
Running the Full dbt Pipeline
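The full pipeline can be built in dependency order with:

dbt run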
Output:
Then, we can run the tests.
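dbt test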
Expected Output:
Additionally, if we want to test our queries in /analyses/, we can simply compile our project:
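dbt compile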
This will compile the queries and place them, with the exact database table references resolved, inside /target/compiled/dbt_post_3/analyses/ so that we can easily copy them and run them against our models.
Conclusion
We have built a transformation pipeline that follows best practices by leveraging dbt analyses, materialisations, and incremental models.
By following best practices, we:
Used staging models to clean and standardise raw inputs.
Built intermediate models using modular CTEs to apply business logic
Delivered analytics-ready marts with a mix of table and incremental materialisations
Leveraged dbt analyses for ad hoc and training queries (all version-controlled and lineage-aware)
Materialisations in dbt gave us full flexibility to optimise performance and manage storage effectively. The incremental strategy helped us handle updates efficiently without reprocessing full datasets.
With this foundation, you now have a blueprint for building robust, scalable, and maintainable transformation pipelines that are ready to support production-grade analytics and insights.
In the next post, we’ll explore dbt Snapshots and Slowly Changing Dimension (SCD Type-2) implementation in dbt to efficiently handle historical transformations.
Preparing for Data Engineering interview questions? Check out our series, with a special focus on dbt preparation here:
Considering implementing a data quality framework with dbt? Check out this post:
Lastly, looking to improve your SQL skills? Start with our SQL optimisation series. Here’s the first week of our popular series:
We Value Your Feedback
If you have any feedback, suggestions, or additional topics you’d like us to cover, please share them with us. We’d love to hear from you!
https://github.com/pipelinetoinsights/dbt-example-3
https://pipeline2insights.substack.com/t/dbt-series