dbt in Action #3: Analyses, Materialisations and Incremental Models
Practicing dbt analyses, materialisations and incremental models with hands-on examples
This is the third post in the dbt in Action series. Here's a quick look at what we’ve covered and what’s coming next:
Post 1: Get started with dbt: project setup and build your first transformations using staging, intermediate, and marts layers.
Post 2: A deep dive into dbt seeds, tests, and macros to enhance data validation and flexibility, with a hands-on example.
Post 3 (You're here!): Explore dbt analyses, understand materialisations, and master incremental models for handling historical and high-volume data, with a hands-on example.
Post 4: Using dbt snapshots to track slowly changing dimensions, with a hands-on example.
Post 5: Understanding the dbt Semantic Layer and introducing MetricFlow, with a hands-on example.
Note: You can find the full dbt implementation of the hands-on example here: [GitHub Repository]1
Missed earlier posts in the series? You can find them here2.
If you’re passionate about dbt and want practical insights delivered straight to your inbox, hit that subscribe button so you won’t miss a thing!
Analyses
Analyses in dbt serve a different purpose than models or tests. They're SQL files stored in the analyses folder of your dbt project, allowing them to be version-controlled alongside your other dbt objects while not being directly part of your data transformation pipeline.
Key Characteristics
Version controlled: Stored with your models, tests, and macros.
Jinja support: You can use Jinja templating in analysis files.
Compile-only: View compiled SQL with dbt compile or "Compile SQL" in dbt Cloud.
Not materialised: Unlike models, analyses don't create objects in your data warehouse.
Common Use Cases
Analyses serve several important purposes:
One-off queries: SQL files for ad-hoc queries against your data warehouse or scripts for user creation
Training queries: Demonstration files to show how dbt transforms Jinja and SQL into pure SQL or practice files for team members
Auditing/refactoring: Comparison queries to validate refactored models against existing ones before promotion to production.
Analyses provide a flexible way to include SQL that benefits from dbt's templating capabilities without needing to be materialised as tables or views in your data warehouse.
Example
Let’s say your team recently refactored a customer_orders model. You want to compare the new version (customer_orders_refactored) with the current production model to make sure the numbers match before deploying it. You can create a SQL file in the analyses/ folder like this:
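(A minimal sketch; the file name and the use of except are illustrative, and BigQuery would spell it except distinct.)

-- analyses/compare_customer_orders.sql (illustrative sketch)
-- Rows in the current model that are missing from the refactored one;
-- an empty result suggests the two models produce the same output.
select * from {{ ref('customer_orders') }}
except
select * from {{ ref('customer_orders_refactored') }}

Swapping the two ref() calls checks the opposite direction (rows that exist only in the refactored model).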
Then you run:
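dbt compile

(With recent dbt versions, dbt compile --select compare_customer_orders should compile just that one file, using the illustrative name above.)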
This lets you view the final SQL and test it manually without turning it into a table or view in your data warehouse.
Materialisations
Materialisations in dbt define how your SQL models are turned into objects in your data warehouse. Remember, a model in dbt is just a SQL SELECT statement saved in a .sql file. dbt takes care of wrapping that statement in the necessary DDL/DML and executing it appropriately based on the materialisation type you choose.
Let’s break down the five key types of materialisations:
1. Table
Your model is materialised as a physical table.
Each time you run dbt run, dbt will drop and recreate the entire table.
Useful for stable datasets where you want fast query performance.
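As a quick illustration (the model and column names below are made up), the materialisation can be set directly in the model file:

-- models/marts/customer_summary.sql (illustrative)
{{ config(materialized='table') }}

select
    customer_id,
    count(*) as order_count
from {{ ref('stg_orders') }}
group by customer_id

A view works the same way with materialized='view', and defaults can also be set per folder in dbt_project.yml, as shown later in the hands-on example.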
2. View
Your model is materialised as a SQL view in your data warehouse.
Views are virtual: they don’t store data but run the query every time you access them.
Great during development for faster build times and smaller storage footprint.
3. Ephemeral
No database object is created at all.
Instead, dbt inlines your model as a Common Table Expression (CTE) in any downstream models.
Ideal for lightweight transformations that don’t need to persist.
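For instance, an ephemeral model is written like any other model (names below are illustrative); anything that references it simply gets the query inlined as a CTE:

-- models/intermediate/int_completed_orders.sql (illustrative)
{{ config(materialized='ephemeral') }}

select *
from {{ ref('stg_orders') }}
where status = 'completed'

-- A downstream model that selects from ref('int_completed_orders') has this
-- query injected as a CTE instead of reading a table or view.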
4. Incremental
dbt builds the model once and only appends new or updated records in future runs.
You define how dbt identifies new data, typically with a unique_key and/or a where clause.
Perfect for large datasets where full refreshes would be inefficient.
5. Materialised View
This materialisation allows the creation and maintenance of materialised views in the target database.
Materialised views combine the benefits of both views and tables, serving similar purposes as incremental models.
Advice: Consider using materialised views when incremental models would work, but you want the data platform to manage incremental logic and refresh processes.
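Where the adapter supports it, this is again just a config change (model name illustrative; adapter support varies, and Snowflake, for example, uses dynamic tables instead):

-- models/marts/mart_daily_revenue.sql (illustrative)
{{ config(materialized='materialized_view') }}

select
    order_date,
    sum(order_total) as daily_revenue
from {{ ref('stg_orders') }}
group by order_date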
For more information about Materialisations, especially Materialised Views, check out the dbt documentation here.
Incremental Models
When handling large datasets that change frequently, rebuilding entire tables with each update can waste valuable computing resources and time. dbt's incremental models provide an elegant solution to this problem by selectively processing only new or modified records rather than the complete dataset.
What Are Incremental Models?
An incremental model in dbt:
Builds the full dataset on the first run.
On subsequent runs, it only adds (or updates) rows based on your defined logic.
This dramatically improves performance, especially with large, append-only or slowly changing datasets.
How Do Incremental Models Work?
dbt builds the initial table using your SELECT query.
On future runs, dbt:
Compares the current state of the source data.
Identifies new or changed records.
Inserts (or optionally updates/deletes) only those records.
Common Use Cases:
Daily transaction logs.
IoT sensor data.
Event streams.
Append-only data lakes.
Best Practices for Incremental Models:
Define a unique key (unique_key) to identify new records.
Use a where clause to limit the scope of comparison.
Run incrementals on a schedule that balances freshness vs. performance.
Combine this with is_incremental() logic to control behaviour within SQL.
Example
The example demonstrates a typical dbt incremental model for processing order data:
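A sketch of what that model can look like (the selected columns and file path are illustrative; the full version is in the GitHub repository):

-- models/marts/sales/orders_incremental.sql (illustrative sketch)
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='merge'
) }}

select
    order_id,
    customer_id,
    order_date,
    status,
    updated_at
from {{ source('sales', 'orders') }}

{% if is_incremental() %}
  -- Only process rows that changed since the last run of this model
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}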
Configuration Block: The config() macro sets up the model with:
materialized='incremental': Tells dbt this is an incremental model
unique_key='order_id': Identifies records by their order ID
incremental_strategy='merge': Uses a merge operation to update existing records
dbt offers five built-in core incremental strategies: append, merge, delete+insert, insert_overwrite, and microbatch (availability depends on your adapter).
For more information about incremental strategies, supported adapters, and custom strategies, please refer to the documentation here:
Base Query: The SELECT statement pulls data from the orders table in the sales source.
Incremental Logic: The is_incremental() conditional:
Only executes on subsequent runs (not the first run)
Adds a WHERE clause that filters for only records with an updated_at timestamp newer than the most recent one in the existing table
This ensures that only new or updated orders are processed
When this model runs initially, it builds the complete orders table. On future runs, it only processes orders that have been added or modified since the last run, making it significantly more efficient for large datasets.
Should I use an incremental model in dbt?
Good candidates:
Immutable event streams: tall and skinny tables (lots of rows, few columns), and data is append-only (no updates).
If updates do happen, there’s a reliable updated_at field to track changes.
Not-so-great candidates:
Your data is small: incremental models are more useful with large datasets.
Your data structure changes often: e.g., new columns or renamed columns.
Your data is updated in unpredictable ways, making it hard to track what has changed.
Your transformation depends on comparing rows – for example, calculating values based on other rows in the same table or view.
Hands-On Example
For this hands-on example, we will create a dbt project using staging, intermediate, and marts layers. The project will leverage dbt analyses, views, tables, and an incremental model.
Scenario
We work for an e-commerce company that wants to track customer behaviour and product performance. The goals include:
Understanding customer lifetime value and ordering behaviour
Analysing product-level sales and revenue
Creating a scalable pipeline using views, tables, and incremental logic
Ensuring data quality through schema tests
Exploring ad hoc insights via analysis queries
The raw data comes from four sources:
Customers Table (raw.raw_customers): Customer details
Orders Table (raw.raw_orders): One record per order
Order Items Table (raw.raw_order_items): Products included in each order
Products Table (raw.raw_products): Metadata about the products
Data Transformation Layers
To convert raw CSV data into insights, we follow a structured layered approach:
Staging Layer (stg_*)
Cleans and standardises raw source data
Applies naming conventions, basic formatting, and date casting
Enriches with reference data if needed
Models:
stg_customers.sql
stg_orders.sql
stg_order_items.sql
stg_products.sql
Intermediate Layer (int_*)
Applies business logic like joins, aggregations, and calculations
Provides reusable and modular building blocks for analytics
Uses CTE-based structuring for clarity
Models:
int_customer_orders.sql: Enriches orders with customer info
int_order_item_details.sql: Enriches order items with product info and computes item-level totals
Marts Layer (mart_*)
Final layer used for dashboards and analytics tools
Fact and summary models using either table or incremental strategies
Optimised for performance and end-user consumption
Models:
mart_customer_lifetime_value.sql (table): Summarises total spend, order count, and first/last order dates for each customer (sketched below)
mart_product_sales_summary.sql (incremental): Tracks quantity and revenue per product using incremental logic with a merge strategy
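A sketch of the customer lifetime value mart (column names such as order_total are assumptions; see the repository for the actual model):

-- models/marts/customer/mart_customer_lifetime_value.sql (illustrative sketch)
{{ config(materialized='table') }}

select
    customer_id,
    count(distinct order_id) as order_count,
    sum(order_total)         as lifetime_value,
    min(order_date)          as first_order_date,
    max(order_date)          as last_order_date
from {{ ref('int_customer_orders') }}
group by customer_id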
Analyses (/analyses)
training_query.sql
churn_investigation_oneoff.sql
Data Model Breakdown
You can find all the code and setup instructions in the GitHub repository:
The materialisation configurations are defined in dbt_project.yml as:
models:
  dbt_post_3:
    staging:
      +materialized: view
    intermediate:
      +materialized: view
    marts:
      customer:
        mart_customer_lifetime_value:
          +materialized: table
      sales:
        mart_product_sales_summary:
          +materialized: incremental
Running the Full dbt Pipeline
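The full pipeline can be built in dependency order with:

dbt run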
Output:
Then, we can run the tests.
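dbt test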
Expected Output:
Additionally, if we want to test our queries in /analyses/, we can simply compile our project:
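dbt compile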
This will compile the queries and place them, with the exact database table references resolved, inside /target/compiled/dbt_post_3/analyses/ so that we can easily copy them and run them against our models.
Conclusion
We have built a transformation pipeline that follows best practices by leveraging dbt analyses, materialisations, and incremental models.
By following best practices, we:
Used staging models to clean and standardise raw inputs.
Built intermediate models using modular CTEs to apply business logic
Delivered analytics-ready marts with a mix of table and incremental materialisations
Leveraged dbt analyses for ad hoc and training queries (all version-controlled and lineage-aware)
Materialisations in dbt gave us full flexibility to optimise performance and manage storage effectively. The incremental strategy helped us handle updates efficiently without reprocessing full datasets.
With this foundation, you now have a blueprint for building robust, scalable, and maintainable transformation pipelines that are ready to support production-grade analytics and insights.
In the next post, we’ll explore dbt Snapshots and Slowly Changing Dimension (SCD Type-2) implementation in dbt to efficiently handle historical transformations.
Preparing for Data Engineering interview questions? Check out our series, with a special focus on dbt preparation here:
Considering implementing a data quality framework with dbt? Check out this post:
Lastly, looking to improve your SQL skills? Start with our SQL optimisation series. Here’s the first week of our popular series:
We Value Your Feedback
If you have any feedback, suggestions, or additional topics you’d like us to cover, please share them with us. We’d love to hear from you!
https://github.com/pipelinetoinsights/dbt-example-3
https://pipeline2insights.substack.com/t/dbt-series