Week #9: 100 Days of SQL Optimisation

Optimising query performances in DuckDB

Jan 16, 2025

Think of SQL optimisation as brewing the perfect cup of tea. If you use stale leaves or rush the process, you’ll end up with a bland or bitter result. Choosing the right file formats is like selecting the finest tea leaves while understanding your query is comparable to controlling the water temperature and steeping time with care. A well-optimised query, much like a perfectly brewed tea, brings out rich flavours effortlessly, leaving you with a cup that’s both satisfying and rejuvenating. 🍵✨

This week, we explored DuckDB's capabilities, showcasing its performance across various scenarios. Using practical examples, we demonstrated how techniques such as window functions, SARGable queries, aggregate functions, and CTEs can enhance query performance.

Day 57: Self Join vs. ROW_NUMBER()

Goal: To retrieve the most recent crime for each area.

Details:

Day 58: Repeated Subquery vs. CTE

Goal: To retrieve the total number of crimes per AREA_NAME and the percentage of all crimes for each area.

Details:

Day 59: CTE with Aggregation vs. Window Function

Goal: To compare CTE and Window Function for aggregation.

Details:

Day 60: DuckDB FILTER vs. PostgreSQL CASE

Goal: To retrieve the total sales for each product while including only those sales that exceed a specific threshold.

Details:

Day 61: COALESCE vs. IS NOT NULL

Goal: To retrieve the total sales while excluding rows with NULL values.

Details:

Day 62: Function-Based vs. Indexed Filters

Goal: To retrieve the total sales in February 2023.

Details:

Day 63: Self Join vs. LAG()

Goal: To calculate row-to-row differences for transactions.

Details:

Conclusion

In this week 9 of our "100 Days of SQL Optimisation" journey, we focused on advanced query optimisation techniques to further explore DuckDB's analytical capabilities:

Efficient Row Filtering with ROW_NUMBER(): Replaced self-joins with the ROW_NUMBER() window function to retrieve the most recent records for each category, drastically reducing execution time and memory usage.
CTE for Repeated Aggregations: Utilised Common Table Expressions (CTEs) to calculate aggregated metrics once and reuse them, replacing repeated subqueries for better performance and clarity.
Advanced Filtering with DuckDB’s FILTER Clause: Demonstrated the efficiency of DuckDB’s FILTER clause over conditional aggregation for selectively summing data based on thresholds.
Handling NULLs with IS NOT NULL: Showed how replacing COALESCE with IS NOT NULL improves query efficiency when filtering out null values during aggregations.
Function-Free Filtering for Index Utilisation: Replaced function-based filters like EXTRACT with direct range filters to leverage indexes, reducing execution time and memory usage.
Row-to-Row Comparisons with LAG(): Replaced self-joins with the LAG() window function to efficiently calculate row-to-row differences, significantly reducing computational overhead.

If you're eager to learn more about making your SQL queries efficient and scalable, follow along with our 100-day journey. Subscribe to receive daily tips and unlock the full potential of your SQL skills!

If you haven’t checked our “100 Days of SQL Optimisation” before, you can start with this Week 1 recap and join our journey!

Week #1: 100 Days of SQL Optimisation

Pipeline to Insights

November 7, 2024

Read full story

If you missed the Week 8 recap, you can catch up by checking it out here:

Week #8: 100 Days of SQL Optimisation

Pipeline to Insights

Jan 7

Read full story

If you're looking to enhance your data modelling skills, check out:

Data Modelling Fundamentals: Normalisation, 3NF and Dimensional Modelling

Pipeline to Insights

Jan 11

Read full story

We Value Your Feedback

If you have any feedback, suggestions, or additional topics you’d like us to cover, please share them with us. We’d love to hear from you!

Have you tried DuckDB yet? Whether it's for analytics, data science, or something entirely unique, we’re curious, how did you use it, and what was your experience like? Share your story and tips, we can’t wait to learn from you!

Pipeline To Insights

Week #1: 100 Days of SQL Optimisation

Week #8: 100 Days of SQL Optimisation

Data Modelling Fundamentals: Normalisation, 3NF and Dimensional Modelling

Discussion about this post