Think of SQL optimisation as brewing the perfect cup of tea. If you use stale leaves or rush the process, you’ll end up with a bland or bitter result. Choosing the right file formats is like selecting the finest tea leaves while understanding your query is comparable to controlling the water temperature and steeping time with care. A well-optimised query, much like a perfectly brewed tea, brings out rich flavours effortlessly, leaving you with a cup that’s both satisfying and rejuvenating. 🍵✨
This week, we explored DuckDB's capabilities, showcasing its performance across various scenarios. Using practical examples, we demonstrated how techniques such as window functions, SARGable queries, aggregate functions, and CTEs can enhance query performance.
Day 57: Self Join vs. ROW_NUMBER()
Goal: To retrieve the most recent crime for each area.
Details:
Day 58: Repeated Subquery vs. CTE
Goal: To retrieve the total number of crimes per AREA_NAME and the percentage of all crimes for each area.
Details:
Day 59: CTE with Aggregation vs. Window Function
Goal: To compare CTE and Window Function for aggregation.
Details:
Day 60: DuckDB FILTER vs. PostgreSQL CASE
Goal: To retrieve the total sales for each product while including only those sales that exceed a specific threshold.
Details:
Day 61: COALESCE vs. IS NOT NULL
Goal: To retrieve the total sales while excluding rows with NULL values.
Details:
Day 62: Function-Based vs. Indexed Filters
Goal: To retrieve the total sales in February 2023.
Details:
Day 63: Self Join vs. LAG()
Goal: To calculate row-to-row differences for transactions.
Details:
Conclusion
In this week 9 of our "100 Days of SQL Optimisation" journey, we focused on advanced query optimisation techniques to further explore DuckDB's analytical capabilities:
Efficient Row Filtering with ROW_NUMBER(): Replaced self-joins with the
ROW_NUMBER()
window function to retrieve the most recent records for each category, drastically reducing execution time and memory usage.CTE for Repeated Aggregations: Utilised Common Table Expressions (CTEs) to calculate aggregated metrics once and reuse them, replacing repeated subqueries for better performance and clarity.
Advanced Filtering with DuckDB’s FILTER Clause: Demonstrated the efficiency of DuckDB’s
FILTER
clause over conditional aggregation for selectively summing data based on thresholds.Handling NULLs with IS NOT NULL: Showed how replacing
COALESCE
withIS NOT NULL
improves query efficiency when filtering out null values during aggregations.Function-Free Filtering for Index Utilisation: Replaced function-based filters like
EXTRACT
with direct range filters to leverage indexes, reducing execution time and memory usage.Row-to-Row Comparisons with LAG(): Replaced self-joins with the
LAG()
window function to efficiently calculate row-to-row differences, significantly reducing computational overhead.
If you're eager to learn more about making your SQL queries efficient and scalable, follow along with our 100-day journey. Subscribe to receive daily tips and unlock the full potential of your SQL skills!
If you haven’t checked our “100 Days of SQL Optimisation” before, you can start with this Week 1 recap and join our journey!
If you missed the Week 8 recap, you can catch up by checking it out here:
If you're looking to enhance your data modelling skills, check out:
We Value Your Feedback
If you have any feedback, suggestions, or additional topics you’d like us to cover, please share them with us. We’d love to hear from you!
Have you tried DuckDB yet? Whether it's for analytics, data science, or something entirely unique, we’re curious, how did you use it, and what was your experience like? Share your story and tips, we can’t wait to learn from you!