7 Comments
Hoyt Emerson

Certainly telling us Polars believers what we already know, but this was a great breakdown. Pandas has certain functions that sometimes beat it out (datetimes come to mind), but I honestly can’t imagine using Pandas ever again.

Emmanuel GUENOU

Hi, thanks for the great work. Is there a plan to compare Polars to PySpark?

Pipeline to Insights

Yeah, for sure. I’m expanding it to DuckDB and PySpark and will share the outcome soon. Would you like to see the benchmark using the same data size (around 1GB)?

Emmanuel GUENOU

Yeah, 1GB sounds good as a starting point and will definitely give interesting results. Just a note though: Spark is really optimized for much larger scales (100s of GB to TB). At 1GB, everything still fits comfortably in memory, so frameworks like DuckDB and Polars will likely shine since Spark’s overhead (scheduler, shuffle, serialization) will dominate.

For context, Spark usually targets partition sizes around 128MB, so 1GB ends up being only ~8 partitions – not really stressing the engine. It could be great to also include a larger dataset later (say 50–100GB) to highlight the scaling differences and where Spark starts to show its strengths.
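The partition arithmetic above can be checked with a quick sketch. It assumes the 128MB target corresponds to Spark's default `spark.sql.files.maxPartitionBytes` (128 MiB) and ignores real-world factors such as file count, compression, and `spark.sql.files.openCostInBytes`, so treat the numbers as rough estimates only. The helper `estimated_partitions` is hypothetical.

```python
import math

# Assumption: ~128 MiB per input partition (Spark's default maxPartitionBytes).
PARTITION_BYTES = 128 * 1024**2


def estimated_partitions(dataset_bytes: int, partition_bytes: int = PARTITION_BYTES) -> int:
    """Rough partition count if the input is split into partition_bytes-sized chunks."""
    return math.ceil(dataset_bytes / partition_bytes)


sizes = {"1 GB": 1 * 1024**3, "50 GB": 50 * 1024**3, "100 GB": 100 * 1024**3}
for label, size in sizes.items():
    # Prints roughly 8, 400 and 800 partitions respectively.
    print(f"{label}: ~{estimated_partitions(size)} partitions")
```

At ~8 partitions a single machine's cores are barely occupied, while hundreds of partitions give the scheduler real parallel work to distribute.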

Pipeline to Insights

Thank you so much for your feedback💐😊

Ro
Jul 22 (edited)

Great post, Erfan! Just a quick tip: instead of `time.time()`, use `time.perf_counter()`. `time.time()` returns the current wall-clock time of your device, whereas `time.perf_counter()` uses a relative counter whose starting point is arbitrary, so only differences between two readings are meaningful [Ref1]. From [Ref2]:

"It's measured (perf_counter) using a CPU counter and, as specified in the docs, should only be used to measure time intervals"

Ref1: https://blog.dailydoseofds.com/p/dont-use-timetime-to-measure-execution

Ref2: https://stackoverflow.com/questions/66036844/time-time-or-time-perf-counter-which-is-faster
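A minimal sketch of the suggested pattern: take two `time.perf_counter()` readings around the code under test and subtract them. The `timed` helper and the `workload` function are hypothetical stand-ins for whatever Pandas or Polars operation is being benchmarked.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds) measured with perf_counter."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start  # only the difference is meaningful
    return result, elapsed


def workload(n):
    """Hypothetical placeholder for a DataFrame operation."""
    return sum(i * i for i in range(n))


result, elapsed = timed(workload, 100_000)
print(f"result={result}, elapsed={elapsed:.6f}s")
```

For real benchmarks, repeating each measurement several times (or using the standard library's `timeit` module, which uses `perf_counter` by default) smooths out run-to-run noise.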

Pipeline to Insights

Thank you so much, Ro, for this great tip!
