Python One Billion Row Challenge — From 10 Minutes to 4 Seconds

The One Billion Row Challenge, which asks how quickly a programming language can read and aggregate 1 billion rows of data, has been gaining traction lately. Python, not being the most performant language out there, naturally doesn't stand a chance, especially since the current top-performing Java implementation takes only 1.535 seconds!

The fundamental rule of the challenge is that no external libraries are allowed. My goal for today is to start by following the rules and then see what happens when I bring in external libraries and better-suited file formats.
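For readers new to the challenge: the standard input is a text file in which each line holds a weather station name and a temperature reading separated by a semicolon (for example `Hamburg;12.0`). The task is to report the minimum, mean, and maximum temperature for each station, sorted by station name. Below is a minimal, standard-library-only sketch of that aggregation, assuming that input format. `aggregate` and `format_results` are hypothetical names used for illustration, not the code benchmarked in this article.

```python
import io
from typing import TextIO


def aggregate(lines: TextIO) -> dict[str, tuple[float, float, float]]:
    """Compute (min, mean, max) temperature per station from 'station;temp' lines."""
    # stats[station] = [min, max, running sum, count]
    stats: dict[str, list[float]] = {}
    for line in lines:
        station, _, raw = line.rstrip("\n").partition(";")
        temp = float(raw)
        entry = stats.get(station)
        if entry is None:
            stats[station] = [temp, temp, temp, 1]
        else:
            if temp < entry[0]:
                entry[0] = temp
            if temp > entry[1]:
                entry[1] = temp
            entry[2] += temp
            entry[3] += 1
    return {
        station: (mn, total / count, mx)
        for station, (mn, mx, total, count) in stats.items()
    }


def format_results(results: dict[str, tuple[float, float, float]]) -> str:
    """Render results sorted by station name as '{name=min/mean/max, ...}'."""
    parts = (
        f"{station}={mn:.1f}/{mean:.1f}/{mx:.1f}"
        for station, (mn, mean, mx) in sorted(results.items())
    )
    return "{" + ", ".join(parts) + "}"


if __name__ == "__main__":
    # Tiny in-memory sample standing in for the real billion-row file
    sample = io.StringIO("Hamburg;12.0\nBulawayo;8.9\nHamburg;34.2\nBulawayo;-1.3\n")
    print(format_results(aggregate(sample)))
    # prints "{Bulawayo=-1.3/3.8/8.9, Hamburg=12.0/23.1/34.2}"
```

On a real input file you would pass `open(path, encoding="utf-8")` instead of the `StringIO` sample. This single-threaded, line-by-line approach is simple but slow, which makes it a natural starting point for the optimizations that follow.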

I’ve run all the scripts 5 times and averaged the results.
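A run-five-times-and-average measurement like the one described above can be reproduced with a small harness. This is a sketch under assumptions: the exact timing method isn't specified here, so `average_runtime` is a hypothetical helper that launches each script as a separate process, times it with `time.perf_counter`, and averages the results with `statistics.mean`.

```python
import statistics
import subprocess
import sys
import time


def average_runtime(cmd: list[str], runs: int = 5) -> float:
    """Run `cmd` `runs` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the script exits with an error
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


if __name__ == "__main__":
    # Hypothetical usage: time a trivial Python command in place of a real solution script
    mean_s = average_runtime([sys.executable, "-c", "pass"])
    print(f"mean over 5 runs: {mean_s:.3f} s")
```

Timing each run as a separate process includes interpreter startup, which is negligible next to multi-second runtimes but worth keeping in mind when comparing the fastest implementations.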

As for the hardware, I'm using a 16" M3 Pro MacBook Pro with 12 CPU cores and 36 GB of RAM. Your results may vary if you run the code yourself, but you should hopefully see similar relative differences between implementations.