Show HN: Bodo – high-performance compute engine for Python data processing

github.com

11 points by ehsantn 10 days ago

Hello HN,

I’m excited to share Bodo, an open-source compute engine for large-scale data processing in native Python. Bodo is powered by an auto-parallelizing JIT compiler and an HPC backend, which let it compile Pandas and NumPy code into highly optimized, MPI-parallel binaries without requiring any code rewrites.
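
To make "parallel binaries (MPI)" a bit more concrete, here is a rough sketch of the data-parallel pattern that an MPI-style backend typically relies on for something like a group-by sum: each rank aggregates its own chunk of the rows, then the partial results are merged. This is only an illustration in plain Python and not Bodo's API or implementation; all function names are hypothetical, and the "ranks" run one after another here instead of as separate MPI processes.

```python
from collections import defaultdict


def split_rows(rows, n_ranks):
    """Split rows into n_ranks contiguous chunks, one per simulated rank."""
    chunk = (len(rows) + n_ranks - 1) // n_ranks  # ceiling division
    return [rows[i * chunk:(i + 1) * chunk] for i in range(n_ranks)]


def local_groupby_sum(rows):
    """Aggregate only the rows owned by one rank."""
    partial = defaultdict(int)
    for key, value in rows:
        partial[key] += value
    return dict(partial)


def merge_partials(partials):
    """Combine the per-rank partial sums (the reduction step)."""
    total = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)


def parallel_groupby_sum(rows, n_ranks=4):
    """Hypothetical driver: chunk, aggregate locally, then reduce."""
    chunks = split_rows(rows, n_ranks)
    # A real MPI program would run this step concurrently, one process per rank.
    partials = [local_groupby_sum(chunk) for chunk in chunks]
    return merge_partials(partials)


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
result = parallel_groupby_sum(rows, n_ranks=2)
```

The result does not depend on the number of ranks, which is the property that lets a compiler distribute this kind of operation without changing what the user's code means.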

Our latest benchmark shows speedups of 20x to 240x over traditional distributed computing frameworks like Spark, Ray, and Dask (code and details are in the repo).

The inspiration for Bodo came from my background in HPC, when I saw how slow and hard to use Spark was (it has gotten better over the years, but it's still not great). Of course, a compiler has its own limitations (e.g. not all Python is compilable), but I think the compiler approach is leaps and bounds better.