Contributions

Article
Numba can make your numeric code faster, but only if you use it right.
Article
If you’re doing computations on a GPU, NVIDIA and its CUDA libraries are the default. But NVIDIA-specific software won’t run on Macs, in CI, or on other GPUs. What can you do if you want to use GPUs in a portable manner? In this article we’ll cover one option, the wgpu-py library.
Article
Learn how to use the Profila profiler to find performance bottlenecks in your Numba code.
Article
Do you use NumPy, Pandas, or scikit-learn and want to get faster results? Nvidia has created GPU-based replacements for each of these with the shared promise of extra speed. Unfortunately, while those speed-ups are impressive, they are also misleading. GPU-based libraries might be the answer to your performance problems… or they might be an unnecessary and expensive distraction.
Article
NumPy 2 is coming, and it’s got some backwards incompatible changes. Learn how to keep your code from breaking, and how to upgrade.
Article
Figuring out how much parallelism your program can use is surprisingly tricky.
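One reason it’s tricky: the obvious API can overcount. A minimal sketch (not from the article) comparing the machine-wide CPU count with the CPUs this particular process is actually allowed to use:

```python
import os

# os.cpu_count() reports all CPUs on the machine...
total = os.cpu_count()

# ...but a container or batch scheduler may restrict this process
# to a subset. On Linux, sched_getaffinity() reports the CPUs the
# process can actually run on.
try:
    usable = len(os.sched_getaffinity(0))
except AttributeError:
    # sched_getaffinity() isn't available on macOS or Windows.
    usable = total

print(f"{usable} of {total} CPUs available to this process")
```

In a cgroup-limited container the two numbers can differ significantly, which is exactly the kind of subtlety the article digs into.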
Article
Pandas has far more third-party integrations than Polars. Learn how to use those libraries with Polars dataframes.
Article
When you’re doing large scale data processing with Python, threads are a good way to achieve parallelism. This is especially true if you’re doing numeric processing, where the global interpreter lock (GIL) is typically not an issue. And if you’re using threading, thread pools are a good way to make sure you don’t use too many resources.

But how many threads should your thread pool have? And do you need just one thread pool, or more than one?
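As a minimal illustration of the starting point (a sketch, not code from the article): a single thread pool sized to the CPU count, running a workload that releases the GIL—hashing here stands in for numeric processing, since `hashlib` drops the GIL on large inputs.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

def checksum(data: bytes) -> str:
    # hashlib releases the GIL while hashing large buffers,
    # so multiple threads can genuinely run in parallel.
    return hashlib.sha256(data).hexdigest()

chunks = [bytes([i]) * 1_000_000 for i in range(8)]

# One pool, sized to the number of CPU cores, so we don't
# oversubscribe the machine.
with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
    digests = list(pool.map(checksum, chunks))

print(len(digests), "chunks hashed")
```

Whether `os.cpu_count()` is actually the right size—and whether one pool is enough—is the question the article explores.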
Tutorial
SIMD is a CPU feature that lets you speed up numeric processing; learn how to use it with Cython.
Article
Rust can make your Python code much faster; here’s how to start using it as quickly as possible.
Article
What do you do when your NumPy code isn’t fast enough? We’ll discuss the options, from Numba to JAX to manual optimizations.
Article
With a little understanding of how CPUs and compilers work, you can speed up NumPy with faster Numba code.
Article
CSV, JSON, Parquet—which file format should you use for data being processed by Pandas?
Article
You’re on a new version of Linux, you try a pip install, and it errors out, talking about “externally managed environments” and “PEP 668”. What’s going on? How do you solve this?
Article
Ruff is a new, much faster linter for Python, helping you catch bugs without waiting forever for CI.
Article
Initial and exploratory data analysis have different requirements than production data processing; Polars supports both.
Article
While multiprocessing allows Python to scale to multiple CPUs, it has some performance overhead compared to threading.
Article
Estimating Pandas memory usage from the data file size is surprisingly difficult. Learn why, and some alternative approaches that don’t require estimation.
Article
Switching from float64 (double-precision) to float32 (single-precision) can cut memory usage in half. But how do you deal with data that doesn’t fit?
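The memory math, sketched with NumPy (an illustration, not the article’s code): each float64 takes 8 bytes, each float32 takes 4, so the same array shrinks by half—but float32 tops out around ±3.4×10³⁸, so larger values overflow to infinity.

```python
import numpy as np

data = np.ones(1_000_000, dtype=np.float64)
smaller = data.astype(np.float32)

# Half the memory for the same number of elements:
print(data.nbytes)     # 8000000
print(smaller.nbytes)  # 4000000

# But values outside float32's range overflow to infinity:
big = np.array([1e300], dtype=np.float64)
print(big.astype(np.float32))  # [inf]
```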
Article
If you need to speed up Python, Cython is a very useful tool. It lets you seamlessly merge Python syntax with calls into C or C++ code, making it easy to write high-performance extensions with rich Python interfaces.

That being said, Cython is not the best tool in all circumstances. So in this article I’ll go over some of the limitations and problems with Cython, and suggest some alternatives.
Article
While Polars is mostly known for running faster than Pandas, if you use it right it can sometimes also significantly reduce memory usage compared to Pandas. In particular, certain techniques that you need to do manually in Pandas can be done automatically in Polars, allowing you to process large datasets without using as much memory—and with less work on your side!
Article
Python 3.7 end of life is in 6 months; after that there will be no more security updates. So the time to upgrade is now.
Article
The libraries you’re using might be running more threads than you realize—and that can mean slower execution.
Article
Python 3.11 is out now, but should you switch to it immediately? And if you shouldn’t upgrade just yet, when should you?
Article
Your data processing jobs are fast… most of the time. Learn how to find the slow runs so you can speed them up.
Article
Learn a variety of—sometimes horrible—ways to instrument and measure performance in Python.
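The least horrible of those ways, as a minimal sketch (an illustration, not the article’s code): wall-clock timing with `time.perf_counter()`, a monotonic, high-resolution timer.

```python
import time

def elapsed(f, *args):
    # Simplest possible instrumentation: measure wall-clock time
    # around a single function call.
    start = time.perf_counter()
    result = f(*args)
    return result, time.perf_counter() - start

result, seconds = elapsed(sum, range(1_000_000))
print(f"sum took {seconds:.4f}s")
```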
Article
Vectorization is a great way to speed up your Python code, but you’re limited to specific operations on bulk data. Learn how to get past these limitations.
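For context, the core idea vectorization relies on, sketched with NumPy (an illustration, not the article’s code): replace a per-element Python loop with one bulk operation that runs in compiled code.

```python
import numpy as np

values = np.arange(100_000)

# Python-level loop: one interpreter iteration per element.
loop_total = 0
for v in values:
    loop_total += v % 7

# Vectorized: the whole operation runs in C, on bulk data.
vector_total = (values % 7).sum()

print(loop_total, vector_total)
```

The catch is that only operations expressible as such bulk primitives get this speed-up, which is the limitation the article addresses.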
Article
Learn how to speed up your Celery tasks by identifying slow tasks, and then finding the performance bottleneck using a profiler.
Article
Vectorization in Pandas can make your code faster—except when it will make your code slower.
Article
Installing packages with pip, Poetry, and Pipenv can be slow. Learn how to ensure it’s not even slower, and about a potential speed-up.