Contributions

Article
Learn the variety of techniques you can use to make your Python application’s Docker image a whole lot smaller.
Article
A compiled language like Rust or C is a lot faster than Python, but it won’t always make your Python code faster. Learn about the hidden overhead you’ll need to overcome.
Tutorial
Pandas can easily load data using a SQL query, but the resulting dataframe may use too much memory. Learn how to process data in batches, and then how to reduce memory usage even further.
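A minimal sketch of the batched approach, assuming a hypothetical SQLite file `example.db` with a `measurements` table:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect("example.db")  # hypothetical database file

total = 0
# chunksize makes read_sql return an iterator of DataFrames instead of
# loading the whole result set into memory at once.
for chunk in pd.read_sql("SELECT value FROM measurements", conn, chunksize=10_000):
    # Process each batch independently, e.g. accumulate a running sum.
    total += chunk["value"].sum()

conn.close()
print(total)
```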
Article
Using old versions of pip can result in installing old packages, or needing to recompile packages from scratch. So make sure you upgrade pip before using it.
Library
A Python memory profiler for data processing and scientific computing applications
Article
Every time you change your pip requirements and rebuild your Docker image, you’re going to have to download all your packages. Learn how to prevent this with Docker BuildKit’s new caching feature.
Article
There are many ways out-of-memory problems can manifest in Python. Learn how to identify them, as a first step to fixing the problem.
Tutorial
You want your application packaging to be reproducible, but you also want to be able to change dependencies easily without conflicts. Conda doesn’t make this easy, so learn how to do it with a third-party tool: conda-lock.
Article
To make your Python code faster, you should usually start by optimizing the single-threaded version, then consider multiprocessing, and only then think about a cluster.
Article
Python 3.9 is out now, but when should you switch? Learn the problems you'll encounter, and when it's time to try it out.
Article
The official Python Docker image is useful, but to actually understand why, and to use it correctly, it’s worth understanding how exactly it’s constructed.
Article
There are a variety of ways of packaging your Python application for distribution, from wheels to Docker to PEX to Conda, and more. This article gives a survey of the different approaches, specifically focusing on distributing internal server applications.
Article
When your server is leaking memory, the Fil memory profiler can help you spot the buggy code.
Tutorial
Objects in Python have large memory overhead; create too many objects, and you’ll use far more memory than you expect. Learn why, and what to do about it.
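You can see that overhead for yourself with `sys.getsizeof` (exact numbers vary by Python version and platform); the `__slots__` trick below is one common mitigation:

```python
import sys

print(sys.getsizeof(1))          # a small int: tens of bytes, not 8
print(sys.getsizeof(1.0))        # a float object is much bigger than 8 bytes
print(sys.getsizeof((1, 1.0)))   # a tuple adds its own overhead on top

class Point:
    # A normal class: every instance carries a __dict__, which costs memory.
    def __init__(self, x, y):
        self.x = x
        self.y = y

class SlottedPoint:
    # __slots__ removes the per-instance __dict__, shrinking each object.
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x = x
        self.y = y

p, sp = Point(1.0, 2.0), SlottedPoint(1.0, 2.0)
print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))  # instance plus its dict
print(sys.getsizeof(sp))                             # slotted instance only
```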
Article
Storing integers or floats in Python has a huge overhead in memory. Learn why, and how NumPy makes things better.
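A rough comparison sketch (sizes are approximate and depend on the Python build):

```python
import sys

import numpy as np

n = 1_000_000

# A Python list of floats stores a pointer per element plus a full float
# object for every value, so the real cost is far more than 8 bytes per number.
python_list = [float(i) for i in range(n)]
list_bytes = sys.getsizeof(python_list) + sum(sys.getsizeof(x) for x in python_list)

# A NumPy array stores the raw 8-byte doubles in one contiguous buffer.
numpy_array = np.arange(n, dtype=np.float64)

print(f"list:  ~{list_bytes / 1e6:.0f} MB")
print(f"numpy: ~{numpy_array.nbytes / 1e6:.0f} MB")
```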
Tutorial
There are many reasons your code might fail to import in Docker. Here's a quick series of checks you can do to figure out the problem.
Article
Python will automatically release memory for objects that aren't being used. But sometimes function calls can unexpectedly keep objects in memory. Learn about Python memory management, how it interacts with function calls, and what you can do about it.
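A sketch of the kind of pattern involved (not code from the article; the array sizes are just for illustration):

```python
import numpy as np

def process(data):
    # While this function runs, the caller's frame still holds a reference
    # to `original`, so that array can't be freed yet.
    return data.sum()

def make_report():
    original = np.ones((5_000, 5_000))   # ~200 MB
    # `original * 2` creates a second ~200 MB array; during the call below,
    # both arrays are alive at the same time.
    return process(original * 2)

def make_report_lower_memory():
    original = np.ones((5_000, 5_000))
    doubled = original * 2
    del original                          # drop the reference before the call
    return process(doubled)
```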
Tutorial
Debugging out-of-memory crashes can be tricky. Learn how the Fil memory profiler can help you find where your memory use is happening.
Article
Fil is a new memory profiler which shows you peak memory usage, and where that memory was allocated. It’s designed specifically for the needs of data scientists and scientists running data processing pipelines.
Tutorial
You don’t want to deploy insecure code to production—but it’s easy for mistakes and vulnerabilities to slip through. So you want some way to catch security issues automatically, without having to think about it.

From your code to your Python dependencies to the system packages you depend on, learn about some tools that will help you catch security vulnerabilities.
Tutorial
If your Docker build isn’t reproducible, a minor bug fix can spiral out of control into a series of unwanted and unnecessary major version upgrades.

There are multiple layers of reproducibility, from operating system to Python dependencies, so this article will cover how to deal with each.
Tutorial
Pandas code using too much memory, or running too slow? Processing your data in chunks lets you reduce memory usage, but it can also speed up your code. Because each chunk can be processed independently, you can process them in parallel, utilizing multiple CPUs. For Pandas (and NumPy), Dask is a great way to do this.
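A minimal Dask sketch, assuming hypothetical CSV files matching `data-*.csv` with `category` and `value` columns:

```python
import dask.dataframe as dd

# dask.dataframe splits the data into many pandas DataFrames ("partitions")
# and can process them in parallel, one chunk per task.
df = dd.read_csv("data-*.csv")

# Operations build a task graph; nothing is loaded yet.
mean_by_group = df.groupby("category")["value"].mean()

# compute() runs the graph, processing chunks in parallel across CPUs.
print(mean_by_group.compute())
```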
Tutorial
You have a large amount of data, and you want to load only part into memory as a Pandas dataframe. CSVs won’t cut it: you need a database, and the easiest way to do that is with SQLite.
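A small sketch of the idea, with a hypothetical `measurements.db` file and `readings` table; the database does the filtering, so only the matching rows ever reach memory:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect("measurements.db")  # hypothetical file and schema

# Only the rows matching the WHERE clause are loaded into the DataFrame.
df = pd.read_sql_query(
    "SELECT * FROM readings WHERE station = ? AND year = ?",
    conn,
    params=("A-17", 2020),
)
conn.close()
```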
Article
If you have a large array that doesn't fit in memory, you'll need to load it from disk—but how? This article covers two strategies, mmap() and Zarr/HDF5, and the strengths and weaknesses of each.
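For the mmap() side, a minimal NumPy sketch, assuming an array previously saved to a hypothetical `big.npy`:

```python
import numpy as np

# mmap_mode="r" maps the file into memory instead of reading it all at once;
# pages are loaded from disk lazily, only as you touch them.
arr = np.load("big.npy", mmap_mode="r")

# Only the slice you actually access gets read from disk.
subset_mean = arr[:1000].mean()
print(subset_mean)
```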
Tutorial
Sometimes your data file is so large you can’t load it into memory at all, even with compression. So how do you process it quickly?

By loading and then processing the data in chunks, you can load only part of the file into memory at any given time. And that means you can process files that don’t fit in memory.

Learn how you can do this with Pandas.
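A minimal sketch of chunked CSV processing, assuming a hypothetical `huge.csv` with a `value` column:

```python
import pandas as pd

total = 0
row_count = 0

# chunksize makes read_csv yield DataFrames of up to 100,000 rows each,
# so only one chunk is in memory at a time.
for chunk in pd.read_csv("huge.csv", chunksize=100_000):
    total += chunk["value"].sum()
    row_count += len(chunk)

print(total / row_count)
```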
Article
When you’re choosing a base image for your Docker image, Alpine Linux is often recommended. Using Alpine, you’re told, will make your images smaller and speed up your builds. And if you’re using Go, that’s reasonable advice. For Python, Alpine is a bad idea.
Tool
The Conda packaging tool implements environments, which enable different applications to have different libraries installed. So when you’re building a Docker image for a Conda-based application, you’ll need to activate a Conda environment—a surprisingly tricky task.

Read this article to learn the easy way to do it.
Tutorial
When you're processing large amounts of data in memory, copying wastes limited RAM, but mutating data in-place leads to bugs. There's a third solution that gives safety while reducing memory usage: interior mutability.
Tutorial
If you’re running into memory issues because your NumPy arrays are too large, one of the basic approaches to reducing memory usage is compression. By changing how you represent your data, you can reduce memory usage and shrink your array’s footprint—often without changing the bulk of your code.
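A simple example of that representation change, shrinking a float64 array to float32 when the extra precision isn't needed:

```python
import numpy as np

# A hypothetical array of sensor readings stored as 64-bit floats.
readings = np.random.uniform(0, 100, size=10_000_000)  # float64: ~80 MB

# If 32-bit precision is enough for your use case, halving the itemsize
# halves the memory footprint.
smaller = readings.astype(np.float32)                   # ~40 MB

print(readings.nbytes, smaller.nbytes)
```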
Tutorial
If you want to process a large amount of data with Pandas, there are various techniques you can use to reduce memory usage without changing your data. But what if that isn’t enough? What if you still need to reduce memory usage?

Another technique you can try is lossy compression: drop some of your data in a way that doesn’t impact your final results too much. If parts of your data don’t impact your analysis, no need to waste memory keeping extraneous details around.
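One possible sketch, with made-up column names, showing two lossy options: lower numeric precision and sampling.

```python
import numpy as np
import pandas as pd

# Made-up data: response times in milliseconds for web requests.
df = pd.DataFrame({"response_ms": np.random.exponential(200, size=1_000_000)})

# Lossy option 1: keep less numeric precision than the raw data provides.
df["response_ms"] = df["response_ms"].astype(np.float32)

# Lossy option 2: work with a random sample when approximate answers suffice.
sample = df.sample(frac=0.1, random_state=42)
print(sample["response_ms"].quantile(0.95))   # approximate 95th percentile
```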
