Contributions

Tutorial
You have a large amount of data, and you want to load only part of it into memory as a Pandas dataframe. CSVs won’t cut it: you need a database, and the easiest way to get one is SQLite.
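For a taste of what that looks like, here's a minimal sketch (the database, table, and column names are hypothetical):

```python
import sqlite3
import pandas as pd

# Open an on-disk SQLite database.
conn = sqlite3.connect("measurements.sqlite")

# Pull only the rows you actually need into memory,
# instead of parsing and loading the whole dataset.
df = pd.read_sql_query(
    "SELECT * FROM readings WHERE year = 2023", conn
)
```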
Article
If you have a large array that doesn't fit in memory, you'll need to load it from disk—but how? This article covers two strategies, mmap() and Zarr/HDF5, and the strengths and weaknesses of each.
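As a tiny illustration of the mmap() strategy, NumPy can memory-map an array straight from disk, so data is paged in lazily on access (the filename and shape here are hypothetical):

```python
import numpy as np

# Open a raw binary file as a memory-mapped, read-only array;
# pages are loaded from disk on access, not up front.
arr = np.memmap("big_array.dat", dtype=np.float64,
                mode="r", shape=(1_000_000, 100))

# Only the pages backing this slice actually get read.
print(arr[:10].mean())
```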
Tutorial
Sometimes your data file is so large you can’t load it into memory at all, even with compression. So how do you process it quickly?

By loading and then processing the data in chunks, you can load only part of the file into memory at any given time. And that means you can process files that don’t fit in memory.

Learn how you can do this with Pandas.
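A minimal sketch of the technique (filename and column are hypothetical):

```python
import pandas as pd

total = 0
# Read the CSV 100,000 rows at a time; only one chunk
# is in memory at any given moment.
for chunk in pd.read_csv("huge.csv", chunksize=100_000):
    total += chunk["amount"].sum()
print(total)
```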
Article
When you’re choosing a base image for your Docker image, Alpine Linux is often recommended. Using Alpine, you’re told, will make your images smaller and speed up your builds. And if you’re using Go, that’s reasonable advice. For Python, Alpine is a bad idea.
Tool
The Conda packaging tool implements environments, which enable different applications to have different libraries installed. So when you’re building a Docker image for a Conda-based application, you’ll need to activate a Conda environment—a surprisingly tricky task.

Read this article to learn the easy way to do it.
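As a rough sketch of one approach (not necessarily the one the article recommends), newer versions of Conda can run a command inside an environment with conda run; the environment name and entry point here are hypothetical:

```dockerfile
FROM continuumio/miniconda3

COPY environment.yml .
RUN conda env create -f environment.yml

COPY . /app
WORKDIR /app

# Run inside the "myenv" environment without needing a shell
# that has done "conda activate".
CMD ["conda", "run", "--no-capture-output", "-n", "myenv", \
     "python", "main.py"]
```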
Tutorial
When you're processing large amounts of data in memory, copying wastes limited RAM, but mutating data in-place leads to bugs. There's a third option that gives you safety while reducing memory usage: interior mutability.
Tutorial
If you’re running into memory issues because your NumPy arrays are too large, one of the basic approaches to reducing memory usage is compression. By changing how you represent your data, you can reduce memory usage and shrink your array’s footprint—often without changing the bulk of your code.
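For example, one representational change is shrinking the dtype, assuming your analysis can tolerate the lower precision:

```python
import numpy as np

arr = np.random.random(1_000_000)  # float64: 8 bytes per entry
print(arr.nbytes)  # 8000000

# If 32-bit precision is enough, the same data takes half the RAM.
smaller = arr.astype(np.float32)
print(smaller.nbytes)  # 4000000
```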
Tutorial
If you want to process a large amount of data with Pandas, there are various techniques you can use to reduce memory usage without changing your data. But what if that isn’t enough? What if you still need to reduce memory usage?

Another technique you can try is lossy compression: drop some of your data in a way that doesn’t impact your final results too much. If parts of your data don’t impact your analysis, no need to waste memory keeping extraneous details around.
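A minimal sketch of the idea, with a hypothetical price column where two decimal digits of precision are plenty:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": np.random.random(1_000_000) * 100})

# Deliberately lossy: float32 can't represent the float64 values
# exactly, but it halves the column's memory usage.
df["price"] = df["price"].astype(np.float32)
```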
Article
You’re writing software that processes data, and it works fine when you test it on a small sample file. But when you load the real data, your program crashes. The problem is that you don’t have enough memory—if you have 16GB of RAM, you can’t load a 100GB file.

So how do you process your data with just one computer?

In this article you'll learn the basic techniques to help you process more data than fits in RAM.
Article
Your Python program is too slow. Maybe your web application can’t keep up, or certain queries are taking a long time. Maybe you have a batch program that takes hours or even days to run.

To speed it up, you need a tool that will help you figure out where the slowness is coming from. This article will cover some good starting points for choosing the right tool.
Tutorial
If your program is slow, but the CPU isn't the bottleneck, how do you pinpoint the problem? In this article you'll learn how to write custom profilers to find the places where your program is waiting.
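One simple trick along these lines (a sketch, not necessarily the article's technique): compare wall-clock time to CPU time, since the gap between them is time spent waiting:

```python
import time
from contextlib import contextmanager

@contextmanager
def waiting_profiler(label):
    """Report how much of a block's elapsed time was spent
    waiting, i.e. wall-clock time not accounted for by CPU time."""
    start_wall = time.monotonic()
    start_cpu = time.process_time()
    yield
    wall = time.monotonic() - start_wall
    cpu = time.process_time() - start_cpu
    print(f"{label}: {wall:.2f}s elapsed, ~{wall - cpu:.2f}s waiting")

with waiting_profiler("fetch"):
    time.sleep(1)  # stands in for a blocking network call
```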
Tutorial
Sometimes your Python process misbehaves; wouldn't it be nice to have an interpreter prompt inside your process to help you debug what's going on? Learn how to do so with Manhole.
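The setup itself is a one-liner:

```python
import manhole

# Install a backdoor listening on a UNIX socket (by default
# /tmp/manhole-<pid>); connect to it later with a UNIX-socket
# client to get a Python prompt inside the running process.
manhole.install()
```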
Tutorial
It’s tempting to migrate your database schema when your application container starts up—here are some reasons to rethink that choice.
Tutorial
Installing dependencies separately from your code allows you to take advantage of Docker's layer caching to speed up your build. Here's how to do it with pipenv, poetry, or pip-tools.
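The core idea, sketched for a plain requirements.txt (as produced by pip-tools, for instance):

```dockerfile
FROM python:3.12-slim

# Dependencies first: this layer is cached, and only rebuilt
# when requirements.txt itself changes.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Code changes only invalidate the layers from here down.
COPY . /app
WORKDIR /app
CMD ["python", "main.py"]
```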
Article
When it’s time to package up your Python application into a Docker image, the natural thing to do is search the web for some examples. And a quick search will provide you with plenty of simple, easy examples.

Unfortunately, these simple, easy examples are often broken in a variety of ways, some obvious, some less so.
Tutorial
Gunicorn is a common WSGI server for Python applications, but most Docker images that use it are badly configured. Running in a container isn’t the same as running on a virtual machine or physical server, and there are also Linux-environment differences to take into account.

So to keep your Gunicorn setup healthy and happy, this article will cover how to configure it.
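To give a flavor of the settings involved, here's a sketch (not the article's full recommendations; the app module is hypothetical):

```dockerfile
FROM python:3.12-slim
COPY . /app
WORKDIR /app
RUN pip install gunicorn  # plus your app's own dependencies

# --worker-tmp-dir /dev/shm: Gunicorn's worker heartbeat uses
#   temporary files, and in containers /tmp may not be a fast tmpfs.
# --log-file -: send logs to stderr, the container convention.
CMD ["gunicorn", "--bind", "0.0.0.0:8000", \
     "--worker-tmp-dir", "/dev/shm", \
     "--log-file", "-", "app:app"]
```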
Tutorial
There are many choices for a base Docker image for your Python application—this article will give you some criteria to base your decision on, walk you through some of the options, and make some suggestions that should work for most people.
Tutorial
The more you use JSON, the more likely you are to encounter JSON encoding or decoding as a bottleneck. Python’s built-in library isn’t bad, but there are multiple faster JSON libraries available: how do you choose which one to use? This article will share a process you can use to choose the best library for your needs.
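The heart of that process is benchmarking on data shaped like yours, along these lines:

```python
import json
import timeit

# Benchmark on a sample that resembles YOUR real payloads;
# results vary a lot with message shape and size.
sample = {"id": 123, "tags": ["a", "b"], "nested": {"x": 1.5}}

def stdlib_roundtrip():
    json.loads(json.dumps(sample))

print(timeit.timeit(stdlib_roundtrip, number=100_000))
# Repeat with each candidate library (orjson, ujson, ...) and compare.
```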
Tutorial
Multi-stage Docker builds are the best way to get small images and fast builds if your application includes compiled code. But supporting them in Python is a little tricky: this tutorial will show you how to do it.
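The shape of the solution looks roughly like this (a sketch; requirements.txt and main.py are hypothetical):

```dockerfile
# Stage 1: full image, with compilers available for building wheels.
FROM python:3.12 AS build
RUN python -m venv /venv
COPY requirements.txt .
RUN /venv/bin/pip install -r requirements.txt

# Stage 2: slim runtime image; only the built virtualenv comes along.
FROM python:3.12-slim
COPY --from=build /venv /venv
COPY . /app
WORKDIR /app
CMD ["/venv/bin/python", "main.py"]
```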
Article
If you need to compile some code as part of building the Docker image for your application, you're liable to end up with a huge image. This article will show you why that happens, and what you can do about it.
Tutorial
When you’re packaging your Python application in a Docker image, you’ll often use a virtualenv, which means you need to activate it somehow. The usual method is repetitive and therefore error-prone. There is a simpler way of activating a virtualenv, however, which I’ll demonstrate in this article.
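A sketch of one such approach: since activation mostly just prepends the virtualenv's bin/ to $PATH, you can do that once with ENV:

```dockerfile
FROM python:3.12-slim

RUN python -m venv /venv
# "Activates" the virtualenv for every subsequent RUN and CMD:
# its bin/ is now first on $PATH, no sourcing of bin/activate needed.
ENV PATH="/venv/bin:$PATH"

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . /app
WORKDIR /app
CMD ["python", "main.py"]
```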
Tutorial
Unlike Python, C and C++ lack memory safety, which can lead to crashes—and you’ll need to figure out what caused the crash. Learn how to prepare in advance for crashes in your test suite, so debugging the cause will be easier.
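One preparation worth knowing about (whether or not it's the article's exact advice) is Python's built-in faulthandler module, which recent versions of pytest enable by default:

```python
import faulthandler

# On a C-level crash (SIGSEGV, SIGFPE, SIGABRT, SIGBUS), dump the
# Python traceback of every thread to stderr before the process dies.
faulthandler.enable()
```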
Library
A low-impact profiler for figuring out how much memory each task in Dask is using.
Library
Author of Eliot here: if you're tired of just seeing a bunch of unrelated facts in your logs, Eliot can help. Because it shows you causality—this was caused by that—it's a lot more like reading an execution trace than most logging systems. I've used it in a complex scientific batch process to figure out where the calculation went wrong, pinpoint performance problems, and debug small-scale distributed systems.
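A minimal taste (the action types and log destination are just for illustration):

```python
from eliot import start_action, to_file

to_file(open("out.log", "w"))

def multiply(x, y):
    # Actions nest: this one is recorded as a child of whatever
    # action was active when it started, capturing causality.
    with start_action(action_type="multiply", x=x, y=y) as action:
        result = x * y
        action.add_success_fields(result=result)
        return result

with start_action(action_type="main"):
    multiply(2, 3)
```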
Article
When your Python application uses a database, tests can be frustrating: they're either unrealistic if you use a fake, or slow to run if you use the real database. But with a little bit of trickery (made easier by Docker) you can have your tests run much, much faster and still be just as realistic.
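One trick in that spirit (a sketch, not necessarily the article's exact recipe): run the real database in a throwaway container, with its data directory on an in-memory tmpfs for speed:

```python
import subprocess
import time

import pytest

@pytest.fixture(scope="session")
def postgres_url():
    # Start PostgreSQL with its data directory on tmpfs, so writes
    # never hit disk; --rm cleans the container up once stopped.
    container = subprocess.check_output([
        "docker", "run", "-d", "--rm",
        "-e", "POSTGRES_PASSWORD=test",
        "-p", "5432:5432",
        "--tmpfs", "/var/lib/postgresql/data",
        "postgres:16",
    ]).decode().strip()
    time.sleep(3)  # crude; real code should poll until the DB is ready
    yield "postgresql://postgres:test@localhost:5432/postgres"
    subprocess.run(["docker", "stop", container], check=True)
```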
Article
But what if you can’t make your Python test suite itself any faster? There is still something you can do: speed up the feedback loop of your test system. This article covers Python tools and processes that can help you do that.