Contributions

Article
The official Python Docker image is useful, but to use it correctly—and understand why it's useful—it's worth understanding exactly how it's constructed.
Article
There are a variety of ways of packaging your Python application for distribution, from wheels to Docker to PEX to Conda, and more. This article gives a survey of the different approaches, specifically focusing on distributing internal server applications.
Article
When your server is leaking memory, the Fil memory profiler can help you spot the buggy code.
Tutorial
Objects in Python have large memory overhead; create too many objects, and you’ll use far more memory than you expect. Learn why, and what to do about it.
Article
Storing integers or floats in Python has a huge overhead in memory. Learn why, and how NumPy makes things better.
Tutorial
There are many reasons your code might fail to import in Docker. Here's a quick series of checks you can do to figure out the problem.
Article
Python will automatically release memory for objects that aren't being used. But sometimes function calls can unexpectedly keep objects in memory. Learn about Python memory management, how it interacts with function calls, and what you can do about it.
Tutorial
Debugging out-of-memory crashes can be tricky. Learn how the Fil memory profiler can help you find where your memory use is happening.
Article
Fil is a new memory profiler which shows you peak memory usage, and where that memory was allocated. It’s designed specifically for the needs of data scientists and scientists running data processing pipelines.
Tutorial
You don’t want to deploy insecure code to production—but it’s easy for mistakes and vulnerabilities to slip through. So you want some way to catch security issues automatically, without having to think about it.

From your code, to your Python dependencies, to the system packages you depend on, learn about some tools that will help you catch security vulnerabilities.
Tutorial
If your Docker build isn’t reproducible, a minor bug fix can spiral out of control into a series of unwanted and unnecessary major version upgrades.

There are multiple layers of reproducibility, from operating system to Python dependencies, so this article will cover how to deal with each.
Tutorial
Pandas code using too much memory, or running too slow? Processing your data in chunks lets you reduce memory usage, but it can also speed up your code. Because each chunk can be processed independently, you can process them in parallel, utilizing multiple CPUs. For Pandas (and NumPy), Dask is a great way to do this.
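The mechanics behind this idea can be sketched with just the standard library: split the data into independent chunks, summarize each one separately, then combine the partial results. (A thread pool is shown for simplicity; for CPU-bound pure-Python work you’d use a process pool, and Dask automates all of this for Pandas and NumPy.)

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_stats(chunk):
    # Each chunk is summarized independently, so chunks can run in parallel.
    return sum(chunk), len(chunk)

def parallel_mean(values, chunk_size=1000):
    chunks = [values[i:i + chunk_size]
              for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(chunk_stats, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

print(parallel_mean(list(range(10)), chunk_size=3))  # 4.5
```

The key property is that each chunk's partial result (sum and count) can be combined after the fact—that's what makes the chunks independent and therefore parallelizable.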
Tutorial
You have a large amount of data, and you want to load only part into memory as a Pandas dataframe. CSVs won’t cut it: you need a database, and the easiest way to do that is with SQLite.
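A minimal sketch of the idea, using `pandas.read_sql_query` against a SQLite connection—the table and column names here are made up, and in a real workload the database file would already exist on disk rather than being built in memory:

```python
import sqlite3

import pandas as pd

# Build a small example database (hypothetical schema; in practice
# you'd connect to an existing file on disk instead of ":memory:").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (year INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [(2019, 1.0), (2020, 2.0), (2020, 3.0)],
)

# Only the rows matching the query get loaded into memory as a DataFrame:
df = pd.read_sql_query("SELECT * FROM measurements WHERE year = 2020", conn)
print(len(df))  # 2
```

The filtering happens inside SQLite, so rows you don't ask for never take up RAM in your process.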
Article
If you have a large array that doesn't fit in memory, you'll need to load it from disk—but how? This article covers two strategies, mmap() and Zarr/HDF5, and the strengths and weaknesses of each.
Tutorial
Sometimes your data file is so large you can’t load it into memory at all, even with compression. So how do you process it quickly?

By loading and then processing the data in chunks, you can load only part of the file into memory at any given time. And that means you can process files that don’t fit in memory.

Learn how you can do this with Pandas.
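The core mechanism is the `chunksize` argument to `pandas.read_csv`, which yields one DataFrame chunk at a time instead of loading the whole file. A small sketch (the in-memory "file" and its `value` column are stand-ins for a real, much larger CSV):

```python
import io

import pandas as pd

# Stand-in for a huge CSV on disk; with chunksize, only one chunk
# lives in memory at a time.
csv_file = io.StringIO("value\n1\n2\n3\n4\n5\n")

total = 0
for chunk in pd.read_csv(csv_file, chunksize=2):
    total += chunk["value"].sum()  # process each chunk, then discard it
print(total)  # 15
```

This works whenever your computation can be expressed as "process each chunk, combine the results"—sums, counts, filters, and so on.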
Article
When you’re choosing a base image for your Docker image, Alpine Linux is often recommended. Using Alpine, you’re told, will make your images smaller and speed up your builds. And if you’re using Go that’s reasonable advice. For Python, Alpine is a bad idea.
Tool
The Conda packaging tool implements environments, which enable different applications to have different libraries installed. So when you’re building a Docker image for a Conda-based application, you’ll need to activate a Conda environment—a surprisingly tricky task.

Read this article to learn the easy way to do it.
Tutorial
When you're processing large amounts of data in memory, copying wastes limited RAM, but mutating data in-place leads to bugs. There's a third solution that gives you safety while reducing memory usage: interior mutability.
Tutorial
If you’re running into memory issues because your NumPy arrays are too large, one of the basic approaches to reducing memory usage is compression. By changing how you represent your data, you can reduce memory usage and shrink your array’s footprint—often without changing the bulk of your code.
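One such representation change is using a smaller dtype: if your values fit in 32-bit floats, you can halve the array's footprint with a one-line `astype` call. A minimal sketch:

```python
import numpy as np

arr = np.ones((1000,), dtype=np.float64)
smaller = arr.astype(np.float32)  # half the memory, at reduced precision

print(arr.nbytes, smaller.nbytes)  # 8000 4000
```

The rest of your code typically keeps working unchanged, since NumPy operations accept either dtype—the trade-off is precision, so check your data's range first.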
Tutorial
If you want to process a large amount of data with Pandas, there are various techniques you can use to reduce memory usage without changing your data. But what if that isn’t enough? What if you still need to reduce memory usage?

Another technique you can try is lossy compression: drop some of your data in a way that doesn’t impact your final results too much. If parts of your data don’t impact your analysis, no need to waste memory keeping extraneous details around.
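Two simple forms this can take (the DataFrame and its `price` column are hypothetical): dropping precision you don't need, and working on a random sample when your analysis tolerates it.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"price": rng.uniform(0, 100, 1000)})

# Lossy option 1: drop precision you don't need (float64 -> float32).
df["price"] = df["price"].astype(np.float32)

# Lossy option 2: keep only a random 10% sample of the rows.
sample = df.sample(frac=0.1, random_state=0)
print(len(sample))  # 100
```

Both discard information—fewer significant digits, or most of the rows—so they're only appropriate when the dropped detail doesn't affect your final results.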
Article
You’re writing software that processes data, and it works fine when you test it on a small sample file. But when you load the real data, your program crashes. The problem is that you don’t have enough memory—if you have 16GB of RAM, you can’t load a 100GB file.

So how do you process your data with just one computer?

In this article you'll learn the basic techniques to help you process more data than fits in RAM.
Article
Your Python program is too slow. Maybe your web application can’t keep up, or certain queries are taking a long time. Maybe you have a batch program that takes hours or even days to run.

To speed it up, you need a tool that will help you figure out where the slowness is coming from. This article will cover some good starting points for choosing the right tool.
Tutorial
If your program is slow, and it's not CPU, how do you pinpoint the problem? In this article you'll learn how to write custom profilers to find places where your program is waiting.
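One simple form a custom profiler can take (a sketch, not the article's implementation): compare wall-clock time against CPU time for a call—the gap between them is time spent waiting on sleep, I/O, or locks rather than computing.

```python
import time

def measure_waiting(func):
    """Return (wall_seconds, waiting_seconds) for one call to func()."""
    start_wall = time.perf_counter()
    start_cpu = time.process_time()
    func()
    wall = time.perf_counter() - start_wall
    cpu = time.process_time() - start_cpu
    # Wall-clock time the CPU can't account for was spent waiting:
    return wall, wall - cpu

wall, waiting = measure_waiting(lambda: time.sleep(0.2))
print(f"~{waiting:.2f}s of {wall:.2f}s spent waiting")
```

A sleep shows up almost entirely as waiting, whereas a tight computation loop would show near-zero waiting—which is exactly the signal a CPU profiler can't give you.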
Tutorial
Sometimes your Python process misbehaves; wouldn't it be nice to have an interpreter prompt inside your process to help you debug what's going on? Learn how to do so with Manhole.
Tutorial
It’s tempting to migrate your database schema when your application container starts up—here’s some reasons to rethink that choice.
Tutorial
Installing dependencies separately from your code allows you to take advantage of Docker's layer caching to speed up your build. Here's how to do it with pipenv, poetry, or pip-tools.
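The shape of the resulting Dockerfile looks roughly like this (a sketch using a pip-tools-style `requirements.txt`; the base image, file names, and entry point are assumptions):

```dockerfile
FROM python:3.9-slim

# Copy only the dependency list first: the expensive install layer
# below stays cached until requirements.txt itself changes.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Your code changes far more often, so copy it last—
# earlier layers remain cached across rebuilds.
COPY . .
CMD ["python", "app.py"]
```

The same pattern applies with pipenv or poetry: copy the lock file, install, and only then copy the rest of the source tree.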
Article
When it’s time to package up your Python application into a Docker image, the natural thing to do is search the web for some examples. And a quick search will provide you with plenty of simple, easy examples.

Unfortunately, these simple, easy examples are often broken in a variety of ways, some obvious, some less so.
Tutorial
Gunicorn is a common WSGI server for Python applications, but most Docker images that use it are badly configured. Running in a container isn’t the same as running on a virtual machine or physical server, and there are also Linux-environment differences to take into account.

So to keep your Gunicorn setup healthy and happy, in this article I’ll cover how to configure it.
Tutorial
There are many choices for a base Docker image for your Python application—this article will give you some criteria to base your decision on, walk you through some of the options, and make some suggestions that should work for most people.
Tutorial
The more you use JSON, the more likely you are to encounter JSON encoding or decoding as a bottleneck. Python’s built-in library isn’t bad, but there are multiple faster JSON libraries available: how do you choose which one to use? This article will share a process you can use to choose the best library for your needs.
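The core of any such process is benchmarking the candidates on *your* data, not someone else's. A minimal sketch using only the stdlib `json` (the sample message is hypothetical; you'd pass in `orjson`/`ujson`/etc. the same way if they're installed):

```python
import json
import timeit

# Stand-in for a message typical of YOUR workload:
sample = {"id": 123, "name": "example", "tags": ["a", "b", "c"]}

def bench(dumps, loads, n=1000):
    """Time n encode and n decode calls for one candidate library."""
    encoded = dumps(sample)
    encode_time = timeit.timeit(lambda: dumps(sample), number=n)
    decode_time = timeit.timeit(lambda: loads(encoded), number=n)
    return encode_time, decode_time

enc, dec = bench(json.dumps, json.loads)
print(f"encode: {enc:.4f}s  decode: {dec:.4f}s")
```

Whether encoding or decoding dominates, and how your messages are shaped, can flip which library wins—hence measuring your own workload first.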
