
Optimizing Docker Images With Python

Docker is a powerful tool, and when used with Python, can effectively streamline software delivery. However, it must be used with care and precision. Let's explore how best to optimize these images and navigate common stumbling blocks.

Joel Burch

COO

In software development and deployment, Docker has emerged as a paradigm-shifting tool. It offers engineers a lightweight, reproducible, and portable environment for running applications. Used with Python, it streamlines software delivery, making it more efficient and reliable. However, like any powerful tool, Docker must be used with care and precision. Inefficient use, such as building unnecessarily large images and containers, can lead to slower deployment times, increased bandwidth usage, and even potential security vulnerabilities. In this article, we explore Python-based Docker images and provide a guide on how to optimize them, helping to streamline the software development and deployment process.

If you are familiar with Docker, take the quiz proposed by one of our SRE engineers, Lucy Linder (@derlin).

Unoptimized Python Application and Docker Environment

Docker is a revolutionary tool; it enables developers to package applications into containers—standardized executable components that combine application source code with the operating system (OS) libraries and dependencies required to run that code in any environment. However, when dealing with Python applications, Docker images can often be unoptimized. This can lead to bloated images that consume unnecessary resources. Consider the following example of an unoptimized image containing a Flask application. The application is simple, consisting of a single file, app.py, with several routes that return data when called:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def home():
    return 'Welcome to the Home Page!'

@app.route('/about')
def about():
    return 'About Page - This is a basic Flask app with three routes.'

@app.route('/contact')
def contact():
    return 'Contact Page - You can contact us at info@example.com.'

if __name__ == '__main__':
    app.run(debug=True)


A possible Dockerfile is as follows:

FROM python:3.11
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
CMD ["python", "app.py"]

Another approach:

FROM ubuntu:22.04
RUN apt-get update && apt-get install -y python3 python3-pip
WORKDIR /app
COPY . /app
RUN pip3 install -r requirements.txt
CMD ["python3", "app.py"]

Inefficient Docker Image

While these Dockerfiles will certainly work, they are far from optimized. They use base images that are larger than necessary, and they don't take advantage of Docker's layer caching, which can speed up build times and reduce the size of the final image. Large images increase build times and can also increase network transit costs if the image repository is hosted in a cloud platform. Additionally, all of the files in the build directory are copied over, resulting in larger images (and potential security issues if sensitive data is exposed).

Unoptimized Performance Benchmarks

To benchmark the Docker images, three basic metrics can be used: 

  • Image size 

  • Build time

  • Runtime performance 

Runtime performance of the application may not be perceptibly affected as it's a simplistic environment with minimal network latency (vs. a distributed cloud environment). However, runtime performance costs of a heavier image may not be trivial. It should also be noted that Docker will often intelligently apply caching for repeated builds, which will significantly shorten build times on development machines. However, in build environments like CI runners, there is almost never an existing image cache available. To simulate this, docker build will be run with the --no-cache option.

Methodology

For the sake of simplicity, this article will focus on the first two metrics, as they are most relevant to image optimization. The testing methodology is as follows:

  • System: x86_64 MacBook Pro

  • Environment: Fresh Docker Desktop install; all cached images, containers, and layers are purged prior to the build with docker system prune

  • Data Collection: Build time is collected using the Unix time program. Image size is collected from the Docker environment using the docker images command
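Put together, the collection steps above can be sketched as a short shell session (the image tag flask-bench is a hypothetical name used here for illustration):

```shell
# Purge all cached images, containers, and layers for a cold start
docker system prune -af

# Time an uncached build, mirroring a fresh CI runner
time docker build --no-cache -t flask-bench .

# Report the resulting image size
docker images flask-bench --format "{{.Repository}}: {{.Size}}"
```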

Results - Python Base

  • Image size: 1.09GB

  • Build time: 50.276s

Results - Ubuntu Base

  • Image size: 541MB

  • Build time: 1m 11.7s

Although the Ubuntu base image actually came in at nearly half the image size, it took ~20 more seconds to complete the build, likely owing to the Dockerfile command to update the base OS. In either case there are definitely some optimization issues.

Optimizing Python Docker Images

There are a variety of strategies that can be employed to help optimize Docker images. Remember that the primary goal of optimizing an image is to reduce its size while minimally compromising on performance or security.

Using a Smaller Base Image

The base image is the foundation upon which a Docker image is built. By choosing a smaller base image, users can significantly reduce the size of the final Docker image. For Python applications, the official Python Docker image is a common choice. However, these images can be quite large. An alternative is to use the Alpine-based Python image, which is significantly smaller (note that Alpine uses musl libc rather than glibc, so some packages with compiled dependencies may require additional build tools):

FROM python:3.11-alpine
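As a quick local check of the difference, both base images can be pulled and their sizes compared (exact sizes vary by tag and release):

```shell
docker pull python:3.11
docker pull python:3.11-alpine

# List sizes for all local python:* tags; the alpine tag is typically
# an order of magnitude smaller than the default Debian-based tag
docker images python --format "{{.Tag}}: {{.Size}}"
```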

Using Multi-Stage Builds

Multi-stage builds allow the usage of multiple FROM statements in a Dockerfile. Each FROM statement can use a different base image, and only the final stage will be kept. This is particularly useful when an application requires build-time dependencies that are not needed at runtime. Here's an example:

# First stage: build
FROM python:3.11-alpine AS builder
WORKDIR /app
COPY requirements.txt .
# Install into a virtual environment so the installed packages
# can be copied into the final stage
RUN python -m venv /venv && \
    /venv/bin/pip install --upgrade pip && \
    /venv/bin/pip install -r requirements.txt

# Second stage: runtime
FROM python:3.11-alpine
COPY --from=builder /venv /venv
ENV PATH="/venv/bin:$PATH"
WORKDIR /app
COPY . .

Chaining Commands

Each command in a Dockerfile creates a new layer in the Docker image. By chaining commands together using the && operator, the number of layers and overall image size can be reduced: 

RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

Cleaning Up Temporary Files

Temporary files created during the build process can take up a significant amount of space. By removing these files in the same layer they are created, they will be prevented from becoming part of the final Docker image:

RUN pip install --upgrade pip && pip install -r requirements.txt && rm -rf ~/.cache/pip
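Alternatively, pip's --no-cache-dir flag avoids writing the cache in the first place, achieving the same result without a separate cleanup step:

```dockerfile
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt
```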

Being Selective with ADD and COPY Commands

The ADD and COPY commands can add unnecessary files to a Docker image if not used carefully. Be selective about what is added to the Docker image and consider using a .dockerignore file to exclude unnecessary files and directories. Here's an example of a .dockerignore file:

.git
__pycache__
*.pyc
*.pyo
*.pyd

Leveraging Docker's Build Cache

Docker uses a build cache to speed up image builds by reusing layers from previous builds. By carefully ordering Dockerfile commands, users can take full advantage of the build cache. Commands that change frequently should be placed towards the end of the Dockerfile, while commands that rarely change should be placed at the beginning. Here's an example:

# These commands rarely change, so they are placed at the beginning
FROM python:3.11-alpine
WORKDIR /app

# Copying requirements.txt on its own lets Docker reuse the install
# layer when only the application source changes
COPY requirements.txt .
RUN pip install --upgrade pip && \
    pip install -r requirements.txt && \
    rm -rf ~/.cache/pip

# The application source changes frequently, so it is copied at the end
COPY . .

Not all of these strategies are going to be relevant for the basic test cases used for this article. But, as applications and Docker builds grow more complex, they can be employed to achieve meaningful reduction in image size and build time. The next section will look at improvements using some of these strategies for the previous unoptimized examples.

Optimized Performance Benchmarks

The optimized Dockerfile will include some of the strategies described above. Here is the complete file with optimizations:

# Smaller Alpine image
FROM python:3.11-alpine
WORKDIR /app
# Only copying needed files
COPY requirements.txt .
# Multiple commands in a single RUN invocation
RUN pip install --upgrade pip && pip install -r requirements.txt && rm -rf ~/.cache/pip
COPY app.py .
CMD ["python", "app.py"]

A .dockerignore file will also be employed with the same entries described in the previous section.

Results - Optimized Alpine Base

  • Image size: 128MB

  • Build time: 28.436s

The optimized build completed roughly 43% faster than the Python-base build, and the resulting image is about 8.5x smaller than the Python base (and over 4x smaller than the Ubuntu base)! This is a simplistic example, but it highlights the optimizations that can be had with some easy-to-implement configuration changes.
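These figures can be sanity-checked directly from the measurements reported in the benchmark sections above:

```python
# Measurements transcribed from the benchmark results (sizes in MB, times in s)
unoptimized_python = {"size_mb": 1090, "build_s": 50.276}
unoptimized_ubuntu = {"size_mb": 541, "build_s": 71.7}
optimized_alpine = {"size_mb": 128, "build_s": 28.436}

# Size ratios of the unoptimized images relative to the optimized one
size_ratio_vs_python = unoptimized_python["size_mb"] / optimized_alpine["size_mb"]
size_ratio_vs_ubuntu = unoptimized_ubuntu["size_mb"] / optimized_alpine["size_mb"]

# Fractional reduction in build time versus the Python-base build
build_reduction = 1 - optimized_alpine["build_s"] / unoptimized_python["build_s"]

print(f"{size_ratio_vs_python:.1f}x smaller than the Python base")  # 8.5x
print(f"{size_ratio_vs_ubuntu:.1f}x smaller than the Ubuntu base")  # 4.2x
print(f"{build_reduction:.0%} faster build than the Python base")   # 43%
```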

Small Optimizations Lead to Big Wins

The return on investment (ROI) of making builds more efficient multiplies as the number of developers and applications grow. By keeping Docker images lean, software organizations will enjoy faster deployments, save on bandwidth and storage, and reduce potential security risks.

Docker image optimization is not just a good practice — it's a necessity for efficient software development and deployment. By understanding and applying these principles, developers can significantly improve their Docker usage, leading to faster, more efficient, and more secure software delivery.

If you are building Docker images using Github Actions and want to learn some of the best practices, have a look at this article by our SRE Engineer Lucy Linder (@derlin).

Interested in learning more about how Python can be used with the Divio PaaS? Reach out now!

Don't forget to join our LinkedIn and X/Twitter community. Access exclusive insights and be the first to know about our latest blog posts.