Docker multi-stage build explained

Ahmad Al-Sajid
5 min read · Aug 17, 2024


Image created with Haiper.AI

Docker Multi-Stage Build is a feature introduced in Docker 17.05, designed to optimize the size and performance of Docker images. It allows you to use multiple FROM instructions in your Dockerfile, each representing a stage in the build process. The idea is to break down the build process into distinct stages, where you can build your application in one stage and then copy only the necessary artifacts (like binaries, dependencies, etc.) into a final, minimal stage.

How It Works

  1. First Stage: In the first stage, you typically use a full development environment with all the necessary tools and dependencies to build your application.
  2. Intermediate Stages: These stages can be used for additional steps like running tests, linting, or other preprocessing tasks.
  3. Final Stage: In the last stage, you start with a clean, minimal base image (like alpine), and only the artifacts necessary to run your application are copied from the previous stages. This results in a much smaller image that contains only what is necessary to run your application.
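As a rough illustration, a multi-stage Dockerfile for a Go application could look like the sketch below. The stage names, image tags, and paths are my own assumptions for illustration, not taken from any specific project.

# Stage 1: full Go toolchain to compile the application
FROM golang:1.22 AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Stage 2 (optional): reuse the builder environment to run tests
FROM builder AS test
RUN go test ./...

# Stage 3: minimal runtime image containing only the compiled binary
FROM alpine:3.20
COPY --from=builder /out/app /usr/local/bin/app
ENTRYPOINT ["app"]

Only the last FROM defines what ends up in the final image; everything produced in the earlier stages is left behind unless it is explicitly copied over with COPY --from.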

Pros of Docker Multi-Stage Build

  1. Reduced Image Size: By copying only the necessary artifacts into the final image, you can drastically reduce the size of your Docker images, which makes them faster to pull, push, and run.
  2. Improved Security: Smaller images with fewer installed dependencies reduce the attack surface, as there are fewer components that could potentially have security vulnerabilities.
  3. Cleaner Dockerfiles: Multi-stage builds allow you to keep your Dockerfile organized by separating different concerns into different stages.
  4. Easier Maintenance: Since the build and runtime environments are separated, it becomes easier to maintain and update your images.
  5. No Need for External Tools: Multi-stage builds allow you to create optimized production images without needing external tools like docker-slim.

Cons of Docker Multi-Stage Build

  1. Increased Build Complexity: While multi-stage builds can clean up your Dockerfile, they can also make it more complex, especially if there are many stages and dependencies between them.
  2. Longer Build Times: If not optimized correctly, multi-stage builds can sometimes lead to longer build times due to the additional stages.
  3. Compatibility Issues: Not all base images or software environments are fully compatible with multi-stage builds, which might require more effort to make them work together.
  4. Less Transparency: When debugging, it might be less clear where things are going wrong since the final image does not include all the build tools and dependencies that were used in earlier stages.
  5. Resource Usage: Multi-stage builds can consume more resources during the build process, especially in environments with limited resources.

Overall, Docker Multi-Stage Builds are a powerful tool for creating lean, secure, and maintainable Docker images, especially in a DevOps environment where efficiency and security are paramount.

Okay, we are done with the theory. Now, let’s get some hands-on experience with the above.

Suppose we want to Dockerize Hey and use it as a stress-testing tool within our development team. As this repository has not been maintained for a long time, and we want to update the Go version [let’s say it’s a requirement from the team, and I know it’s childish], we will be using our own Dockerfile to build the image.
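A single-stage Dockerfile for this could look roughly like the following. The Go base image tag and the build steps are assumptions on my part, not the exact file behind the measurements below.

# Single-stage build: the Go toolchain, sources, and build cache all stay in the final image
FROM golang:1.22
WORKDIR /app
# fetch hey straight from GitHub; pin a tag or commit if you need reproducible builds
RUN git clone https://github.com/rakyll/hey.git .
RUN CGO_ENABLED=0 go build -o /usr/local/bin/hey .
ENTRYPOINT ["hey"]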

Let’s build the image, check the image size, and do some testing:

$ docker build -t hey-docker .
$ docker image ls hey-docker
REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
hey-docker   latest   a37fc39c6124   10 seconds ago   985MB
$ docker run --rm hey-docker -n 200 -c 50 https://www.apache.org/

Summary:
...
[Truncated for readability]

Pay attention to the SIZE column: the image size is 985MB. Why? The image includes the binary of the application, the dependencies, and every file generated during the build process.

By dividing the build process into two stages, we can achieve a cleaner and smaller Docker image. How? In the first stage, the binary is generated from all the dependencies. We only need that binary to execute, not the dependencies or any other file. That’s why, in the second stage, we use a minimal image, i.e. Alpine, copy the binary from the first stage, and build the final image. Let’s have a look at the updated Dockerfile.
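A two-stage version could look roughly like this; again, the image tags and paths are assumptions, and only the structure matters here.

# Stage 1: build the hey binary with the full Go toolchain
FROM golang:1.22 AS builder
WORKDIR /app
RUN git clone https://github.com/rakyll/hey.git .
RUN CGO_ENABLED=0 go build -o /hey .

# Stage 2: copy only the compiled binary into a minimal Alpine image
FROM alpine:3.20
COPY --from=builder /hey /usr/local/bin/hey
ENTRYPOINT ["hey"]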

Let’s build the Docker image again and check the image size:

$ docker build -t hey-docker .
$ docker image ls hey-docker
REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
hey-docker   latest   a37fc39c6124   17 seconds ago   18MB
$ docker run --rm hey-docker -n 200 -c 50 https://www.apache.org/

Summary:
...
[Truncated for readability]

Voilà! We reduced the image size from 985MB to 18MB. This was pretty straightforward, and we saw the pros of multi-stage Docker image building first-hand. You must have gained some confidence by now. Let’s dig a little deeper.

As Hey has not been updated for a long time, we found an alternative, Oha, which does exactly the same thing but is written in Rust. Again, for some unknown reason, we want to build the Docker image ourselves instead of using the Dockerfile provided with the repository. Let’s check out the single-stage Dockerfile.
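A single-stage Dockerfile for Oha could look roughly like this; the Rust image tag and paths are assumptions.

# Single-stage build: the whole Rust toolchain and the cargo build cache end up in the image
FROM rust:1.80
WORKDIR /app
RUN git clone https://github.com/hatoo/oha.git .
RUN cargo build --release
# put the built binary on a fixed, predictable path
RUN cp target/release/oha /bin/oha
ENTRYPOINT ["/bin/oha"]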

Build the Docker image and check the size:

$ docker build -t oha-docker .
$ docker image ls oha-docker
REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
oha-docker   latest   a4d1bfbad402   35 seconds ago   1.96GB
$ docker run --rm -it oha-docker -n 200 -c 50 https://www.apache.org/
Summary:
Success rate: 100.00%
...
[Truncated for readability]

We don’t want to keep it that large. Let’s try to reduce it by using the Alpine image as the deployment image.
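A first attempt at a two-stage version, with Alpine as the final stage, could look roughly like this (tags and paths are, once more, assumptions):

# Stage 1: build oha with the full Rust toolchain
FROM rust:1.80 AS builder
WORKDIR /app
RUN git clone https://github.com/hatoo/oha.git .
RUN cargo build --release

# Stage 2: copy only the binary into a minimal Alpine image
FROM alpine:3.20
COPY --from=builder /app/target/release/oha /bin/oha
ENTRYPOINT ["/bin/oha"]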

Let’s see the result

$ docker build -t oha-docker .
$ docker image ls oha-docker
REPOSITORY   TAG      IMAGE ID       CREATED         SIZE
oha-docker   latest   3048e3221d1b   3 seconds ago   20MB

We reduced the size from 1.96GB to 20MB, and we are happy with that. Now, let’s give it a try:

$  docker run --rm -it oha-docker -n 200 -c 50 https://www.apache.org/
exec /bin/oha: no such file or directory

Wait, it was working on the previous build, and now it’s not!

According to this blog post, it happens because the Rust binary we’ve built is dynamically linked against libc, which is missing from the shared libraries inside the Alpine image. Alpine Linux uses musl libc instead of the default glibc.

We have two options:

  • Build the Rust binary with the x86_64-unknown-linux-musl target and link it against the musl library
  • Use distroless images from Google

Let’s update the Dockerfile to use a distroless image, as mentioned above, and check again.
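A version with a distroless final stage could look roughly like this; the specific distroless tag is an assumption, chosen because the cc variant ships glibc, which our dynamically linked Rust binary needs.

# Stage 1: build oha with the full Rust toolchain (unchanged)
FROM rust:1.80 AS builder
WORKDIR /app
RUN git clone https://github.com/hatoo/oha.git .
RUN cargo build --release

# Stage 2: distroless runtime with glibc and CA certificates, but no shell or package manager
FROM gcr.io/distroless/cc-debian12
COPY --from=builder /app/target/release/oha /bin/oha
ENTRYPOINT ["/bin/oha"]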

$ docker build -t oha-docker .
$ docker image ls oha-docker
REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
oha-docker   latest   7761ee18b5c8   12 seconds ago   35.6MB
$ docker run --rm -it oha-docker -n 200 -c 50 https://www.apache.org/
Summary:
Success rate: 99.00%
...
[Truncated for readability]

The final image is 35.6MB, whereas the Alpine-based one that didn’t work was 20MB. This is not a big difference in size, and we can happily live with that.

So, to conclude, always reaching for the most minimal image, i.e. Alpine, may not be a good fit. It can cause a mess, and you will have to spend time debugging what went wrong instead of focusing on your main job [in this case, stress-testing your application]. The above examples are sufficient to understand multi-stage Docker image building, and its pros and cons.

Written by Ahmad Al-Sajid

Software Engineer, DevOps, Foodie, Biker
