Docker multi-stage build explained

Docker Multi-Stage Build is a feature introduced in Docker 17.05, designed to optimize the size and performance of Docker images. It allows you to use multiple FROM instructions in your Dockerfile, each representing a stage in the build process. The idea is to break down the build process into distinct stages, where you build your application in one stage and then copy only the necessary artifacts (like binaries, dependencies, etc.) into a final, minimal stage.
How It Works
- First Stage: In the first stage, you typically use a full development environment with all the necessary tools and dependencies to build your application.
- Intermediate Stages: These stages can be used for additional steps like running tests, linting, or other preprocessing tasks.
- Final Stage: In the last stage, you start with a clean, minimal base image (like alpine), and only the artifacts necessary to run your application are copied from the previous stages. This results in a much smaller image that contains only what is necessary to run your application, as sketched right after this list.
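To make this concrete, here is a minimal sketch of a multi-stage Dockerfile for a hypothetical Go service; the base image tags, paths, and the optional test stage are illustrative choices only, not a prescription:

# Stage 1: build the application with the full toolchain
FROM golang:1.20 AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app .

# Stage 2 (optional, intermediate): run tests against the same source
# (invoked explicitly with: docker build --target tester .)
FROM builder AS tester
RUN go test ./...

# Stage 3: minimal runtime image containing only the compiled artifact
FROM alpine:3.18
COPY --from=builder /app /app
ENTRYPOINT ["/app"]

Only the binary produced by the builder stage ends up in the final image; the Go toolchain, the sources, and the test stage are discarded once the build finishes.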
Pros of Docker Multi-Stage Build
- Reduced Image Size: By copying only the necessary artifacts into the final image, you can drastically reduce the size of your Docker images, which makes them faster to pull, push, and run.
- Improved Security: Smaller images with fewer installed dependencies reduce the attack surface, as there are fewer components that could potentially have security vulnerabilities.
- Cleaner Dockerfiles: Multi-stage builds allow you to keep your Dockerfile organized by separating different concerns into different stages.
- Easier Maintenance: Since the build and runtime environments are separated, it becomes easier to maintain and update your images.
- No Need for External Tools: Multi-stage builds allow you to create optimized production images without needing external tools like docker-slim.
Cons of Docker Multi-Stage Build
- Increased Build Complexity: While multi-stage builds can clean up your Dockerfile, they can also make it more complex, especially if there are many stages and dependencies between them.
- Longer Build Times: If not optimized correctly, multi-stage builds can sometimes lead to longer build times due to the additional stages.
- Compatibility Issues: Not all base images or software environments are fully compatible with multi-stage builds, which might require more effort to make them work together.
- Less Transparency: When debugging, it might be less clear where things are going wrong since the final image does not include all the build tools and dependencies that were used in earlier stages.
- Resource Usage: Multi-stage builds can consume more resources during the build process, especially in environments with limited resources.
Overall, Docker Multi-Stage Builds are a powerful tool for creating lean, secure, and maintainable Docker images, especially in a DevOps environment where efficiency and security are paramount.
Okay, we are done with the theory. Now, let’s get some hands-on experience with the above.
Suppose we want to Dockerize Hey and use it as a stress-testing tool within our development team. As this repository has not been maintained for a long time, and we want to update the Go version [let’s say it’s a requirement from the team, and I know it’s childish], we will be using our own Dockerfile to build the image.
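A plain single-stage Dockerfile for this could look roughly as follows; the Go base image tag and the clone-and-build steps are illustrative assumptions, not the only way to do it:

# Full Go toolchain image; everything installed and built here stays in the final image
FROM golang:1.20

WORKDIR /src

# Fetch and build hey from source so we control the Go version ourselves
RUN git clone https://github.com/rakyll/hey.git . && \
    go build -o /bin/hey .

ENTRYPOINT ["/bin/hey"]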
Let’s build the image, check the image size, and do some testing
$ docker build -t hey-docker .
$ docker image ls hey-docker
REPOSITORY TAG IMAGE ID CREATED SIZE
hey-docker latest a37fc39c6124 10 seconds ago 985MB
$ docker run --rm hey-docker -n 200 -c 50 https://www.apache.org/
Summary:
...
[Truncated for readability]
Pay attention to the SIZE column: the image size is 985MB. Why? The image includes the binary of the application, the dependencies, and every file generated during the build process.
By dividing the build process into two stages, we can achieve a cleaner and smaller Docker image. How? In the first stage, the binary is generated from all the dependencies. We only need that binary to execute, not the dependencies or any other build files. That’s why, in the second stage, we use a minimal image, i.e. Alpine, copy the binary from the first stage, and build the final image. Let’s have a look at the updated Dockerfile.
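A two-stage version along these lines; the exact tags are again illustrative, and CGO is disabled so the Go binary is statically linked and can run on musl-based Alpine:

# ---- Stage 1: build ----
FROM golang:1.20 AS builder

WORKDIR /src

# Build a statically linked binary (CGO disabled) so it runs on Alpine
RUN git clone https://github.com/rakyll/hey.git . && \
    CGO_ENABLED=0 go build -o /bin/hey .

# ---- Stage 2: run ----
FROM alpine:3.18

# Copy only the compiled binary from the builder stage
COPY --from=builder /bin/hey /bin/hey

ENTRYPOINT ["/bin/hey"]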
Let’s build the docker image again and check the image size
$ docker build -t hey-docker .
$ docker image ls hey-docker
REPOSITORY TAG IMAGE ID CREATED SIZE
hey-docker latest a37fc39c6124 17 seconds ago 18MB
$ docker run --rm hey-docker -n 200 -c 50 https://www.apache.org/
Summary:
...
[Truncated for readability]
Voilà! We reduced the image size from 985MB to 18MB. This was pretty straightforward, and we witnessed the pros of multi-stage Docker image building. You must have gained some confidence by now. Let’s try to dig a bit deeper.
As Hey has not been updated for a long time, we found an alternative, Oha, which does exactly the same thing but is written in Rust. Again, for some unknown reason, we want to build the Docker image ourselves instead of using the Dockerfile provided with the repository. Let’s check out the single-stage Dockerfile.
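Something like the following; the Rust base image tag and the clone-and-build steps are illustrative:

# Full Rust toolchain image; the toolchain and the build cache stay in the final image
FROM rust:1.70

WORKDIR /src

# Fetch and build oha from source
RUN git clone https://github.com/hatoo/oha.git . && \
    cargo build --release && \
    cp target/release/oha /bin/oha

ENTRYPOINT ["/bin/oha"]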
Build the Docker image and check the size
$ docker build -t oha-docker .
$ docker image ls oha-docker
REPOSITORY TAG IMAGE ID CREATED SIZE
oha-docker latest a4d1bfbad402 35 seconds ago 1.96GB
$ docker run --rm -it oha-docker -n 200 -c 50 https://www.apache.org/
Summary:
Success rate: 100.00%
...
[Truncated for readability]
We don’t want to keep it that large. Let’s try to reduce it by using the Alpine image as the deployment image.
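The two-stage Dockerfile could look something like this, copying only the release binary into an Alpine final stage (tags illustrative):

# ---- Stage 1: build ----
FROM rust:1.70 AS builder

WORKDIR /src
RUN git clone https://github.com/hatoo/oha.git . && \
    cargo build --release

# ---- Stage 2: run ----
FROM alpine:3.18

# Copy only the compiled binary from the builder stage
COPY --from=builder /src/target/release/oha /bin/oha

ENTRYPOINT ["/bin/oha"]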
Let’s see the result
$ docker build -t oha-docker .
$ docker image ls oha-docker
REPOSITORY TAG IMAGE ID CREATED SIZE
oha-docker latest 3048e3221d1b 3 seconds ago 20MB
We reduced the size from 1.96GB to 20MB. We are happy with the reduced image size. Now, let’s give it a try
$ docker run --rm -it oha-docker -n 200 -c 50 https://www.apache.org/
exec /bin/oha: no such file or directory
Wait, it was working on the previous build, and now it’s not!
According to this blog post, this happens because the Rust binary we’ve built is dynamically linked against libc, which is missing from the shared libraries inside the alpine image. Alpine Linux uses musl libc instead of the default glibc.
We have two options:
- Build the Rust binary with the x86_64-unknown-linux-musl target and link it with the musl library
- Use distroless images from Google
Let’s update the Dockerfile to use a distroless image, as mentioned above, and check again.
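The final stage now starts from one of Google’s distroless base images, which ships glibc but no shell or package manager; the exact variant (cc-debian12) and the Rust tag are assumptions:

# ---- Stage 1: build ----
FROM rust:1.70 AS builder

WORKDIR /src
RUN git clone https://github.com/hatoo/oha.git . && \
    cargo build --release

# ---- Stage 2: run ----
# The distroless "cc" image provides glibc and libgcc, so the dynamically
# linked Rust binary can find the shared libraries it needs
FROM gcr.io/distroless/cc-debian12

COPY --from=builder /src/target/release/oha /bin/oha

ENTRYPOINT ["/bin/oha"]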
$ docker build -t oha-docker .
$ docker image ls oha-docker
REPOSITORY TAG IMAGE ID CREATED SIZE
oha-docker latest 7761ee18b5c8 12 seconds ago 35.6MB
$ docker run --rm -it oha-docker -n 200 -c 50 https://www.apache.org/
Summary:
Success rate: 99.00%
...
[Truncated for readability]
The final image is 35.6MB, whereas the alpine one was 20MB but didn’t work. This is not a big difference in size, and we can happily work with that.
So, to conclude, always reaching for the most minimal image, i.e. Alpine, may not be a good fit. It can cause a mess, and you will have to spend time debugging what went wrong instead of focusing on your main job [in this case, stress-testing your application]. The above-mentioned examples are sufficient to understand multi-stage Docker image building, and its pros and cons.