Docker basics: multi-stage docker build

When we list the Docker images on our local machine, we often find that they are surprisingly large, which makes us wonder how much of that size we really need in order to run our app or service.

In the last blog post, we learnt that the size shown for an image is often not the same as the disk space it actually takes up. That's because Docker builds and caches images as layers, so our Docker images can share layers with one another.

So why do we need multi-stage builds? The multi-stage build feature in Dockerfiles lets you create smaller container images with better caching and performance, among other benefits. We also often come across projects that use multi-stage builds for one reason or another. Let's see if we can optimize our Docker image using a multi-stage build.

Prerequisite

I'll use the express-generator app that we built in the previous blog post (if I remember correctly, I called it express-app when creating the Docker image).

If you haven't followed my previous blog post, you can just create your own app using express-generator. It's a Node.js app, so it requires Node.js on your local machine. After that, just add a Dockerfile (without any extension) in the root folder.

Checking our express-generator app

Our express-generator app already has a Dockerfile inside it. So, let's review the Dockerfile below:

FROM node:alpine3.10

WORKDIR /APP

COPY . /APP

RUN npm install

CMD ["npm", "start"]

As you can see, it's still a single-stage Dockerfile. How do we know whether a Dockerfile is single-stage or multi-stage? The main difference is that a multi-stage build has multiple FROM instructions, whereas a single-stage build has only one, like our Dockerfile above.

It's a simple Dockerfile that starts from node:alpine3.10 (node 15.8.0), creates the /APP directory and makes it our working directory, copies everything in our main app to that working directory, installs every package using the npm install command, and lastly sets our starting command to npm start. It's a simple Dockerfile, but it works for our needs. Now let's see if we can change it into a multi-stage Dockerfile.

Using multi-stage docker build

Now, let's revise our Dockerfile to use a multi-stage build like this:

FROM node:alpine3.10 AS stage-1

WORKDIR /app
COPY package.json .
RUN npm install

FROM node:alpine3.10

WORKDIR /web
EXPOSE 3000
CMD ["npm", "start"]

COPY --from=stage-1 /app/node_modules /web/node_modules
COPY . /web

The Dockerfile above is an example of a multi-stage Dockerfile. We can see that it uses more than one FROM instruction. Why are we using two FROM instructions? Because we want to optimize the build time.

In the first FROM instruction, we give the stage an alias, stage-1. What does stage-1 do? It only copies the package.json file to the working directory and installs all the packages, generating the node_modules folder.

In the second stage, that is the second FROM instruction, we just have to create our working directory, set the start command, copy the node_modules folder from stage-1 into our working directory, and copy all the app files there. The EXPOSE instruction does not actually publish the port. It functions as a type of documentation between the person who builds the image and the person who runs the container, about which ports are intended to be published.

When you build the image again, you won't see the image size getting smaller. What we're doing here is optimizing the image build time. Thanks to the order of the instructions, we avoid re-executing instructions whose inputs haven't changed. For example, when we change a code file, like adding a new line to our app.js, Docker only re-executes one instruction, the last one, COPY . /web. That matters because, while developing, we mostly change our main app files and not much else. And when we install a new package, package.json changes, so Docker re-executes four instructions, numbers 3, 4, 9, and 10: COPY package.json, RUN npm install, and the two COPY instructions in the second stage, because the node_modules folder that instruction 9 copies has changed. It uses the cache for the other instructions. Just by doing this, we've optimized our image build time.
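To make the caching behaviour easier to follow, here is the same multi-stage Dockerfile again with each instruction numbered in a comment. The numbering and comments are only annotations added for illustration. The sketch also assumes that a local node_modules folder is not part of the build context (for example, because it is listed in a .dockerignore file), otherwise COPY . /web would copy it in as well.

```dockerfile
# 1: base image for the dependency stage
FROM node:alpine3.10 AS stage-1

# 2: working directory inside stage-1
WORKDIR /app
# 3: re-runs only when package.json changes
COPY package.json .
# 4: re-runs only when instruction 3 re-runs
RUN npm install

# 5: base image for the final stage
FROM node:alpine3.10

# 6: working directory inside the final image
WORKDIR /web
# 7: documents the intended port, does not publish it
EXPOSE 3000
# 8: default command when the container starts
CMD ["npm", "start"]

# 9: re-runs when the node_modules folder built in stage-1 changes
COPY --from=stage-1 /app/node_modules /web/node_modules
# 10: re-runs whenever any file in the build context changes
COPY . /web
```

Keeping instruction 10 last is what lets a plain code change reuse the cache for everything above it.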

Conclusion

There are many advantages to using multi-stage builds. In many cases, they can also reduce our Docker image size for good. Beyond that, multi-stage builds can help standardize how our app is built and keep our image as lean as possible. There's much more to learn about multi-stage builds, but for the basics, this is already enough.
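As a rough idea of how a multi-stage build can also cut down the image size, here is a minimal sketch, not something our express-generator app needs as it is. It assumes a hypothetical build script in package.json (npm run build) that writes its output to a dist folder, and a start script that runs the app from there. Neither exists in the generated app, and the stage name build is just an illustrative choice.

```dockerfile
# Stage 1: install everything, including devDependencies, and build the app.
FROM node:alpine3.10 AS build
WORKDIR /app
COPY package.json .
RUN npm install
COPY . .
# Hypothetical build script, express-generator does not define one.
RUN npm run build

# Stage 2: the image we actually ship, with production dependencies only.
FROM node:alpine3.10
WORKDIR /web
COPY package.json .
RUN npm install --production
# Assumes the hypothetical build step wrote its output to /app/dist.
COPY --from=build /app/dist /web/dist
EXPOSE 3000
# Assumes the start script in package.json runs the app from dist.
CMD ["npm", "start"]
```

Everything installed or generated in the first stage that is not explicitly copied over, such as devDependencies and source files, stays out of the final image.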