Docker basics: health check and dependency check to support reliability

Learn how to make our app more reliable

If our app runs inside a container and suddenly crashes, the container's main process ends, so Docker stops the container and it moves into the Exited state. This is the most basic kind of health check we get with Docker: if a container is in the Exited state, we know something is wrong with it.

If we're running multiple containers in a clustered environment, we can leave it to our orchestration tool to restart a container whenever it exits. But that's still a very basic health check. What if our container doesn't exit but keeps responding with 500 Internal Server Error? Docker won't change its state to Exited. It keeps the container running even though something is clearly wrong with the app.

Start using HEALTHCHECK inside a Docker image

For the sake of example, let's create a really simple API server using Node.js and save it as server.js (the file our Dockerfile will run):

const http = require('http');

const messages = ["You are great!", "You can accomplish anything!", "Success is in your future!"]
// number of requests to "/" so far; after the third one the server "breaks"
let counter = 0

const requestListener = function (req, res) {
    res.setHeader("Content-Type", "application/json");
    switch(req.url) {
        case "/":
            counter += 1
            if (counter <= 3) {
                res.writeHead(200);
                // pick a random quote (index 0 .. messages.length - 1)
                const message = messages[Math.floor(Math.random() * messages.length)]
                res.end(JSON.stringify({ message: message, counter: counter }));
            } else {
                res.writeHead(500);
                res.end(JSON.stringify({ error: "Internal Server Error" }));
            }
            break
        case "/health":
            // health endpoint: reports the current state without touching the counter
            if (counter <= 3) {
                res.writeHead(200);
                res.end(JSON.stringify({ message: "healthy" }));
            } else {
                res.writeHead(500);
                res.end(JSON.stringify({ message: "not healthy" }));
            }
            break
        default:
            res.writeHead(404);
            res.end(JSON.stringify({error:"Resource not found"}));
    }
}

const server = http.createServer(requestListener);
server.listen(8080, () => {
    console.log("server is running on port 8080")
});

It's an API server that returns a random quote with status code 200 every time we hit the / endpoint. The catch is that after we've hit the endpoint three times, it starts returning 500 Internal Server Error. In a real-world situation the server might be overloaded and return a 503 instead, but for the sake of example we'll keep it simple.

Now, let's create the Dockerfile for this API server:

FROM node:14-alpine
WORKDIR /app
EXPOSE 8080
COPY . .
RUN npm install
CMD ["node", "server.js"]

Build the image with:

docker build -t quote-api .

Then run the image we've just built, publishing it on port 8080:

docker run -d -p 8080:8080 quote-api

Now we can access our API at localhost:8080 in the browser.

Open localhost:8080 in the browser and refresh the page three times. The first three requests succeed, but from the fourth request onwards the service returns an internal server error. It's a bug I created deliberately.

But if we check the container with docker ps, its status is still Up. As far as Docker is concerned, the app is running fine even though it's behaving incorrectly, because the Docker runtime has no knowledge of what is happening inside the container process.

This is where the HEALTHCHECK instruction in our Dockerfile comes in. Let's create a new Dockerfile called Dockerfile.v2:

FROM node:14-alpine

WORKDIR /app
COPY . .

RUN apk --no-cache add curl

RUN npm install

ENTRYPOINT ["node", "server.js"]

HEALTHCHECK CMD curl --fail http://localhost:8080/health || exit 1

There are a few differences between this Dockerfile and the previous one. The biggest is the line that installs curl. The health check uses curl, and the node:14-alpine image doesn't ship with it, so we have to add it ourselves.

Lastly, there's the HEALTHCHECK instruction, which checks whether our app is working as it should. The --fail flag makes curl look at the response status code. If the server answers with an HTTP error (status 400 or above), curl exits with a non-zero code (22) instead of 0. Without --fail, curl would exit with 0 even for a 500 response. Docker treats exit code 0 as a passed check and any other exit code as a failed one. Docker's documented convention is 1 for unhealthy, which is why `|| exit 1` is often appended to the command. By default Docker runs the check every 30 seconds, and, in case the error is only temporary, it marks the container as unhealthy only after three consecutive failed checks.
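The sketch below is a small simulation that makes the "three failures in a row" rule concrete. It is our own simplified model of the rules described above, not Docker's actual implementation. A passing check makes the container healthy, and only three consecutive failures make it unhealthy. The model ignores the --start-period option.

```javascript
// Simplified model (our own sketch, not Docker's source) of how individual
// health-check results turn into a container health status.
// Assumes the default --retries=3 and ignores --start-period.
function healthStatuses(results, retries = 3) {
  let status = 'starting'; // every container with a HEALTHCHECK begins here
  let failingStreak = 0;   // consecutive failed checks so far
  const history = [];
  for (const passed of results) {
    if (passed) {
      failingStreak = 0;
      status = 'healthy'; // a single passing check is enough
    } else {
      failingStreak += 1;
      if (failingStreak >= retries) {
        status = 'unhealthy'; // only after `retries` failures in a row
      } // otherwise the previous status is kept
    }
    history.push(status);
  }
  return history;
}

// One check every 30 seconds: three passes, then the API starts failing.
console.log(healthStatuses([true, true, true, false, false, false]).join(' -> '));
// healthy -> healthy -> healthy -> healthy -> healthy -> unhealthy
```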

Notice also that the health check calls a dedicated /health route rather than the main / endpoint. If we look at the code, the /health route does nothing except check the app's status and return it. In particular, it doesn't increment the counter, so the health checks themselves don't use up our three successful requests.

Now let's build an image from Dockerfile.v2 and run it:

docker build -t quote-api:v2 -f Dockerfile.v2 .

docker run -d -p 8081:8080 quote-api:v2

We can now access our app at localhost:8081 in the browser. But first, let's check the container status with docker ps, which gives us a high-level status for each container. Our new container shows (health: starting), which means Docker hasn't finished its first health check yet. The first check runs one interval (30 seconds by default) after the container starts. After about 30 seconds, we can run docker ps again, and the status should show (healthy).

Now what happens if we hit our server more than three times, so it starts returning 500 Internal Server Error again? Let's try it:

curl http://localhost:8081
curl http://localhost:8081
curl http://localhost:8081
curl http://localhost:8081

The fourth request should return an error, and from then on /health returns 500 as well. Docker keeps running the health check on its schedule. After three consecutive failures, the container's status changes to (unhealthy). The container itself keeps running, though, because Docker doesn't stop or remove an unhealthy container.

Why doesn't Docker remove an unhealthy container? Because restarting the app means downtime. We might also be writing data inside the container, and replacing it would lose that data. Docker doesn't know what the right action is, and restarting the container might make things worse. So Docker leaves it running with the status unhealthy. Maybe the problem is temporary, and the next successful check will mark it healthy again.

Dependency check in Docker

The thing about dependency checks in Docker is that Docker doesn't have an instruction like HEALTHCHECK for them. That leaves it to us to put logic in our Dockerfile or startup command that checks the app's dependencies right before starting the app itself.

Some apps need a dependency in order to work properly: something that must already be running before the app itself starts. Some apps have dependency-checking logic built in, but most don't.

To learn about dependency checking, let's make a simple web app using Node.js, Express, axios, and EJS as the template engine:

require('dotenv').config()

const express = require('express')
const axios = require('axios')
const app = express()
const port = 8000

app.set('view engine', 'ejs')

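app.get('/', (req, res) => {
    // address of the quote API, read from the API environment variable
    const url = process.env.API || 'http://localhost:8080'
    axios.get(url)
        .then(response => {
            const quote = response.data
            res.render('index', {
                'quote': quote
            })
        })
        .catch(error => {
            // axios rejects on network errors and on non-2xx responses (e.g. the API's 500)
            console.log(error)
            res.status(500).send('internal server error')
        })
})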

app.listen(port, () => {
  console.log(`Example app listening at http://localhost:${port}`)
})

Let's name this file app.js. It's a simple web app that serves only one route. Before that route can respond, it has to call another service (a backend or API, whatever we call it) to get the data it needs.

The route calls the quote-api service, gets a random quote from it, and passes that quote to the view. Now create a folder called views (where Express looks for templates by default), add a file called index.ejs, and paste this code into it:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>quote generator</title>
</head>
<body>
    <div class="root">
        <h1>QUOTE OF THE DAY FOR YOU:</h1>
        <h2>
            <%= quote.message %>
        </h2>
    </div>
</body>
</html>

It's a simple HTML template that shows the quote message our server received from another service. This pattern is very common in multi-container apps. An app may need to reach a database or other services first, and if one of them isn't available, the app can't work correctly. This is why dependency checks matter.

Now let's create a Dockerfile for our web app, which we'll call quote-web:

FROM node:14-alpine

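# app.js reads the API variable; quote-api listens on port 8080 inside its container
# (8081 was only the port we published on the host)
ENV API=http://quote-api:8080

WORKDIR /app
COPY . .

RUN apk --no-cache add curl

RUN npm install

# only start the web app if the quote API's health endpoint answers successfully
CMD curl --fail http://quote-api:8080/health && \
    node app.js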

Let's build and run our quote-web app with these two commands:

docker build -t quote-web .

docker run -d -p 8000:8000 quote-web

After running those two commands, we might expect to reach our web app at localhost:8000, right? Wrong. Our app isn't running at all. In fact, it never even gets to its startup command, node app.js. But why?

Look at the last line of our Dockerfile. That's the line that stops our app from running. It says: run curl, and only if it succeeds, run node app.js (that's what && means in the shell). Our curl command fails, so node app.js never runs, the CMD finishes with a non-zero exit code, and the container exits. The curl fails because the quote API isn't answering successfully. Either the hostname quote-api can't be reached from this container at all (with a plain docker run, containers on the default bridge network can't resolve each other by name), or it can be reached but is already returning 500 Internal Server Error after our earlier requests.

The self-healing app

There's still much more we could cover, but I don't want to make this basic guide any more technical or any longer.

We could have used docker-compose with a restart policy for our web app. While its dependency isn't available yet, the app exits by itself, and Compose keeps restarting the container until the dependency is available and the app can run as it should. docker-compose also lets us customize the health check settings: the interval, the number of retries, the timeout, and the start period. The same options exist as flags on the Dockerfile HEALTHCHECK instruction.

The other thing to know is that there are other patterns besides the one shown here. In this guide we used curl for both the health check and the dependency check, but that isn't always the right approach or best practice. Why? As we saw, the node:14-alpine image doesn't include curl, so we had to install it ourselves, which makes the image bigger and more complex. A better practice is an in-app check. Say we add a command such as npm run check, so the app's health is checked from inside the app's own runtime instead of from the outside with curl.

Or we could create a separate service just for checking health: a check service that we call with the URL of the service we want to check, and that returns the result, again without curl.

What about depends_on? It's tempting to just declare the startup order, right? But that may not be a great idea.

On a local machine we can declare that our quote web app depends on the quote API and must start after it. But what about production, where we could be running dozens of servers, say 50 for quote-api and 30 for the web app? Orchestration tools like Kubernetes want to spin up containers as quickly as they can. If we make every web container wait for the API servers first, it might take a long time for everything to be fully up and running.
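Instead of enforcing a startup order, each service can start right away and tolerate a dependency that isn't ready yet. Our app.js already does this in part, since it catches errors on each request. One common way to go further is to retry a failed call with exponential backoff. The helper below is a generic sketch, and withRetry and its defaults are our own. In app.js it could wrap the existing call as withRetry(() => axios.get(url)).

```javascript
// Hypothetical retry helper with exponential backoff.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call `fn` (which returns a promise); on failure wait and try again,
// doubling the delay each time. Rethrows the last error when out of retries.
async function withRetry(fn, { retries = 3, baseDelayMs = 200 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        await sleep(baseDelayMs * 2 ** attempt); // 200ms, 400ms, 800ms, ...
      }
    }
  }
  throw lastError;
}

module.exports = { withRetry };
```

The web container can start in any order relative to the API. A request that arrives while the API is still booting waits a little instead of failing immediately.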

This may be our first step towards a self-healing app. The idea of a self-healing app isn't an app without bugs, but an app that keeps working correctly anyway. For example, if our app has a bug that makes it run out of memory, the platform can simply remove it and replace it with a brand-new container with fresh memory.