Should We Run Database Seed and Migrations In Dockerfile?

Apr 22, 2020

A common question

docker-compose is a popular approach to manage development environments. A common question from docker newbies is: What steps should we put in Dockerfile?

More specifically, when working with full-stack frameworks that manage database migrations, e.g., Rails or Elixir/Phoenix, it is tempting to add rails db:migrate or mix ecto.migrate step in the dockerfile. The goal is to make docker-compose up the ONLY thing we need for setting up environment.

However, this approach is problematic. Let me explain why.

One-off tasks are special

Imagine we don’t have docker, and we are setting up our local environment. To setup database, we usually need the following steps:

Create database
Run migrations
Seed database

If you look carefully, these are usually one-off tasks. Database create can fail if you run it the second time. Same as seeding, usually you intentionally design your seeding script to run idempotently.

The exception is running database migrations, because frameworks like Rails or Ecto assign timestamps to migrations, and running the migration command will check the existing state, and avoid repeating a migration the second time.

However, even for migrations, I argue that we should avoid running it in Dockerfile. This is particularly for dockerized production environments. Running migrations while building docker image means we can lose control when the migrations will be run. Some database migration could lead to downtime or requires coordination. So running it out of band may be the best option.

For these one-off tasks, they are better documented in README and run them outside of Dockerfile or docker-compose.yml

The other idea I found is to treat them services, add them in docker-compose.yml, so developers run it ONCE, with commands like docker-compose up -d db_setup. But the trade-off here is we can’t do docker-compose up all the time because it runs one-off tasks, too. And this has to be documented somewhere, too.

Use a platform repo to manage shared services

It is common to see teams having one monolithic app and multiple other repos for microservices. We often use a separate a platform repo where there is a long docker-compose.yml that maps to each small repo. The purpose is to avoid having a different docker-compose.yml in each separate repos.

I’ve also seen a docker-compose.yml in a separate repo where we mock the dependent service. This can be helpful if the dependent service is simple to mock, and has no or little state to keep.

docker-compose `host` network mode?

Once I saw a fellow co-worker uses host network mode, which lets the docker container access the host services. The intension was to share services with the host and spin less docker serives.

However, this is against the ideal of docker. The whole point is to run isolated environment. With host network mode, it introduces the dependencies on the host environment.

Even in the case you have to host network, we can consider using the special DNS name: host.docker.internal.

From my experience, using docker-compose default network mode, which spins a single network, is often the way we prefer.

Concluding thoughts

When using docker-compose to manage development environments, it is helpful to distinguish the tasks which are safe to run on every docker build, and those one-off tasks.

Also, I discuss other aspects of docker-compose, e.g., running a platform repo to spin up shared services and why we should stick to docker-compose default network mode.

Thank you for read!