One of my recent tasks was to upgrade a production PostgreSQL database. Since we run PostgreSQL in a Docker container in our HashiCorp Nomad environment, I assumed it would be enough to switch to a new Docker image and everything would work fine.
That was not the case. When you start a new major version of PostgreSQL on the data directory of an older major version, PostgreSQL fails with "database files are incompatible with server" and does not start.
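A quick way to spot this before switching images is to look at the PG_VERSION file, which PostgreSQL writes into the data directory when the cluster is created. The following is only a minimal sketch: the function name needs_upgrade and its arguments are made up for illustration, and it assumes a cluster of version 10 or later, where PG_VERSION holds just the major version number.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper, not part of any PostgreSQL tooling):
# compare the major version recorded in the data directory with the
# major version of the server image you are about to start.

needs_upgrade() {
  local data_dir="$1" target_major="$2" current_major

  # PG_VERSION is created by initdb inside the data directory.
  if [ ! -r "$data_dir/PG_VERSION" ]; then
    echo "no readable PG_VERSION in $data_dir" >&2
    return 2
  fi

  # The file holds a single line, e.g. "12".
  read -r current_major < "$data_dir/PG_VERSION"

  # Exit status 0 means the versions differ and the data must be converted.
  [ "$current_major" != "$target_major" ]
}
```

With the paths used later in this post, `needs_upgrade /db/psql12/data 14` would succeed (exit status 0) for a version 12 data directory, signalling that the new image cannot start on it as-is.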
After a quick search on Google, I learned that the data needs to be converted with the pg_upgrade tool. Since PostgreSQL is running in a container and I did not want to install PostgreSQL tools on the server, I looked for other options. After a bit more research, I found the tianon/docker-postgres-upgrade project on GitHub, which is a "POC for using pg_upgrade inside Docker".
I gave it a try locally, and it worked fine, so I proceeded with the work on our Nomad cluster:
- I stopped all applications that access the PostgreSQL instance.
- I stopped the PostgreSQL job in Nomad.
- I made a backup of the "old" PostgreSQL data directory (and additionally, I made a snapshot of the VM).
- I created a separate data directory for the new PostgreSQL server. Thankfully the tianon/postgres-upgrade Docker images are based on the standard PostgreSQL Docker images, so I did not expect any filesystem permission problems.
- I ran the following command on the server to convert the old data for the new PostgreSQL version:
docker run --rm -v /db/psql12/data:/var/lib/postgresql/12/data -v /db/psql14/data:/var/lib/postgresql/14/data tianon/postgres-upgrade:12-to-14
- Once the migration was done, I started the PostgreSQL Nomad job with the newer PostgreSQL version.
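The backup, new data directory, and conversion steps above could be scripted roughly as follows. This is a sketch under assumptions, not the exact procedure I ran: the paths, versions, and docker invocation are the ones from the command above, while the run and upgrade function names, the DRY_RUN switch, and the .bak backup location are made up for illustration. It defaults to printing the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch of the backup + conversion steps. DRY_RUN, run() and upgrade()
# are hypothetical helpers; the docker command is the one from this post.

OLD_VERSION=12
NEW_VERSION=14
OLD_DATA=/db/psql12/data
NEW_DATA=/db/psql14/data

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

upgrade() {
  # Copy the old data directory, preserving ownership and permissions.
  # The ".bak" destination is an assumption; use whatever backup target you have.
  run cp -a "$OLD_DATA" "${OLD_DATA}.bak" || return 1

  # Separate data directory for the new server.
  run mkdir -p "$NEW_DATA" || return 1

  # Convert the data with pg_upgrade inside the container.
  run docker run --rm \
    -v "$OLD_DATA:/var/lib/postgresql/$OLD_VERSION/data" \
    -v "$NEW_DATA:/var/lib/postgresql/$NEW_VERSION/data" \
    "tianon/postgres-upgrade:$OLD_VERSION-to-$NEW_VERSION"
}

upgrade
```

Run it once as-is to review the printed commands, then with DRY_RUN=0 to execute them. The PostgreSQL job and all client applications should already be stopped at that point.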
While the PostgreSQL server was running fine, applications could not connect. After panicking for a few seconds, I realized that I needed to manually copy the pg_hba.conf file from our old setup over to the new setup. After restarting the new PostgreSQL instance, applications could connect again, and everything worked fine.
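Copying the file over can be sketched like this. The copy_hba function name and the .generated suffix are made up for illustration, and the sketch assumes pg_hba.conf lives directly in the data directory, which is where initdb puts it by default; a setup with a custom hba_file location would need different paths.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): carry the old client authentication rules
# over to the new data directory, keeping the freshly generated file around.

copy_hba() {
  local old_data="$1" new_data="$2"

  # Keep the pg_hba.conf created for the new cluster for later comparison.
  if [ -f "$new_data/pg_hba.conf" ]; then
    cp -p "$new_data/pg_hba.conf" "$new_data/pg_hba.conf.generated" || return 1
  fi

  # Replace it with the rules from the old setup (-p keeps mode and timestamps).
  cp -p "$old_data/pg_hba.conf" "$new_data/pg_hba.conf"
}
```

For the directories used in this post, the call would be `copy_hba /db/psql12/data /db/psql14/data`, followed by a restart of the PostgreSQL job.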