Nomad TLS error in CI pipeline

Stephan Hochdörfer
Head of IT Business Operations

While migrating our HashiCorp Nomad workloads to our new Nomad cluster, I also tried to simplify our CI pipelines and ran into a TLS issue with the Nomad CLI.

In our previous CI setup, we used Terraform to communicate with Nomad and to run jobs (that is, to trigger deployments). The setup worked fine, but I always felt we didn't need Terraform: it "just" invokes Nomad and makes things more complicated than I would like.

In our GitLab CI pipelines, I removed the hashicorp/terraform:light Docker image and replaced it with hashicorp/nomad:1.7.5. Instead of invoking the Terraform CLI, the pipeline now runs these Nomad commands:

nomad job validate -var="tag=${CI_COMMIT_TAG}" nomad/my-job.nomad
nomad job run -var="tag=${CI_COMMIT_TAG}" nomad/my-job.nomad
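The two commands can also be wrapped in a small helper so that a failed validation stops the deployment. This is only a minimal sketch: the function name deploy_job is made up for illustration, and it assumes the Nomad CLI is on the PATH and already knows how to reach the cluster.

```shell
#!/bin/sh
# Minimal sketch: validate a Nomad job and only submit it if validation passes.
# deploy_job is a hypothetical helper name, not part of the Nomad CLI.
deploy_job() {
    job_file="$1"
    tag="$2"

    # Refuse to deploy without a tag; an empty value would otherwise be
    # passed to the job silently (e.g. when the pipeline was not started
    # from a Git tag).
    if [ -z "$tag" ]; then
        echo "deploy_job: no tag given" >&2
        return 1
    fi

    # "nomad job run" is only reached if "nomad job validate" succeeds.
    nomad job validate -var="tag=${tag}" "$job_file" &&
        nomad job run -var="tag=${tag}" "$job_file"
}
```

In a CI script this could be called as deploy_job nomad/my-job.nomad "${CI_COMMIT_TAG}".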

Once I was happy with my work, I decided to run the CI pipeline to see if everything was working. Apparently, it wasn't. The build broke with this error:

tls: failed to verify certificate: x509: certificate signed by unknown authority

At first, I was confused. We have Traefik running in front of our cluster. Had Traefik generated a wrong TLS certificate? Everything seemed to work fine in my browser. Then it occurred to me: maybe the ca-certificates package was not installed in the Docker image. And I was right. The CA certificates were missing, so the Nomad CLI had no trusted root certificates against which to verify the server certificate.
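A quick way to confirm this kind of suspicion is to look for a CA bundle file inside the image. The following is a minimal sketch, not an exhaustive check: find_ca_bundle is a hypothetical helper, and the three paths are merely common bundle locations on Debian-, Red Hat- and Alpine-style systems.

```shell
#!/bin/sh
# Minimal sketch: report which well-known CA bundle file exists in an image.
# find_ca_bundle is a hypothetical helper; the optional first argument is a
# root prefix, which allows inspecting an unpacked image instead of "/".
find_ca_bundle() {
    root="${1:-}"
    for candidate in \
        /etc/ssl/certs/ca-certificates.crt \
        /etc/pki/tls/certs/ca-bundle.crt \
        /etc/ssl/cert.pem
    do
        # -s: the file exists and is not empty
        if [ -s "${root}${candidate}" ]; then
            echo "${root}${candidate}"
            return 0
        fi
    done
    echo "no CA bundle found under ${root:-/}" >&2
    return 1
}
```

Running find_ca_bundle || exit 1 before the Nomad commands turns the cryptic x509 error into an explicit message. If a bundle exists in a non-standard location, the Nomad CLI can be pointed at a PEM file through the NOMAD_CACERT environment variable.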

I decided not to build my own Docker image for Nomad but to take the lazy route instead and mount the /etc/ssl directory from the host into the containers bootstrapped by GitLab CI. This appeared to be the easiest way to fix the issue. While things seemed to work at first, a day later we had CI jobs failing. Specifically, the failing jobs were those that use a Docker image that is not Debian-based, such as Alpine.

Since I have not yet had the time to analyze those failures, I decided to build a custom Docker image for now and check later whether the issue can be solved another way. The Dockerfile for the custom image looks like this:

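FROM debian:bookworm-slim

# Install the CA bundle plus the tools needed to add the HashiCorp apt repository
RUN apt-get update && apt-get upgrade -y && apt-get install -y wget gpg coreutils ca-certificates

# Import the HashiCorp signing key into a dedicated keyring
RUN wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg

# Register the HashiCorp apt repository for Debian bookworm
RUN echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com bookworm main" > /etc/apt/sources.list.d/hashicorp.list

# -y is required because the image build runs non-interactively
RUN apt-get update && apt-get install -y nomad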
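Building the image and checking that the Nomad CLI starts inside it could look roughly like this. It is a minimal sketch: build_nomad_image is a hypothetical helper name, and the image name in the usage line below is a placeholder, not our actual registry path.

```shell
#!/bin/sh
# Minimal sketch: build the image and check that the Nomad CLI starts in it.
# build_nomad_image is a hypothetical helper; the second argument is the
# build context and defaults to the current directory.
build_nomad_image() {
    image="$1"
    context="${2:-.}"

    docker build -t "$image" "$context" || return 1

    # "nomad version" needs no cluster access, so it only verifies that the
    # binary was installed and can be executed.
    docker run --rm "$image" nomad version
}
```

For example: build_nomad_image registry.example.com/ci/nomad:latest. Note that this smoke test does not exercise TLS verification; that only happens once a job in the pipeline talks to the cluster.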

Once the CI pipeline jobs used the self-built Docker image, everything went back to normal.