Why Nomad?
HashiCorp Nomad is a simple and flexible scheduler and orchestrator that helps organizations reduce operational overhead and make better use of their infrastructure. We've been using Nomad to run our internal workloads since about 2018.
Occasionally, people ask me why we've picked Nomad over Kubernetes or similar tools. Back when we planned to migrate our mix of virtual machines and dockerized setups to a new infrastructure, we experimented with several solutions like Mesos or Kubernetes.
Since IONOS Cloud did not have a Managed Kubernetes offering back then, and we did not want to switch to a different hosting provider, we experimented with hosting Mesos and Kubernetes ourselves. It quickly became clear that, from an operational perspective, either product would be too much work for our small IT team to maintain.
Looking for alternatives, we came across Nomad by accident. We installed Nomad and had our first workload running in less than one day. A pure Nomad setup is enough for very simple and static workloads. However, it became clear that we needed a bit more, so we installed Consul for service discovery and Vault for secrets management alongside Nomad. Since we wanted to dynamically place workloads on all of our cluster nodes, the last piece of the puzzle was a distributed filesystem like GlusterFS.
That setup ran until early 2024, when we switched to our new Nomad cluster. While the initial setup generally ran fine, there were a few issues that needed attention.
Every now and then, a Consul node would lose its connection to the other nodes, which resulted in Nomad rescheduling the workloads on that specific node. Technically, this is not a big deal, but some of our workloads (e.g., GitLab) take 5–10 minutes to boot up. Having multiple restarts during a working day can be super annoying, especially when you just want to get your work done. Thankfully, Nomad 1.3 introduced native service discovery, which is what we are using today. It is clearly not as powerful as Consul, but it very much fits our requirements, and since we made the switch, those regular restarts are gone.
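To give an idea of what that looks like, here is a minimal sketch of a job using Nomad's native service discovery; the job name, service name, and image are hypothetical:

```hcl
job "webapp" {
  group "app" {
    network {
      port "http" {
        to = 8080
      }
    }

    # provider = "nomad" registers the service with Nomad itself
    # instead of Consul (available since Nomad 1.3).
    service {
      name     = "webapp"
      port     = "http"
      provider = "nomad"
    }

    task "server" {
      driver = "docker"
      config {
        image = "registry.example.com/webapp:latest" # hypothetical image
        ports = ["http"]
      }
    }
  }
}
```

Other jobs can then resolve the service in a `template` block via `{{ range nomadService "webapp" }}{{ .Address }}:{{ .Port }}{{ end }}` — no Consul agent required.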
While GlusterFS sounded like a good choice at first, we quickly ran into issues with small files that it couldn't handle properly. That meant we had to bind quite a few of our workloads to specific machines. It "worked" for us but was not the ideal scenario we had hoped for.
Luckily, since version 0.11, Nomad supports the CSI specification. While it sounds like we could easily plug any CSI-compatible solution into Nomad, in reality this is not the case, as most CSI solutions are built specifically on top of, and for, Kubernetes and its way of working.
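As a rough sketch of how this fits together: once a CSI volume has been registered with the cluster (via `nomad volume create` or `nomad volume register`), a job can claim and mount it. The volume ID, plugin, and mount path below are illustrative:

```hcl
job "gitlab" {
  group "app" {
    # Claim a CSI volume that was registered with the cluster beforehand.
    volume "data" {
      type            = "csi"
      source          = "gitlab-data" # hypothetical volume ID
      attachment_mode = "file-system"
      access_mode     = "single-node-writer"
    }

    task "server" {
      driver = "docker"

      config {
        image = "gitlab/gitlab-ce:latest"
      }

      # Mount the claimed volume into the task's filesystem.
      volume_mount {
        volume      = "data"
        destination = "/var/opt/gitlab"
      }
    }
  }
}
```

Because the volume follows the allocation, Nomad is free to place the workload on any node where the CSI plugin is running, instead of pinning it to the machine that holds the data.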
Last year, we tested different options and fell in love with SeaweedFS as a distributed storage system. It not only works perfectly fine with Nomad but is also very easy to administer. Like Nomad, SeaweedFS is just a single binary that needs to be distributed to the different servers and configured accordingly.
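For completeness, here is a rough sketch of how the SeaweedFS CSI plugin can be run under Nomad as a system job, so every client node can mount SeaweedFS-backed volumes. The image name assumes the upstream seaweedfs-csi-driver project; the flags and filer address are illustrative and should be checked against the driver's documentation:

```hcl
job "seaweedfs-csi-plugin" {
  # Run one instance of the plugin on every client node.
  type = "system"

  group "csi" {
    task "plugin" {
      driver = "docker"

      config {
        image      = "chrislusf/seaweedfs-csi-driver:latest" # assumed image
        privileged = true # node plugins need privileged access to mount volumes
        args = [
          "--endpoint=unix://csi/csi.sock",
          "--filer=seaweedfs-filer.example.internal:8888", # hypothetical filer
        ]
      }

      # Tell Nomad this task is a CSI plugin that serves both the
      # controller and node roles ("monolith").
      csi_plugin {
        id        = "seaweedfs"
        type      = "monolith"
        mount_dir = "/csi"
      }
    }
  }
}
```

With the plugin running cluster-wide, volumes registered against the `seaweedfs` plugin ID can be claimed by any job, which is what finally let us stop binding workloads to specific machines.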
Our new Nomad cluster has now been running for a few months, and we are super happy with the upgrade and the new capabilities it offers us.