Sharing Secrets with Nomad Workload Identity
A while ago, Nomad introduced the concept of Workload Identity to authenticate to Vault and obtain a Vault ACL token specific to the task. This way, Nomad jobs only have access to "their" own secrets, which provides an additional layer of security.
Making sure Nomad jobs can only access their own secrets is generally a "good thing"(tm), but there are situations where you have to share secrets among several Nomad jobs.
In my example, I have two batch jobs that need access to the same secrets stored in the same path. Since I couldn't combine the two jobs into one job file, I needed to solve this differently. While reading the Nomad docs, I realized that I can configure a role in the vault block of my job file:
job "job1.batch" {
  datacenters = ["dc1"]
  type        = "batch"

  vault {
    role          = "my-job-role"
    policies      = ["default"]
    change_mode   = "signal"
    change_signal = "SIGUSR1"
  }

  periodic {
    cron             = "0 1 * * *"
    prohibit_overlap = true
  }

  group "job1-group" {
    count = 1

    task "job1-task" {
      // ...
    }
  }
}
That seemed to be exactly what I needed. So I set up an additional role in Vault with the same configuration as our default role:
{
  "role_type": "jwt",
  "bound_audiences": "vault.io",
  "user_claim": "/nomad_job_id",
  "claim_mappings": {
    "nomad_job_id": "nomad_job_id",
    "nomad_namespace": "nomad_namespace",
    "nomad_task": "nomad_task"
  },
  "user_claim_json_pointer": true,
  "policies": ["my-job-role-policy"],
  "ttl": "0s",
  "token_period": "30m",
  "token_type": "service"
}
The only difference compared to our default role used by Nomad is the use of a different policy in Vault. The my-job-role-policy policy is configured as follows:
path "nomad/data/job1.batch/*" {
  capabilities = ["read"]
}

path "nomad/data/job1.batch" {
  capabilities = ["read"]
}

path "nomad/metadata/*/*" {
  capabilities = ["list"]
}

path "nomad/metadata/*" {
  capabilities = ["list"]
}
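For contrast, a default workload policy is typically templated on the job ID, which is what restricts each job to "its" own secrets. The following is only a sketch of what such a policy might look like, assuming the same nomad KV mount as above. Our actual default policy is not shown here, and AUTH_METHOD_ACCESSOR is a placeholder for the accessor of the JWT auth method used by Nomad:

```hcl
# Hypothetical default policy: the path is derived from the job's own ID,
# so job2.batch could never read anything under nomad/data/job1.batch.
# AUTH_METHOD_ACCESSOR is a placeholder, not a real accessor value.
path "nomad/data/{{identity.entity.aliases.AUTH_METHOD_ACCESSOR.metadata.nomad_job_id}}/*" {
  capabilities = ["read"]
}

path "nomad/data/{{identity.entity.aliases.AUTH_METHOD_ACCESSOR.metadata.nomad_job_id}}" {
  capabilities = ["read"]
}
```

The my-job-role-policy policy replaces the templated job ID with the fixed path job1.batch, so every job that logs in through my-job-role gets the same read access regardless of its own ID.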
Now, in my Nomad job2.batch job, I can reference the same my-job-role role, which uses the same my-job-role-policy policy for accessing the secrets in Vault:
job "job2.batch" {
  datacenters = ["dc1"]
  type        = "batch"

  vault {
    role          = "my-job-role"
    policies      = ["default"]
    change_mode   = "signal"
    change_signal = "SIGUSR1"
  }

  periodic {
    cron             = "0 2 * * *"
    prohibit_overlap = true
  }

  group "job2-group" {
    count = 1

    task "job2-task" {
      // ...
    }
  }
}
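To illustrate what the sharing looks like from inside a task, here is a minimal sketch of a template block that either job could use to read the shared secret. The secret name credentials, the key password, the variable name DB_PASSWORD and the destination file name are made up for this example; only the nomad/data/job1.batch prefix comes from the policy above:

```hcl
task "job2-task" {
  # ... driver and config omitted ...

  template {
    # Render the secret into an env file inside the task's secrets directory.
    destination = "${NOMAD_SECRETS_DIR}/shared.env"
    env         = true

    # Hypothetical secret "credentials" with a hypothetical key "password",
    # stored below the shared nomad/data/job1.batch path (KV v2).
    data = <<EOT
{{ with secret "nomad/data/job1.batch/credentials" -}}
DB_PASSWORD={{ .Data.data.password }}
{{- end }}
EOT
  }
}
```

Because both jobs log in through my-job-role, the same template would work unchanged in job1-task.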
This is a much better alternative to modifying our default policy in Vault and giving all our Nomad jobs access to the nomad/data/job1.batch secrets, or to duplicating the secrets in Vault.
It took me a bit of time to understand how all the parts play together. Once I got the full picture, setting everything up was quite easy, and it has been working fine for us for a few months now.
If you're interested in learning more about the inner workings of Nomad Workload Identities, I recommend checking out this blog post for a deeper dive.