Federate access to GCP with Nomad Workload Identity
Nomad Workload Identities uniquely identify each instance of a workload running in a Nomad cluster. Google Cloud Platform Workload Identity Federation grants access to Google Cloud Platform services, such as Google Cloud Storage, through third-party identity providers such as Nomad Workload Identities.
In this tutorial, you will set up Nomad Workload Identity as an identity provider for Google Cloud Platform's Workload Identity Federation. You will test identity federation by running a sample batch job that uploads a file to a private Google Cloud Storage bucket.
Prerequisites
To follow this tutorial, you need:
- A Google Cloud Platform project.
- A parent zone in Google Cloud DNS. Nomad will create a subdomain under this parent domain.
- HashiCorp Terraform configured with your GCP authentication method.
Create infrastructure with Terraform
The hashicorp-education/learn-nomad-workload-identity-federation repository contains sample Terraform for creating the Google Cloud Platform infrastructure required by this tutorial.
Clone the repository:
$ git clone https://github.com/hashicorp-education/learn-nomad-workload-identity-federation
Change to the gcp subdirectory of the repository:
$ cd learn-nomad-workload-identity-federation/gcp
Initialize Terraform:
$ terraform init
Initializing the backend...
...
This will initialize the required Terraform providers.
Now, look at the variables.tf file, which contains all of the input variables you must provide.
- region - The GCP region to deploy to.
- zone - The Google Compute Engine zone to create compute instances in.
- project - The GCP project to use.
- parent_zone_name - The parent domain for the HTTPS certificate. This must already exist.
- domain - The domain for the Nomad cluster. This must be a child domain of the preexisting parent zone. Terraform will use the domain to create a TLS certificate. If the parent zone were example.com, then this variable could be nomad.example.com.
Tip
In production the domain is an important choice: it must match the oidc_issuer configured on your Nomad Server agents.
Create a file called tutorial.auto.tfvars and paste in the following code, which defines the Terraform variable values. Replace the values with your own.
project = "example-project"
region = "us-central1"
zone = "us-central1-a"
parent_zone_name = "example-parent-zone"
domain = "nomad.example.com"
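Because domain must be a child of the existing parent zone, it can help to check the two values before running Terraform. The following is a minimal shell sketch and is not part of the tutorial repository. The function name is_child_domain is hypothetical, and example.com stands in for the DNS name of the zone that parent_zone_name refers to.

```shell
# Hypothetical helper: succeeds when $1 is a strict subdomain of $2.
is_child_domain() {
  # Tolerate a trailing dot, as used in fully qualified DNS names.
  local domain="${1%.}" parent="${2%.}"
  [ -n "$parent" ] || return 1
  case "$domain" in
    *."$parent") return 0 ;;   # e.g. nomad.example.com under example.com
    *)           return 1 ;;   # same name, unrelated name, or look-alike suffix
  esac
}

if is_child_domain "nomad.example.com" "example.com"; then
  echo "domain is a child of the parent zone"
fi
```

The check is purely textual; it does not confirm that the parent zone actually exists in Google Cloud DNS.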
Then run terraform apply to create the infrastructure.
$ terraform apply
data.google_project.main: Reading...
data.google_dns_managed_zone.parent: Reading...
...
Plan: 23 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value:
Review the infrastructure plan and respond yes to apply it. Once Terraform creates the infrastructure, it will template the domain into a Nomad Agent configuration file named agent.hcl and provide other outputs you will use in the next step.
Run Nomad agent
Terraform created a Google Compute Engine (GCE) instance and installed the Nomad binary on it. For this tutorial you will manually run the Nomad agent as a single node cluster. See Set up a Nomad cluster on GCP for instructions on deploying a production Nomad cluster to GCP.
Open a terminal on the GCE instance through GCP's Console or the gcloud CLI. Copy the agent.hcl file that Terraform created onto the GCE instance. If you are using the gcloud CLI, Terraform's output will include helpful configuration commands:
...
Apply complete! Resources: 23 added, 0 changed, 0 destroyed.
Outputs:
gcloud_config = <<EOT
gcloud config set project example-project
gcloud config set compute/zone us-central1-a
EOT
gcp_project_num = "999999999999"
gcs_bucket = "definite-cardinal-30ba310c496048ec10966722"
oidc_issuer_uri = "https://nomad.example.com/"
random_name = "definite-cardinal"
service_account = "nomad-wid@example-project.iam.gserviceaccount.com"
wid_provider = "projects/999999999999/locations/global/workloadIdentityPools/nomad-pool-definite-cardinal/providers/nomad-provider"
$ gcloud config set project <GCP Project ID>
$ gcloud config set compute/zone <GCE Zone>
$ gcloud compute instances list
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
vm-1vmv <GCE Zone> n1-standard-1 10.128.0.2 XX.XXX.XX.XXX RUNNING
# Copy agent.hcl to instance
$ gcloud compute scp agent.hcl vm-1vmv:
# SSH into instance
$ gcloud compute ssh vm-1vmv
Once you've started a terminal session on the GCE instance and uploaded agent.hcl, run the Nomad Agent as root.
$ sudo nomad agent -config agent.hcl
Notice that Nomad warns you that mTLS is not configured and that it is running in bootstrap mode. You can ignore these warnings for this tutorial. Nomad will also confirm that the node is registered.
==> WARNING: mTLS is not configured - Nomad is not secure without mTLS!
==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
==> Loaded configuration from agent.hcl
==> Starting Nomad agent...
...
2024-01-01T17:00:00.000Z [INFO] client: node registration complete
Run NGINX proxy
Nomad's Workload Identities are JSON Web Tokens (JWTs) signed with a private key that the Nomad Server Agent created. Google must get the corresponding public key from Nomad to confirm that a JWT is a valid Nomad Workload Identity.
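A JWT consists of three base64url-encoded segments separated by dots: header, claims, and signature. To see what such a token carries, you can decode the claims segment without verifying the signature. The sketch below assumes a base64 command that supports -d (as in GNU coreutils). jwt_claims is a hypothetical helper name, and the token is a placeholder built for this example, not a real Nomad identity.

```shell
# Hypothetical helper: print the claims (second segment) of a JWT as JSON.
# This only decodes the token; it does NOT verify the signature.
jwt_claims() {
  local seg
  # base64url uses '-' and '_' where standard base64 uses '+' and '/'.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url omits.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Placeholder token: dummy header and signature, claims are {"sub":"example"}.
jwt_claims "e30.eyJzdWIiOiJleGFtcGxlIn0.c2ln"; echo
```

Decoding is useful for inspection only. Trust in the token still depends on checking the signature against the public key, which is what Google does with the JWKS endpoint described next.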
Nomad exposes its public keys through an RFC 7517 JWKS HTTP endpoint: /.well-known/jwks.json.
Nomad's HTTP API is for local network access only and is not intended to be directly exposed to the Internet.
However, Google requires the JWKS endpoint to be secured with a trusted public TLS certificate, so the endpoint must also be available on a public domain. In this tutorial the domain corresponds to the domain Terraform variable above and should be a subdomain of the existing parent zone defined by the parent_zone_name Terraform variable.
Terraform configured a load balancer with the proper domain name and certificate, but the load balancer is not configured to route directly to the Nomad Agent. Instead you will follow the best practice of running a proxy between the load balancer and Nomad. For this tutorial you will use NGINX. While a proxy is not strictly necessary, an NGINX proxy will let you:
- only expose the /.well-known/jwks.json endpoint.
- allow unauthenticated access to /.well-known/jwks.json if ACLs are enabled.
- use the Task API to avoid having to handle Nomad mTLS certificates if mTLS is enabled.
- expose Nomad's UI through an NGINX proxy as well. See the Configure NGINX reverse proxy for Nomad's web UI tutorial for details.
In a new terminal, copy the proxy.nomad.hcl file to the GCE instance and run the NGINX proxy. To upload the job with gcloud, run the following from the gcp/ subdirectory of the repo:
$ gcloud compute scp ../proxy.nomad.hcl vm-1vmv:
proxy.nomad.hcl 100%...
Once uploaded, open a terminal on the instance to run the job:
$ gcloud compute ssh vm-1vmv
Run the job on the instance:
$ nomad job run proxy.nomad.hcl
==> 2024-01-29T19:24:55Z: Monitoring evaluation "53e0fedf"
2024-01-29T19:24:55Z: Evaluation triggered by job "nginx-proxy"
2024-01-29T19:24:56Z: Evaluation within deployment: "04900289"
2024-01-29T19:24:56Z: Evaluation status changed: "pending" -> "complete"
==> 2024-01-29T19:24:56Z: Evaluation "53e0fedf" finished with status "complete"
==> 2024-01-29T19:24:56Z: Monitoring deployment "04900289"
✓ Deployment "04900289" successful
2024-01-29T19:24:56Z
ID = 04900289
Job ID = nginx-proxy
Job Version = 0
Status = successful
Description = Deployment completed successfully
Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
nginx 1 1 1 0 2024-01-29T19:12:40Z
Google's Workload Identity Federation can now read Nomad's JWKS endpoint to validate Nomad Workload Identity JWTs. You can inspect the JWKS endpoint from your browser or with curl and jq in a terminal:
$ curl -s https://nomad.example.com/.well-known/jwks.json | jq .
{
"keys": [
{
"use": "sig",
"kty": "RSA",
"kid": "d164162e-6af4-80c9-bdd5-44e02de28d4c",
"alg": "RS256",
"n": "6boD27xwbHS23EV9lbGQI3wj5tJqYk2znEeMEDlARmHSxjX5gV1A71f5b_GAazy-ur7-UGS5l2YnEm4-tp2ZORCYi9t9_VcHPtuly20tfvgk2puH5HVkKSyBx1YtNbSiMCdcUwpMs4dBfjEpTUjWIkkszqV9FTQs6bcoeA-SAHaeLNPui5RKDHo0aeKacG7GIHb5Z1YfUOPoPK2ZCly_rnKXSuW7tM_4mNeNJaW8TOc5R38O9lOyWGRKmySVoj0I3o37yLevFNwF9_rebftOwA6Li583lZ3qszEFakqnqd8vljF4MLPUD8XXiWxp9L0p4riuPcy3TUM0B-51wnZrvw",
"e": "AQAB"
}
]
}
Test the federation
To test that Google's Workload Identity Federation accepts Nomad Workload Identities, you will upload a test file to a GCS bucket created by Terraform.
The Nomad job uses variables to specify parameters such as the GCP project and GCS bucket. Terraform created a Nomad variable file, gcs.nomadvars.hcl, to set the variables for the infrastructure created in the tutorial.
Upload the gcs.nomadvars.hcl and gcs.nomad.hcl files to the GCE instance and run the GCS job with Nomad:
$ gcloud compute scp gcs.nomad*hcl vm-1vmv:
gcs.nomad.hcl 100%...
gcs.nomadvars.hcl 100%...
Once uploaded, open a terminal on the instance to run the job:
$ gcloud compute ssh vm-1vmv
Run the job with the variable file:
$ nomad job run -var-file gcs.nomadvars.hcl gcs.nomad.hcl
==> 2024-01-01T12:34:56Z: Monitoring evaluation "36788aae"
2024-01-01T12:34:56Z: Evaluation triggered by job "gcs-job"
2024-01-01T12:34:56Z: Allocation "457e3a87" created: node "2e230f89", group "gcs-group"
2024-01-01T12:34:56Z: Evaluation status changed: "pending" -> "complete"
==> 2024-01-01T12:34:56Z: Evaluation "36788aae" finished with status "complete"
The job uses a large Google Cloud SDK image, which may take a couple of minutes to download. You can use Nomad to check the status of the workload:
$ nomad alloc status $(nomad job allocs -t '{{with index . 0}}{{.ID}}{{end}}' gcs-job)
...
Recent Events:
Time Type Description
2024-01-01T12:36:04Z Terminated Exit Code: 0
2024-01-01T12:36:03Z Started Task started by client
2024-01-01T12:34:58Z Driver Downloading image
2024-01-01T12:34:57Z Task Setup Building Task Directory
2024-01-01T12:34:56Z Received Task received by client
In a few seconds the job creates a test.txt file in the GCS bucket that Terraform created. Read the file with the gcloud CLI.
$ gcloud storage cat gs://<GCS Bucket from Terraform output>/test.txt
Job: gcs-job
Alloc: 457e3a87-3e0d-5eea-afff-36009cc8ae15
Project: 000000000000
Bucket: <GCS Bucket from Terraform output>
WID Provider: projects/000000000000/locations/global/workloadIdentityPools/nomad-pool-<random name>/providers/nomad-provider
Service Acct: nomad-wid@<GCP Project>.iam.gserviceaccount.com
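If the file is not there yet because the image is still downloading, a small retry loop saves re-running the check by hand. This is a generic shell sketch; retry is a hypothetical helper, and the attempt count and delay are arbitrary choices.

```shell
# Hypothetical helper: retry <attempts> <delay-seconds> <command...>
# Runs the command until it succeeds or the attempts are used up.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      echo "gave up after $i attempts: $*" >&2
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}
```

For example, retry 30 10 gcloud storage cat gs://<GCS Bucket from Terraform output>/test.txt would try every 10 seconds for roughly five minutes before giving up.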
Open the gcs.nomad.hcl file and notice how the job uses its workload identity to authenticate with Google and upload to GCS. If you make changes to the template and re-run the job, your changes should be reflected in the test.txt file in GCS.
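When a workload exchanges its Nomad JWT for Google credentials, the request names the workload identity pool provider as its audience. In Google's external account credential configuration, that audience is the provider's resource name (the wid_provider Terraform output) prefixed with //iam.googleapis.com/. The sketch below builds that string. wif_audience is a hypothetical helper, and whether gcs.nomad.hcl constructs the value this way is an assumption you can check against the file.

```shell
# Hypothetical helper: build the GCP audience for a workload identity pool provider.
# $1 is the provider resource name, e.g. the wid_provider Terraform output.
wif_audience() {
  case "$1" in
    projects/*/locations/*/workloadIdentityPools/*/providers/*)
      printf '//iam.googleapis.com/%s\n' "$1" ;;
    *)
      # Reject anything that is not a full provider resource name.
      echo "unexpected provider name: $1" >&2
      return 1 ;;
  esac
}

wif_audience "projects/999999999999/locations/global/workloadIdentityPools/nomad-pool-definite-cardinal/providers/nomad-provider"
```

Comparing this value with the aud claim and provider settings used by the job is a quick way to narrow down audience mismatch errors.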
When you are done exploring, run terraform destroy to clean up all of the created infrastructure.
$ terraform destroy
random_pet.main: Refreshing state... [id=definite-cardinal]
random_id.bucket_suffix: Refreshing state... [id=ILoxDElgSOwQlmci]
local_file.agenthcl: Refreshing state... [id=7d3e2586de38582b784c6566af6be2569106f894]
data.google_project.main: Reading...
...
Do you really want to destroy all resources?
Terraform will destroy all your managed infrastructure, as shown above.
There is no undo. Only 'yes' will be accepted to confirm.
Enter a value: yes
...
google_compute_instance_template.nomad: Destruction complete after 11s
Destroy complete! Resources: 23 destroyed.
Next steps
In this tutorial, you configured Nomad as a federated identity provider for Google Cloud Platform.
The following resources are recommended for learning more:
- Google Cloud's Workload Identity Federation documentation covers all of the workflows and infrastructure supported by Nomad. In Google's documentation "Workload identity pool provider" refers to the role Nomad fulfills.
- Nomad's Workload Identity documentation covers the details of Nomad's federated identity support for workloads.
- Nomad's Cluster Setup and Transport Security tutorials cover how to set up production clusters. This tutorial only used a single VM to run Nomad and the sample workload.