HashiCorp Vault is an API-driven tool for storing and retrieving static and dynamic secrets. Vault can be deployed in a Kubernetes cluster using the official Helm chart. The recommended storage backend for Vault in Kubernetes is Integrated Storage (Raft), and you should take frequent snapshots and store them somewhere safe so that you can restore Vault in case of data loss.
In this post we walk through an implementation that uses a Kubernetes CronJob to take daily snapshots and store them in an AWS S3 bucket for safekeeping. Note that Vault Enterprise provides automated snapshots as a native feature; if you run Enterprise, use that instead.
## Write a Kubernetes CronJob
Let’s start with the CronJob and work backwards from there, because for the CronJob to work it needs to authenticate with both HashiCorp Vault and AWS S3.
```yaml
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: vault-snapshot-cronjob
spec:
  # Run every day at 02:00 (use "0 2 * * 1-5" for weekdays only)
  schedule: "0 2 * * *"
  successfulJobsHistoryLimit: 10
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          # Use a ServiceAccount that we will create next (keep reading!)
          serviceAccountName: vault-snapshot
          volumes:
            # Create an empty volume to share the snapshot across containers
            - name: share
              emptyDir: {}
          initContainers:
            # Run an init container that creates the snapshot of Vault
            - name: vault-snapshot
              # Choose an appropriate Vault version (e.g. the same as your Vault server);
              # newer releases are published as hashicorp/vault
              image: vault:1.9.4
              command: ["/bin/sh", "-c"]
              args:
                # 1. Read the ServiceAccount token, which we use to authenticate against Vault
                # 2. Log in to Vault with the SA token at the path where the Kubernetes
                #    auth engine has been enabled
                # 3. Use the Vault CLI to save a snapshot to our shared volume
                - |
                  SA_TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token);
                  export VAULT_TOKEN=$(vault write -field=token auth/kubernetes/login jwt=$SA_TOKEN role=vault-snapshot);
                  vault operator raft snapshot save /share/vault.snap;
              env:
                # Set the Vault address using the Kubernetes service name
                - name: VAULT_ADDR
                  value: http://vault.vault.svc.cluster.local:8200
              volumeMounts:
                - mountPath: /share
                  name: share
          containers:
            # Run a container with the AWS CLI and copy the snapshot to our S3 bucket
            - name: aws-s3-backup
              image: amazon/aws-cli:2.2.14
              command:
                - /bin/sh
              args:
                - -ec
                # Copy the snapshot file to an S3 bucket called hashicorp-vault-snapshots
                - aws s3 cp /share/vault.snap s3://hashicorp-vault-snapshots/vault_$(date +"%Y%m%d_%H%M%S").snap;
              volumeMounts:
                - mountPath: /share
                  name: share
          restartPolicy: OnFailure
```
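The `aws-s3-backup` container names each object `vault_<YYYYmmdd>_<HHMMSS>.snap`. Because every field is zero-padded, these keys sort lexicographically in chronological order, which makes it easy to find the newest snapshot later. As a small illustration (hypothetical helpers that are not part of the setup above, assuming the container clock runs in UTC), the same naming scheme in Python looks like this:

```python
from datetime import datetime, timezone

# Hypothetical helpers mirroring the shell expression
# vault_$(date +"%Y%m%d_%H%M%S").snap used in the aws-s3-backup container.
KEY_FORMAT = "vault_%Y%m%d_%H%M%S.snap"


def snapshot_key(now: datetime) -> str:
    """Return the S3 object key for a snapshot taken at `now`."""
    return now.strftime(KEY_FORMAT)


def snapshot_time(key: str) -> datetime:
    """Parse the (naive) timestamp back out of a key produced by snapshot_key()."""
    return datetime.strptime(key, KEY_FORMAT)


if __name__ == "__main__":
    taken = datetime(2022, 3, 25, 8, 22, 39, tzinfo=timezone.utc)
    key = snapshot_key(taken)
    print(key)  # vault_20220325_082239.snap
    print(snapshot_time(key))  # 2022-03-25 08:22:39
```

Keeping the timestamp in the key, rather than relying on S3 object metadata, means a plain listing of the bucket is enough to order the snapshots.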
Writing the CronJob is probably the easiest part. Now we need to ensure that the two containers we are running (`vault-snapshot` and `aws-s3-backup`) can authenticate with Vault and AWS. For this, we will rely on a ServiceAccount.
## Authentication with Vault and AWS
### Define a Kubernetes ServiceAccount
Let’s define the Kubernetes ServiceAccount `vault-snapshot` that we referenced in the CronJob above.
```yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vault-snapshot
  annotations:
    # Assume the AWS role hashicorp-vault-snapshotter
    eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/hashicorp-vault-snapshotter
```
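Notice how we add the `eks.amazonaws.com/role-arn` annotation so that pods using this ServiceAccount assume the AWS IAM role `hashicorp-vault-snapshotter` (IAM Roles for Service Accounts, or IRSA). For details on assuming AWS IAM roles from EKS, please read our blog post on that topic.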
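With IRSA, the EKS Pod Identity Webhook mutates pods that use an annotated ServiceAccount and injects the `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` environment variables, which the AWS CLI and SDKs pick up automatically. A quick way to confirm the wiring is to check for those variables. The sketch below does that in Python; the helper name `irsa_role` is our own and purely illustrative:

```python
import os
from typing import Mapping, Optional

# Environment variables injected by the EKS Pod Identity Webhook into pods
# whose ServiceAccount carries the eks.amazonaws.com/role-arn annotation.
IRSA_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def irsa_role(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the IAM role ARN the pod will assume, or None if IRSA is not wired up."""
    if all(env.get(name) for name in IRSA_VARS):
        return env["AWS_ROLE_ARN"]
    return None


if __name__ == "__main__":
    role = irsa_role()
    if role is None:
        print("IRSA variables missing: check the ServiceAccount annotation")
    else:
        print(f"Pod will assume {role}")
```

If the variables are missing, the AWS CLI falls back to its other credential sources (such as the node's instance role), which is usually not what you want here.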
### Define an AWS IAM Role
Let’s define the AWS IAM role `hashicorp-vault-snapshotter` and make the `vault-snapshot` ServiceAccount a trusted entity that can assume that role. This assumes an IAM OIDC identity provider already exists for your EKS cluster.
```hcl
locals {
  vault_cluster            = "<eks-cluster-name>"
  k8s_service_account_name = "vault-snapshot"
  # Must match the namespace that the CronJob and its ServiceAccount run in
  k8s_service_account_namespace = "vault-snapshot"
  vault_cluster_oidc_issuer_url = trimprefix(data.aws_eks_cluster.vault_cluster.identity[0].oidc[0].issuer, "https://")
}

#
# Might as well create the S3 bucket whilst we are at it...
#
resource "aws_s3_bucket" "snapshots" {
  bucket = "hashicorp-vault-snapshots"
}

#
# Get the caller identity so that we can get the AWS Account ID
#
data "aws_caller_identity" "current" {}

#
# Get the cluster that Vault is running in
#
data "aws_eks_cluster" "vault_cluster" {
  name = local.vault_cluster
}

#
# Create the IAM role that will be assumed by the service account
#
resource "aws_iam_role" "snapshot" {
  name               = "hashicorp-vault-snapshotter"
  assume_role_policy = data.aws_iam_policy_document.snapshot.json

  inline_policy {
    name = "hashicorp-vault-snapshot"
    policy = jsonencode({
      Version = "2012-10-17"
      Statement = [
        {
          Effect = "Allow",
          Action = [
            "s3:PutObject",
            "s3:GetObject",
          ],
          # Refer to the S3 bucket we created along the way
          Resource = ["${aws_s3_bucket.snapshots.arn}/*"]
        }
      ]
    })
  }
}

#
# Create an IAM policy document allowing the k8s service account to assume the IAM role
#
data "aws_iam_policy_document" "snapshot" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type = "Federated"
      identifiers = [
        "arn:aws:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${local.vault_cluster_oidc_issuer_url}"
      ]
    }

    # Limit the scope so that only our desired service account can assume this role
    condition {
      test     = "StringEquals"
      variable = "${local.vault_cluster_oidc_issuer_url}:sub"
      values = [
        "system:serviceaccount:${local.k8s_service_account_namespace}:${local.k8s_service_account_name}"
      ]
    }
  }
}
```
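To see what the `aws_iam_policy_document` data source above describes, here is a rough Python sketch that builds an equivalent trust policy as JSON. The function name and the example account ID and issuer URL are hypothetical, and it needs Python 3.9+ for `str.removeprefix`. The point is the shape of the `sub` condition, which must name the exact namespace and ServiceAccount the CronJob runs as:

```python
import json


def trust_policy(account_id: str, issuer_url: str, namespace: str, service_account: str) -> dict:
    """Build an IAM trust policy letting one Kubernetes ServiceAccount assume a role via IRSA."""
    # The OIDC provider ARN uses the issuer URL without its https:// prefix,
    # just like the trimprefix() call in the Terraform locals.
    issuer = issuer_url.removeprefix("https://")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
                },
                "Condition": {
                    "StringEquals": {
                        # The token's sub claim must name exactly this ServiceAccount
                        f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}"
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = trust_policy(
        account_id="123456789012",  # example account ID
        issuer_url="https://oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE",  # example issuer
        namespace="vault-snapshot",
        service_account="vault-snapshot",
    )
    print(json.dumps(policy, indent=2))
```

If the namespace in the condition differs from the one the CronJob actually runs in, `sts:AssumeRoleWithWebIdentity` is denied and the `aws s3 cp` step fails with an access error.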
### Configure Vault Kubernetes Auth Engine
So far we have a Kubernetes ServiceAccount that can assume an AWS IAM role with access to S3. What’s missing is authentication with Vault.
You could instead use Vault's AWS auth engine with the same IAM role. However, we already use Vault to provide secrets to Kubernetes workloads and have multiple EKS clusters authenticating with it, so it made sense to reuse that setup, and that’s what we show below.
This is quite an involved process and could make its own blog post, but in summary we will:
- Create a Kubernetes ServiceAccount and bind it to the ClusterRole `system:auth-delegator`
  - This gives Vault the ability to authenticate and authorize the Kubernetes ServiceAccount tokens that clients use to log in to Vault
  - Remember, our initContainer passes its Kubernetes ServiceAccount token to Vault in exchange for an ordinary Vault token
  - Read more here: https://kubernetes.io/docs/reference/access-authn-authz/rbac/#other-component-roles
- Enable a Vault auth engine of type `kubernetes` at the mount path `kubernetes`
  - `kubernetes` is the default mount path, so you probably want to use something like `kube-<eks-cluster-name>` so that you can authenticate with multiple clusters
- Configure the Kubernetes auth engine using the Kubernetes ServiceAccount we created in the first step
```hcl
locals {
  namespace = "vault-client"
}

#
# Create a Kubernetes service account that Vault can use to authenticate
# requests from the cluster
#
resource "kubernetes_service_account" "this" {
  metadata {
    name      = "vault-auth"
    namespace = local.namespace
  }
  automount_service_account_token = true
}

#
# Give the service account permissions to authenticate other service accounts
#
resource "kubernetes_cluster_role_binding" "this" {
  metadata {
    name = "vault-token-auth"
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "system:auth-delegator"
  }
  subject {
    kind      = "ServiceAccount"
    name      = kubernetes_service_account.this.metadata[0].name
    namespace = local.namespace
  }
}

#
# Get the token secret created for the service account.
# NOTE: Kubernetes 1.24+ no longer auto-creates this secret, so
# default_secret_name is only populated on older clusters.
#
data "kubernetes_secret" "this" {
  metadata {
    name      = kubernetes_service_account.this.default_secret_name
    namespace = local.namespace
  }
}

#
# Create the Vault auth backend
#
resource "vault_auth_backend" "this" {
  type = "kubernetes"
  # Use a different path (e.g. kube-<eks-cluster-name>) for multiple clusters
  path = "kubernetes"
}

#
# Configure the backend to use the service account we created, so that Vault
# can verify requests made to this backend
#
resource "vault_kubernetes_auth_backend_config" "this" {
  backend = vault_auth_backend.this.path
  # Get the EKS endpoint from somewhere, like an `aws_eks_cluster` data block
  kubernetes_host        = var.cluster.endpoint
  kubernetes_ca_cert     = data.kubernetes_secret.this.data["ca.crt"]
  token_reviewer_jwt     = data.kubernetes_secret.this.data["token"]
  issuer                 = "api"
  disable_iss_validation = true
}
```
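With the auth engine configured, a client logs in by presenting its ServiceAccount token. Under the hood, the `vault write auth/kubernetes/login ...` call in our initContainer is a `POST` to `/v1/auth/<mount>/login` with `role` and `jwt` in the JSON body, and Vault returns the new token in `auth.client_token`. A minimal sketch of that exchange using only the Python standard library (the helper names are hypothetical; the mount path is a parameter so it also covers a path like `kube-<eks-cluster-name>`):

```python
import json
import urllib.request


def kubernetes_login_request(vault_addr: str, mount: str, role: str, jwt: str) -> urllib.request.Request:
    """Build the HTTP request behind `vault write auth/<mount>/login role=... jwt=...`."""
    body = json.dumps({"role": role, "jwt": jwt}).encode()
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/{mount}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def client_token(response_body: bytes) -> str:
    """Extract the Vault token from a successful login response body."""
    return json.loads(response_body)["auth"]["client_token"]
```

Inside the pod, you would pass the contents of `/var/run/secrets/kubernetes.io/serviceaccount/token` as `jwt`, send the request with `urllib.request.urlopen`, and feed the response body to `client_token`. Vault then validates the JWT against the cluster using the reviewer credentials configured above.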
### Create a Vault Kubernetes Role
Now that we have a Kubernetes auth engine mounted and configured, we need to create a role in Vault so that a ServiceAccount in our Kubernetes cluster can actually do something!
First, we need a Vault policy that grants read access to Raft snapshots. Let’s store it in a file such as `policies/sys-snapshot-read.hcl`:
```hcl
path "sys/storage/raft/snapshot" {
  capabilities = ["read"]
}
```
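The policy grants `read` on `sys/storage/raft/snapshot` because taking a snapshot is a `GET` on that endpoint of the HTTP API, authenticated with the `X-Vault-Token` header. A minimal standard-library sketch of the same request (the helper names are hypothetical, and `vault operator raft snapshot save` in the CronJob remains the simpler option):

```python
import shutil
import urllib.request


def snapshot_request(vault_addr: str, vault_token: str) -> urllib.request.Request:
    """Build a GET request for a Raft snapshot from Vault's HTTP API."""
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/sys/storage/raft/snapshot",
        headers={"X-Vault-Token": vault_token},
        method="GET",
    )


def save_snapshot(req: urllib.request.Request, path: str) -> None:
    """Stream the binary snapshot body to `path` (needs network access to Vault)."""
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Nothing in the policy allows writes, so a leaked snapshot token can read the snapshot but cannot restore over your cluster.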
And now the Terraform code to create the Vault role.
```hcl
#
# Create a Vault policy from the policy file
#
resource "vault_policy" "this" {
  name   = "vault-snapshot"
  policy = file("policies/sys-snapshot-read.hcl")
}

#
# Create a Vault role with our snapshot policy that is bound
# to the vault-snapshot Kubernetes ServiceAccount in the
# vault-snapshot namespace.
#
# NOTE: Make sure you use the correct namespace and ServiceAccount!
#
resource "vault_kubernetes_auth_backend_role" "this" {
  backend                          = vault_auth_backend.this.path
  role_name                        = "vault-snapshot"
  bound_service_account_names      = ["vault-snapshot"]
  bound_service_account_namespaces = ["vault-snapshot"]
  # Referencing the policy resource also makes Terraform create it first
  token_policies = [vault_policy.this.name]
  token_ttl      = 3600
  audience       = null
}
```
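It is easy to bind the role to the wrong namespace, in which case the login in the initContainer fails. The check can be modelled roughly as below; this is a simplified, hypothetical model rather than Vault's actual code, but it shows that both the ServiceAccount name and its namespace have to match the role's bindings:

```python
from typing import Sequence


def role_allows(sa_name: str, sa_namespace: str,
                bound_names: Sequence[str], bound_namespaces: Sequence[str]) -> bool:
    """Simplified model of a Vault Kubernetes role binding check.

    A login is accepted only if the ServiceAccount name AND namespace each
    appear in the role's bound lists ("*" matches anything).
    """
    name_ok = "*" in bound_names or sa_name in bound_names
    namespace_ok = "*" in bound_namespaces or sa_namespace in bound_namespaces
    return name_ok and namespace_ok
```

With the role above, a `vault-snapshot` ServiceAccount in the `vault-snapshot` namespace is accepted, while the same ServiceAccount name in any other namespace is rejected. Avoid `"*"` here if you want to keep to least privilege.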
## Testing our Vault backup process
The CronJob is currently set to run daily, and we probably want to test it without waiting for the schedule each time. I would be amazed if you get this working the first time; if you do, you owe me at least one beer!
```shell
# Let's do our work in a separate namespace
kubectl create namespace vault-snapshot

# Set active namespace
kubens vault-snapshot

# Apply the CronJob from earlier if you haven't already
kubectl apply -f vault-snapshot-cronjob.yaml

# Create a Job from the CronJob to test that it works
kubectl create job --from=cronjob/vault-snapshot-cronjob test-1
# Do your thing and describe/debug the job...
kubectl describe job.batch/test-1
# Check logs from the Vault initContainer
kubectl logs test-1-<hash> -c vault-snapshot
# Check logs from the AWS container
kubectl logs test-1-<hash> -c aws-s3-backup

# Probably something failed, so repeat the above with test-2 :)
# Remember to clean up your jobs afterwards.
```
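When the login in the `vault-snapshot` initContainer fails, a common cause is a mismatch between the token and the Vault configuration: the namespace and name in the token's `sub` claim must match the role's bindings, and the `iss` claim matters if issuer validation is enabled. A hedged debugging sketch that decodes a JWT payload without verifying its signature (for inspection only; the helper name is hypothetical):

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature (debugging only)."""
    payload = token.split(".")[1]
    # JWT segments are unpadded base64url; restore the padding before decoding
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

You could feed it the token from `/var/run/secrets/kubernetes.io/serviceaccount/token` in a pod running as the `vault-snapshot` ServiceAccount and compare `sub` with `system:serviceaccount:vault-snapshot:vault-snapshot`.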
Once you get this working, you should have a snapshot stored in your AWS S3 bucket. That’s great, but how do you check that it can actually be restored?
## Restoring a Vault Snapshot
We found that the quickest and easiest way to test a restore was to spin up a dev instance of Vault in EKS without persistent storage, initialise the fresh Vault instance and restore the snapshot into it.
```shell
# First download the snapshot from S3, e.g. via the AWS Console (UI) or:
# aws s3 cp s3://hashicorp-vault-snapshots/vault_20220325_082239.snap .
ls vault_20220325_082239.snap

# Create another namespace for this. Make sure this Vault instance will
# also have access to your AWS KMS key (or however you auto-unseal Vault).
# And if you don't currently auto-unseal Vault in AWS EKS... leave a
# comment and I will help make your life easier :)
kubectl create namespace vault-dev

# Set active namespace
kubens vault-dev

# Deploy a dev instance of Vault without persistent storage, e.g.
helm repo add hashicorp https://helm.releases.hashicorp.com
helm install vault hashicorp/vault -f dev-values.yaml

# Check the Vault pod (it should not be ready yet because Vault needs
# to be initialised)
kubectl get pods

# Initialise Vault (the pod is named after the Helm release, i.e. vault-0)
kubectl exec -n vault-dev -ti vault-0 -- vault operator init
# Check the output... What we care about is the Root token

# Next, set up port-forwarding so that we can access our dev instance
# without any Ingresses and extra hassle (run this in a separate terminal)
kubectl port-forward svc/vault 8200:8200

# Set up our Vault variables
export VAULT_ADDR=http://localhost:8200
export VAULT_TOKEN=<root-token> # Root token from the init command above

# Restore the snapshot. -force is needed because the snapshot comes from a
# different cluster whose keyring does not match this fresh instance.
# Afterwards, the root token above is replaced by the restored data, so log
# in with credentials from the original cluster.
vault operator raft snapshot restore -force vault_20220325_082239.snap

# Browse to http://localhost:8200 or use your terminal to verify that
# Vault has been restored to the point you'd expect.
```
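If you keep a snapshot per day, you will usually want the newest one when restoring. Since the keys follow the `vault_%Y%m%d_%H%M%S.snap` scheme from the CronJob, a hypothetical helper like the one below can pick the latest from a bucket listing (for example the keys printed by `aws s3 ls s3://hashicorp-vault-snapshots/`) while skipping unrelated objects:

```python
from datetime import datetime
from typing import Iterable, Optional

# Naming scheme used by the aws-s3-backup container in the CronJob
KEY_FORMAT = "vault_%Y%m%d_%H%M%S.snap"


def latest_snapshot(keys: Iterable[str]) -> Optional[str]:
    """Return the most recent snapshot key, ignoring keys that don't match the scheme."""
    dated = []
    for key in keys:
        try:
            dated.append((datetime.strptime(key, KEY_FORMAT), key))
        except ValueError:
            continue  # not one of our snapshot files
    return max(dated)[1] if dated else None
```

Parsing the timestamp, instead of sorting the raw strings, also filters out anything else that ends up in the bucket.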
## Conclusion
This post has gone through setting up automated backups of HashiCorp Vault OSS running on AWS EKS using a Kubernetes CronJob and storing the snapshots in an S3 bucket. There are a lot of pieces to the puzzle, and hopefully this post has given some insight into how it can be set up securely following the principle of least privilege.
If you have any questions, feedback or want help with your Vault setup please leave us a comment!