How to automate HashiCorp Vault OSS backups in AWS EKS

Jacob Lärfors • 4 minutes • 2022-04-26


HashiCorp Vault is an API-driven tool for storing and retrieving static and dynamic secrets. Vault can be deployed in a Kubernetes cluster using the official Helm chart. The recommended storage backend for Vault in Kubernetes is the integrated Raft storage. You should take frequent snapshots of it and store them safely, so that Vault can be restored in case of data loss.

In this post we will walk through an implementation that uses a Kubernetes CronJob to take scheduled snapshots and store them in an AWS S3 bucket for safekeeping. Note that Vault Enterprise provides automated snapshots as a native feature; if you run that edition, use it instead.

Write a Kubernetes CronJob

Let’s start with the CronJob and work backwards from there. For the CronJob to work, it needs to authenticate with both HashiCorp Vault and AWS (to write to the S3 bucket).

apiVersion: batch/v1
kind: CronJob
metadata:
  name: vault-snapshot-cronjob
spec:
  # Set your desired cron schedule (this one runs at 02:00, Monday to Friday)
  schedule: "0 2 * * 1-5"
  successfulJobsHistoryLimit: 10
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          # Use a ServiceAccount that we will create next (keep reading!)
          serviceAccountName: vault-snapshot
          volumes:
            # Create an empty volume to share the snapshot across containers
            - name: share
              emptyDir: {}
          initContainers:
            # Run an init container that creates the snapshot of Vault
            - name: vault-snapshot
              # Choose an appropriate Vault version (e.g. same as your Vault setup)
              image: vault:1.9.4
              command: ["/bin/sh", "-c"]
              args:
                # 1. Get the ServiceAccount token which we will use to authenticate against Vault
                # 2. Log in to Vault using the SA token at the endpoint where the Kubernetes auth engine
                #    has been enabled
                # 3. Use the Vault CLI to store a snapshot in our empty volume
                - |
                  SA_TOKEN=$(cat /var/run/secrets/;
                  export VAULT_TOKEN=$(vault write -field=token auth/kubernetes/login jwt=$SA_TOKEN role=vault-snapshot);
                  vault operator raft snapshot save /share/vault.snap;
              env:
                # Set the Vault address using the Kubernetes service name
                - name: VAULT_ADDR
                  value: http://vault.vault.svc.cluster.local:8200
              volumeMounts:
                - mountPath: /share
                  name: share
          containers:
            # Run a container with the AWS CLI and copy the snapshot to our S3 bucket
            - name: aws-s3-backup
              image: amazon/aws-cli:2.2.14
              command:
                - /bin/sh
              args:
                - -ec
                # Copy the snapshot file to an S3 bucket called hashicorp-vault-snapshots
                - aws s3 cp /share/vault.snap s3://hashicorp-vault-snapshots/vault_$(date +"%Y%m%d_%H%M%S").snap;
              volumeMounts:
                - mountPath: /share
                  name: share
          restartPolicy: OnFailure
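The two container steps can also be written as small shell functions, which makes the S3 key format easy to test and lets the upload step refuse an empty snapshot. This is a minimal sketch under stated assumptions: snapshot_uri and check_snapshot are hypothetical helper names that are not part of the CronJob above, and the sketch stamps the key in UTC (date -u) whereas the CronJob uses the container's local time.

```shell
#!/bin/sh
# Sketch only: snapshot_uri and check_snapshot are hypothetical helpers.
set -eu

# Print the S3 URI for a snapshot. The optional second argument is a timestamp
# in the same %Y%m%d_%H%M%S format the CronJob uses; it defaults to "now" in UTC.
snapshot_uri() {
  bucket="$1"
  stamp="${2:-$(date -u +%Y%m%d_%H%M%S)}"
  printf 's3://%s/vault_%s.snap\n' "$bucket" "$stamp"
}

# Fail (non-zero) if the snapshot file is missing or empty, so nothing
# useless gets uploaded.
check_snapshot() {
  if [ ! -s "$1" ]; then
    echo "snapshot $1 is missing or empty" >&2
    return 1
  fi
}

# Example usage in the aws-s3-backup container:
#   check_snapshot /share/vault.snap
#   aws s3 cp /share/vault.snap "$(snapshot_uri hashicorp-vault-snapshots)"
```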

Writing the CronJob is probably the easiest part. Now we need to ensure that the two containers we are running (vault-snapshot and aws-s3-backup) can authenticate with Vault and AWS respectively. For this, we will rely on a ServiceAccount.

Authentication with Vault and AWS

Define a Kubernetes ServiceAccount

Let’s define a Kubernetes ServiceAccount called vault-snapshot that we referenced in the above CronJob.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: vault-snapshot
  annotations:
    # Assume the AWS role hashicorp-vault-snapshotter
    arn:aws:iam::<ACCOUNT_ID>:role/hashicorp-vault-snapshotter

Notice how we add the annotation to assume the AWS role hashicorp-vault-snapshotter. For details on assuming AWS IAM roles from EKS, please read our blog post on that topic.
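To check that the annotation was picked up, you can look inside a pod for the environment variables that the EKS pod identity webhook injects when a pod uses an annotated ServiceAccount: AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE. The sketch below assumes a POSIX shell inside the pod; irsa_check is a hypothetical helper that prints the role name on success.

```shell
#!/bin/sh
# Sketch only: irsa_check is a hypothetical helper for debugging IRSA in a pod.
set -eu

irsa_check() {
  # Both variables are injected by EKS for pods using an annotated ServiceAccount
  if [ -z "${AWS_ROLE_ARN:-}" ] || [ -z "${AWS_WEB_IDENTITY_TOKEN_FILE:-}" ]; then
    echo "AWS_ROLE_ARN / AWS_WEB_IDENTITY_TOKEN_FILE not set: is the ServiceAccount annotated?" >&2
    return 1
  fi
  # The projected web identity token must exist and be non-empty
  if [ ! -s "$AWS_WEB_IDENTITY_TOKEN_FILE" ]; then
    echo "token file $AWS_WEB_IDENTITY_TOKEN_FILE is missing or empty" >&2
    return 1
  fi
  # Print the role name, i.e. everything after the last "/" in the ARN
  printf '%s\n' "${AWS_ROLE_ARN##*/}"
}

# Example usage in the aws-s3-backup container:
#   irsa_check                  # should print hashicorp-vault-snapshotter
#   aws sts get-caller-identity # Arn should contain assumed-role/hashicorp-vault-snapshotter
```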

Define an AWS IAM Role

Let’s define the AWS IAM role hashicorp-vault-snapshotter and make the vault-snapshot ServiceAccount a trusted entity that can assume that role.

locals {
  vault_cluster                 = "<eks-cluster-name>"
  k8s_service_account_name      = "vault-snapshot"
  # Must be the namespace the CronJob and its ServiceAccount run in
  k8s_service_account_namespace = "vault-snapshot"
  vault_cluster_oidc_issuer_url = trimprefix(data.aws_eks_cluster.vault_cluster.identity[0].oidc[0].issuer, "https://")
}

# Might as well create the S3 bucket whilst we are at it...
resource "aws_s3_bucket" "snapshots" {
  bucket = "hashicorp-vault-snapshots"
}

# Get the caller identity so that we can get the AWS Account ID
data "aws_caller_identity" "current" {}

# Get the cluster that vault is running in
data "aws_eks_cluster" "vault_cluster" {
  name = local.vault_cluster
}

# Create the IAM role that will be assumed by the service account
resource "aws_iam_role" "snapshot" {
  name               = "hashicorp-vault-snapshotter"
  assume_role_policy = data.aws_iam_policy_document.snapshot.json

  inline_policy {
    name = "hashicorp-vault-snapshot"
    policy = jsonencode({
      Version = "2012-10-17"
      Statement = [
        {
          Effect = "Allow"
          Action = [
            "s3:PutObject",
            "s3:GetObject",
          ]
          # Refer to the S3 bucket we created along the way
          Resource = ["${aws_s3_bucket.snapshots.arn}/*"]
        }
      ]
    })
  }
}

# Create IAM policy allowing the k8s service account to assume the IAM role
data "aws_iam_policy_document" "snapshot" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type = "Federated"
      identifiers = [
        "arn:aws:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${local.vault_cluster_oidc_issuer_url}"
      ]
    }

    # Limit the scope so that only our desired service account can assume this role
    condition {
      test     = "StringEquals"
      variable = "${local.vault_cluster_oidc_issuer_url}:sub"
      values = [
        "system:serviceaccount:${local.k8s_service_account_namespace}:${local.k8s_service_account_name}"
      ]
    }
  }
}
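The trust policy only matches if two strings line up exactly: the OIDC provider ARN (built from the cluster's issuer URL with https:// stripped, as trimprefix does in the locals) and the sub claim of the ServiceAccount token. A minimal sketch that rebuilds both for a manual comparison; the helper names are hypothetical, and the namespace you pass must be the one the CronJob's ServiceAccount actually runs in (the testing steps later in this post use vault-snapshot).

```shell
#!/bin/sh
# Sketch only: oidc_provider_arn and service_account_sub are hypothetical helpers.
set -eu

# Equivalent of the Terraform principal: strip "https://" from the issuer URL
oidc_provider_arn() {
  account_id="$1"
  issuer="$2"
  printf 'arn:aws:iam::%s:oidc-provider/%s\n' "$account_id" "${issuer#https://}"
}

# The "sub" claim of a Kubernetes ServiceAccount token: namespace, then name
service_account_sub() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Example usage (needs AWS credentials):
#   issuer=$(aws eks describe-cluster --name <eks-cluster-name> --query "cluster.identity.oidc.issuer" --output text)
#   account=$(aws sts get-caller-identity --query Account --output text)
#   oidc_provider_arn "$account" "$issuer"
#   aws iam list-open-id-connect-providers   # the ARN printed above should be listed
```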

Configure Vault Kubernetes Auth Engine

So far we have a Kubernetes ServiceAccount that can assume an AWS IAM role with access to S3. What’s missing is the authentication with Vault.

You could use the Vault AWS Auth Engine with the same AWS role. However, we already use Vault to provide secrets to Kubernetes workloads, so multiple EKS clusters are already authenticated with Vault through the Kubernetes Auth Engine. It made sense to reuse that setup, and that’s what we will show below.

This is quite an involved process and could be a blog post of its own, but in summary we will:

  1. Create a Kubernetes ServiceAccount with the ClusterRole system:auth-delegator
    1. This gives Vault the ability to authenticate and authorize Kubernetes ServiceAccount tokens that are used to authenticate with Vault
      1. Remember our initContainer passes the Kubernetes ServiceAccount token to Vault in exchange for an ordinary Vault Token
    2. Read more here:
  2. Enable a Vault Auth Engine of type kubernetes at the mount path kubernetes
    1. kubernetes is the default mount path, so you probably want to use something like kube-<eks-cluster-name> so that you can authenticate with multiple clusters
  3. Configure the Kubernetes auth engine using the Kubernetes ServiceAccount we created in step 1
locals {
  namespace = "vault-client"
}

# Create kubernetes service account that vault can use to authenticate requests
# from the cluster
resource "kubernetes_service_account" "this" {
  metadata {
    name      = "vault-auth"
    namespace = local.namespace
  }
  automount_service_account_token = true
}

# Give the service account permissions to authenticate other service accounts
resource "kubernetes_cluster_role_binding" "this" {
  metadata {
    name = "vault-token-auth"
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "system:auth-delegator"
  }
  subject {
    kind      = "ServiceAccount"
    name      = kubernetes_service_account.this.metadata[0].name
    namespace = local.namespace
  }
}

# Get the secret created for the service account
data "kubernetes_secret" "this" {
  metadata {
    name      = kubernetes_service_account.this.default_secret_name
    namespace = local.namespace
  }
}

# Create the vault auth backend
resource "vault_auth_backend" "this" {
  type = "kubernetes"
  # Make this something else for multiple clusters, e.g. kube-<eks-cluster-name>,
  # and update auth/kubernetes/login in the CronJob to match
  path = "kubernetes"
}

# Configure the backend to use the service account we created, so that vault
# can verify requests made to this backend
resource "vault_kubernetes_auth_backend_config" "this" {
  backend                = vault_auth_backend.this.path
  # Get the EKS endpoint from somewhere, like an `aws_eks_cluster` data block
  kubernetes_host        = var.cluster.endpoint
  kubernetes_ca_cert     =["ca.crt"]
  token_reviewer_jwt     =["token"]
  issuer                 = "api"
  disable_iss_validation = true
}
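Under the hood, the init container's vault write auth/kubernetes/login call is a POST to Vault's /v1/auth/<mount>/login endpoint carrying the ServiceAccount JWT and the role name; Vault then validates that JWT against the Kubernetes TokenReview API using the reviewer token configured above. A minimal sketch of the same request for images without the Vault CLI, assuming curl is available; the helper names are hypothetical, and the mount argument matters if you change path from kubernetes.

```shell
#!/bin/sh
# Sketch only: vault_k8s_login_body and vault_k8s_login_url are hypothetical helpers.
set -eu

# JSON body for the Kubernetes auth login endpoint. Assumes the JWT and role
# contain no characters needing JSON escaping (JWTs are base64url parts joined by dots).
vault_k8s_login_body() {
  printf '{"jwt": "%s", "role": "%s"}\n' "$1" "$2"
}

# Login URL for a Vault address and auth mount path (default: kubernetes)
vault_k8s_login_url() {
  addr="${1%/}"            # drop a trailing slash, if any
  mount="${2:-kubernetes}"
  printf '%s/v1/auth/%s/login\n' "$addr" "$mount"
}

# Example usage (SA_TOKEN holds the pod's ServiceAccount token, read as in the CronJob):
#   curl --silent --request POST \
#     --data "$(vault_k8s_login_body "$SA_TOKEN" vault-snapshot)" \
#     "$(vault_k8s_login_url "$VAULT_ADDR")"
# The Vault token is returned in the response's auth.client_token field.
```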

Create a Vault Kubernetes Role#

Now that we have a Kubernetes auth engine mounted and configured, we need to create a role in Vault so that a ServiceAccount in our Kubernetes cluster can actually do something!

We need a Vault policy that grants read access to the Raft snapshot endpoint. Let’s store it in a file such as policies/sys-snapshot-read.hcl:

path "sys/storage/raft/snapshot" {
  capabilities = ["read"]
}
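To confirm that a token issued for the snapshot role really carries this policy, you can run vault token capabilities sys/storage/raft/snapshot while logged in with that token; it prints the token's capabilities on the path as a comma-separated list. A root token prints root, so test with the role's token. The sketch below is a hypothetical helper that checks that output for a given capability.

```shell
#!/bin/sh
# Sketch only: has_capability is a hypothetical helper.
set -eu

# has_capability "<output of vault token capabilities>" <capability>
# Splits the comma-separated list, trims spaces and looks for an exact match.
has_capability() {
  printf '%s\n' "$1" | tr ',' '\n' | tr -d ' ' | grep -qx "$2"
}

# Example usage, after logging in with the snapshot role (as the CronJob's
# init container does):
#   caps=$(vault token capabilities sys/storage/raft/snapshot)
#   has_capability "$caps" read && echo "token can take snapshots"
```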

And now the Terraform code to create the Vault role.

# Create a Vault policy from the policy file
resource "vault_policy" "this" {
  name   = "vault-snapshot"
  policy = file("policies/sys-snapshot-read.hcl")
}

# Create a Vault role with our snapshot policy, that is bound
# to the vault-snapshot Kubernetes ServiceAccount in the
# vault-snapshot namespace.
#
# NOTE: Make sure you use the correct namespace and serviceaccount!
resource "vault_kubernetes_auth_backend_role" "this" {
  depends_on = [vault_policy.this]

  backend                          = vault_auth_backend.this.path
  role_name                        = "vault-snapshot"
  bound_service_account_names      = ["vault-snapshot"]
  bound_service_account_namespaces = ["vault-snapshot"]
  token_policies                   = ["vault-snapshot"]
  token_ttl                        = 3600
  audience                         = null
}
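If you want to try the role out before wiring up Terraform, or compare it with what Terraform created, the same policy and role can be written with the Vault CLI. A minimal sketch, assuming the vault CLI is logged in with a token that may manage policies and auth roles; vault_cmd and create_snapshot_role are hypothetical helpers, and DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: vault_cmd and create_snapshot_role are hypothetical helpers.
set -eu

# Run vault, or just print the command when DRY_RUN is non-empty
vault_cmd() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "vault $*"
  else
    vault "$@"
  fi
}

# Write the snapshot policy and the Kubernetes auth role that uses it.
# $1: auth mount path (default kubernetes), $2: ServiceAccount namespace
create_snapshot_role() {
  mount="${1:-kubernetes}"
  namespace="${2:-vault-snapshot}"
  vault_cmd policy write vault-snapshot policies/sys-snapshot-read.hcl
  vault_cmd write "auth/$mount/role/vault-snapshot" \
    bound_service_account_names=vault-snapshot \
    bound_service_account_namespaces="$namespace" \
    token_policies=vault-snapshot \
    token_ttl=3600
}

# Example usage:
#   DRY_RUN=1 create_snapshot_role   # review the commands first
#   create_snapshot_role             # then apply them
```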

Testing our Vault backup process#

The CronJob is set to run at 02:00 on weekdays, and we probably want to test it without waiting for the schedule each time... I would be amazed if you get this working the first time. If you do, you owe me at least one beer!

# Let's do our work in a separate namespace
kubectl create namespace vault-snapshot

# Set active namespace
kubens vault-snapshot

# Apply the CronJob from earlier if you haven't already
kubectl apply -f vault-snapshot-cronjob.yaml

# Create a Job from the CronJob to test that it works
kubectl create job --from=cronjob/vault-snapshot-cronjob test-1
# Do your thing and describe/debug the job...
kubectl describe job.batch/test-1
# Check logs from vault initContainer
kubectl logs test-1-<hash> -c vault-snapshot
# Check logs from aws container
kubectl logs test-1-<hash>

# Probably something failed, so repeat the above with test-2 :)
# Remember to clean up your jobs afterwards.
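Rather than re-running these commands by hand for test-1, test-2 and so on, a small wrapper can create the Job, wait for it with kubectl wait, and print both containers' logs when it fails. A minimal sketch, assuming kubectl is already pointed at the vault-snapshot namespace; run_snapshot_test_job is a hypothetical helper, and the 300s default timeout is an arbitrary choice.

```shell
#!/bin/sh
# Sketch only: run_snapshot_test_job is a hypothetical helper.
set -eu

run_snapshot_test_job() {
  job="$1"                 # e.g. test-1, test-2, ...
  timeout="${2:-300s}"     # how long to wait for the Job to complete
  kubectl create job --from=cronjob/vault-snapshot-cronjob "$job"
  if kubectl wait --for=condition=complete "job/$job" --timeout="$timeout"; then
    echo "job $job completed"
    return 0
  fi
  echo "job $job did not complete within $timeout; logs follow" >&2
  # The Job controller labels its pods with job-name=<job>
  pod=$(kubectl get pods --selector="job-name=$job" --output=name | head -n 1)
  kubectl logs "$pod" -c vault-snapshot >&2 || true
  kubectl logs "$pod" -c aws-s3-backup >&2 || true
  return 1
}

# Example usage:
#   run_snapshot_test_job test-1 && kubectl delete job test-1
```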

Once you get this working you should have a snapshot stored in your AWS S3 bucket. That’s great, so how do you check that this can be restored?

Restoring Vault Snapshot#

We found the quickest and easiest way to test a restore was to spin up a dev instance of Vault in EKS without persistent storage, initialise the fresh Vault instance and restore the snapshot into it.

# First download the snapshot from S3, e.g. via the AWS Console (UI)
ls vault_20220325_082239.snap

# Create another namespace for this. Make sure this Vault instance will
# also have access to your AWS KMS (or however you auto-unseal Vault).
# And if you don't currently auto-unseal Vault in AWS EKS... leave a
# comment and I will help make your life easier :)
kubectl create namespace vault-dev

# Set active namespace
kubens vault-dev

# Deploy a dev instance of Vault without persistent storage, e.g.
helm install vault hashicorp/vault -f dev-values.yaml

# Check the Vault pod (it should be running but not ready, because Vault
# needs to be initialised)
kubectl get pods

# Initialise Vault (with the release named "vault", the first pod is vault-0)
kubectl exec -n vault-dev -ti vault-0 -- vault operator init
# Check the output... What we care about is the root token

# Next let's set up port-forwarding so that we can access our dev instance
# without any Ingresses and extra hassle (this keeps running in the
# foreground, so run it in a separate terminal)
kubectl port-forward svc/vault 8200:8200

# Set up our Vault variables
export VAULT_ADDR=http://localhost:8200
export VAULT_TOKEN=<root-token> # Root token from init command above

# Restore the snapshot
vault operator raft snapshot restore vault_20220325_082239.snap

# Browse to http://localhost:8200 or use your Terminal to verify that
# Vault has restored to the point you'd expect.
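Downloading via the console works, but because the snapshot names embed a %Y%m%d_%H%M%S timestamp, the newest one can also be picked straight from aws s3 ls output: lexical order of those names equals chronological order. A minimal sketch; latest_snapshot is a hypothetical helper and assumes only the vault_*.snap naming used by the CronJob.

```shell
#!/bin/sh
# Sketch only: latest_snapshot is a hypothetical helper.
set -eu

# Reads `aws s3 ls s3://<bucket>/` lines on stdin ("DATE TIME SIZE KEY")
# and prints the newest vault_*.snap key.
latest_snapshot() {
  key=$(awk '{ print $4 }' | grep -E '^vault_[0-9]{8}_[0-9]{6}\.snap$' | sort | tail -n 1)
  if [ -z "$key" ]; then
    echo "no vault_*.snap objects found" >&2
    return 1
  fi
  printf '%s\n' "$key"
}

# Example usage (needs AWS credentials):
#   bucket=hashicorp-vault-snapshots
#   key=$(aws s3 ls "s3://$bucket/" | latest_snapshot)
#   aws s3 cp "s3://$bucket/$key" .
```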


This post has walked through setting up automated backups of HashiCorp Vault OSS running on AWS EKS, using a Kubernetes CronJob and storing the snapshots in an S3 bucket. There are a lot of pieces to the puzzle, and hopefully this post has given some insight into how to set them up securely, following the principle of least privilege.

If you have any questions, feedback or want help with your Vault setup please leave us a comment!

