Secrets management with CCE and Hashicorp Vault

Most modern IT setups are composed of several subsystems like databases, object stores, master controller, node access, and more. To access one component from another, some form of credentials are required. Configuring and storing these secrets directly in the components is considered as an anti-pattern, since a vulnerability of one component may iteratively and transitively affect the security of the whole setup.

With centralized secret management in place, it's not necessary to keep secrets used by various applications spread across DevOps environments. This helps to close some security attack vectors (like secret sprawl, security islands), but usually introduces a problem of the so-called Secret Zero as a key to the key storage.

Solution Design

Vault is an open-source software, provided and maintained by Hashicorp, that addresses this very problem. It is considered one of the reference solutions for it. This article demonstrates how to utilize infrastructure authorization with Hashicorp Vault in an CCE-powered setup. As an example workload, we deploy a Zookeeper cluster with enabled TLS protection. Certificates for Zookeeper are stored in Vault, and they oblige required practices like rotations or audits. Zookeeper can easily be replaced by any other component that requires access to internal credentials. TLS secrets are kept in the Vault. They are being read by Vault Agent component running as a sidecar in Zookeeper service pod and writes certificates onto the file system. Zookeeper services reads certificates populated by Agent. Vault Agent is configured to use password-less access to Vault. Further in the document it is explained how exactly this is implemented.

Establishing trust between CCE and Vault

Before any application managed by the CCE is able to login to Vault relying on infrastructure based authentication it is required to do some steps on the Vault side. Kubernetes auth plugin is enabled and configured to only access requests from specific Kubernetes cluster by providing its Certificate Authority. To allow several multiple different CCE clusters to use Vault, a dedicated auth path is going to be used.

vault auth enable -path kubernetes_cce1 kubernetes
vault write auth/kubernetes_cce1/config \
    kubernetes_host="$K8S_HOST" \
    kubernetes_ca_cert="$SA_CA_CRT"

Since in our example a dedicated service account with token is being periodically rotated using client JWT as reviewer JWT can be used.

Access rules for Vault

Having Auth plugin enabled, as described above, CCE workloads are able to authenticate to Vault, but they can do nothing. It is now necessary to establish further level of authorization and let particular service accounts of CCE to get access to secrets in Vault.

For the scope of the use case, we grant the Zookeeper service account from its namespace access to the TLS secrets stored in Vault's key-value store. For that a policy providing a read-only access to the /tls/zk* and /tls/ca paths is created.

vault policy write tls-zk-ro - <<EOF
path "secret/data/tls/zk_*" {capabilities = ["read"] }
path "secret/data/tls/ca" {capabilities = ["read"] }
path "secret/metadata/tls/zk_*" {capabilities = ["read"] }
path "secret/metadata/tls/ca" {capabilities = ["read"] }
EOF

Next granting the policy to the particular requestor (zookeeper service account in zookeeper namespace) must be done.

vault write auth/kubernetes_cce1/role/zookeeper \
    bound_service_account_names=zookeeper \
    bound_service_account_namespaces=zookeeper \
    policies=tls-zk-ro \
    ttl=2h

With this done token of the service account zookeeper in the zookeeper namespace is able to access to the vault for reading secrets located under /secret/tls path. And since it is highly recommended to follow the least required privilege principle only read only access to the TLS data is granted. A time to live of two hours is being used here meaning that once application authorize to Vault the token it gets can be used during next two hours. After two hours Vault token becomes invalid and Vault Agent gets a new one valid for next 2 hours. This needs to be carefully aligned with the time to live or the service account token to minimize their overlap. It is advised to keep it relatively short.

This is one the most sensitive steps in the whole configuration, since the applications deployed in the Kubernetes may escape their scope or get compromised by attackers. Reducing the number of secrets the accessor can read mitigates this risk.

Populating secrets in Vault

Vault offer two options to access TLS certificates:

Store certificate data in the KeyValue store
Use PKI secrets engine to issue certificates

Vault enables users not only to store TLS certificates data in the key-value store, but also to create and revoke them. To keep this tutorial simple enough we are not going to do this and just upload generated certificates into the KV store. For production setups this example can be easily extended with extra actions.

vault kv put secret/tls/ca certificate=@ca.crt
vault kv put secret/tls/zk_server certificate=@zk_server.crt private_key=@zk_server.key
vault kv put secret/tls/zk_client certificate=@zk_client.crt private_key=@zk_client.key

import fmt

Certificate paths and property names used here are referenced by the Zookeeper installation.

Deploying Zookeeper

Now that the secrets are stored safely in Vault and only allowed applications can fetch them it is time to look how exactly the application accesses the secrets. Generally, utilizing Vault requires modification of the application. Vault agent is a tool that was created to simplify secrets delivery for applications when it is hard or difficult to change the application itself. The Agent is taking care of reading secrets from Vault and can deliver them to the file system.

There are many way how to properly implement Zookeeper service on the Kubernetes. The scope of the blueprint is not Zookeeper itself, but demonstrating how an application can be supplied by required certificates. The reference architecture described here bases on the best practices gathered from various sources and extended by HashiCorp Vault. It overrides default Zookeeper start scripts in order to allow better control of the runtime settings and properly fill all required configuration options for TLS to work. Other methods of deploying Zookeeper can be easily used here instead.

Create a Kubernetes namespace named zookeeper.

kubectl create namespace zookeeper

Create a Kubernetes service account named zookeeper.

kubectl create serviceaccount zookeeper

In Kubernetes a service account provides an identity for the services running in the pod so that the process can access Kubernetes API. The same identity can be used to access Vault, but require one special permission -access to the token review API of the Kubernetes. When instead a dedicated reviewer JWT is used, this step is not necessary, but it also means long-living sensitive data is used and frequently transferred over the network. More details on various ways to use Kubernetes tokens to authorize to Vault can be found here.

kubectl create clusterrolebinding vault-client-auth-delegator \
    --clusterrole=system:auth-delegator \
    --serviceaccount=zookeeper:zookeeper

Create a Kubernetes ConfigMap with all required configurations. One possible approach is to define dedicated health and readiness check scripts and to override automatically created Zookeeper start script. This is especially useful when TLS protection is enabled, but default container scripts do not support this.

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: zookeeper-config
  namespace: "zookeeper"
data:
  ok: |
    #!/bin/sh
    # This sript is used by live-check of Kubernetes pod
    if [ -f /tls/ca.pem ]; then
      echo "srvr" | openssl s_client -CAfile /tls/ca.pem -cert /tls/client/tls.crt \
        -key /tls/client/tls.key -connect 127.0.0.1:${1:-2281} -quiet -ign_eof 2>/dev/null | grep Mode

    else
      zkServer.sh status
    fi

  ready: |
    #!/bin/sh
    # This sript is used by readiness-check of Kubernetes pod
    if [ -f /tls/ca.pem ]; then
      echo "ruok" | openssl s_client -CAfile /tls/ca.pem -cert /tls/client/tls.crt \
        -key /tls/client/tls.key -connect 127.0.0.1:${1:-2281} -quiet -ign_eof 2>/dev/null
    else
      echo ruok | nc 127.0.0.1 ${1:-2181}
    fi

  run: |
    #!/bin/bash
    # This is the main starting script
    set -a
    ROOT=$(echo /apache-zookeeper-*)
    ZK_USER=${ZK_USER:-"zookeeper"}
    ZK_LOG_LEVEL=${ZK_LOG_LEVEL:-"INFO"}
    ZK_DATA_DIR=${ZK_DATA_DIR:-"/data"}
    ZK_DATA_LOG_DIR=${ZK_DATA_LOG_DIR:-"/data/log"}
    ZK_CONF_DIR=${ZK_CONF_DIR:-"/conf"}
    ZK_CLIENT_PORT=${ZK_CLIENT_PORT:-2181}
    ZK_SSL_CLIENT_PORT=${ZK_SSL_CLIENT_PORT:-2281}
    ZK_SERVER_PORT=${ZK_SERVER_PORT:-2888}
    ZK_ELECTION_PORT=${ZK_ELECTION_PORT:-3888}
    ID_FILE="$ZK_DATA_DIR/myid"
    ZK_CONFIG_FILE="$ZK_CONF_DIR/zoo.cfg"
    LOG4J_PROPERTIES="$ZK_CONF_DIR/log4j.properties"
    HOST=$(hostname)
    DOMAIN=`hostname -d`
    APPJAR=$(echo $ROOT/*jar)
    CLASSPATH="${ROOT}/lib/*:${APPJAR}:${ZK_CONF_DIR}:"
    if [[ $HOST =~ (.*)-([0-9]+)$ ]]; then
        NAME=${BASH_REMATCH[1]}
        ORD=${BASH_REMATCH[2]}
        MY_ID=$((ORD+1))
    else
        echo "Failed to extract ordinal from hostname $HOST"
        exit 1
    fi
    mkdir -p $ZK_DATA_DIR
    mkdir -p $ZK_DATA_LOG_DIR
    echo $MY_ID >> $ID_FILE

    echo "dataDir=$ZK_DATA_DIR" >> $ZK_CONFIG_FILE
    echo "dataLogDir=$ZK_DATA_LOG_DIR" >> $ZK_CONFIG_FILE
    echo "4lw.commands.whitelist=*" >> $ZK_CONFIG_FILE
    # Client TLS configuration
    if [[ -f /tls/ca.pem ]]; then
      echo "secureClientPort=$ZK_SSL_CLIENT_PORT" >> $ZK_CONFIG_FILE
      echo "ssl.keyStore.location=/tls/client/client.pem" >> $ZK_CONFIG_FILE
      echo "ssl.trustStore.location=/tls/ca.pem" >> $ZK_CONFIG_FILE
    else
      echo "clientPort=$ZK_CLIENT_PORT" >> $ZK_CONFIG_FILE
    fi
    # Server TLS configuration
    if [[ -f /tls/ca.pem ]]; then
      echo "serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory" >> $ZK_CONFIG_FILE
      echo "sslQuorum=true" >> $ZK_CONFIG_FILE
      echo "ssl.quorum.keyStore.location=/tls/server/server.pem" >> $ZK_CONFIG_FILE
      echo "ssl.quorum.trustStore.location=/tls/ca.pem" >> $ZK_CONFIG_FILE
    fi
    for (( i=1; i<=$ZK_REPLICAS; i++ ))
    do
        echo "server.$i=$NAME-$((i-1)).$DOMAIN:$ZK_SERVER_PORT:$ZK_ELECTION_PORT" >> $ZK_CONFIG_FILE
    done
    rm -f $LOG4J_PROPERTIES
    echo "zookeeper.root.logger=$ZK_LOG_LEVEL, CONSOLE" >> $LOG4J_PROPERTIES
    echo "zookeeper.console.threshold=$ZK_LOG_LEVEL" >> $LOG4J_PROPERTIES
    echo "zookeeper.log.threshold=$ZK_LOG_LEVEL" >> $LOG4J_PROPERTIES
    echo "zookeeper.log.dir=$ZK_DATA_LOG_DIR" >> $LOG4J_PROPERTIES
    echo "zookeeper.log.file=zookeeper.log" >> $LOG4J_PROPERTIES
    echo "zookeeper.log.maxfilesize=256MB" >> $LOG4J_PROPERTIES
    echo "zookeeper.log.maxbackupindex=10" >> $LOG4J_PROPERTIES
    echo "zookeeper.tracelog.dir=$ZK_DATA_LOG_DIR" >> $LOG4J_PROPERTIES
    echo "zookeeper.tracelog.file=zookeeper_trace.log" >> $LOG4J_PROPERTIES
    echo "log4j.rootLogger=\${zookeeper.root.logger}" >> $LOG4J_PROPERTIES
    echo "log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender" >> $LOG4J_PROPERTIES
    echo "log4j.appender.CONSOLE.Threshold=\${zookeeper.console.threshold}" >> $LOG4J_PROPERTIES
    echo "log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout" >> $LOG4J_PROPERTIES
    echo "log4j.appender.CONSOLE.layout.ConversionPattern=\
      %d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n" >> $LOG4J_PROPERTIES
    if [ -n "$JMXDISABLE" ]
    then
        MAIN=org.apache.zookeeper.server.quorum.QuorumPeerMain
    else
        MAIN="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=$JMXPORT \
          -Dcom.sun.management.jmxremote.authenticate=$JMXAUTH \
          -Dcom.sun.management.jmxremote.ssl=$JMXSSL \
          -Dzookeeper.jmx.log4j.disable=$JMXLOG4J \
          org.apache.zookeeper.server.quorum.QuorumPeerMain"
    fi
    set -x
    exec java -cp "$CLASSPATH" $JVMFLAGS $MAIN $ZK_CONFIG_FILE

  vault-agent-config.hcl: |
    exit_after_auth = true
    pid_file = "/home/vault/pidfile"
    auto_auth {
        method "kubernetes" {
            mount_path = "auth/kubernetes_cce1"
            config = {
                role = "zookeeper"
                token_path = "/run/secrets/tokens/vault-token"
            }
        }
        sink "file" {
            config = {
                path = "/home/vault/.vault-token"
            }
        }
    }

    cache {
        use_auto_auth_token = true
    }

    # ZK is neat-picky on cert file extensions
    template {
      destination = "/tls/ca.pem"
      contents = <<EOT
    {{- with secret "secret/data/tls/ca" }}{{ .Data.data.certificate }}{{ end }}
    EOT
    }

    template {
      destination = "/tls/server/server.pem"
      contents = <<EOT
    {{- with secret "secret/data/tls/zk_server" }}{{ .Data.data.certificate }}
    {{ .Data.data.private_key }}{{ end }}
    EOT
    }
    template {
      destination = "/tls/server/tls.crt"
      contents = <<EOT
    {{- with secret "secret/data/tls/zk_server" }}{{ .Data.data.certificate }}{{ end }}
    EOT
    }
    template {
      destination = "/tls/server/tls.key"
      contents = <<EOT
    {{- with secret "secret/data/tls/zk_server" }}{{ .Data.data.private_key }}{{ end }}
    EOT
    }

    template {
      destination = "/tls/client/client.pem"
      contents = <<EOT
    {{- with secret "secret/data/tls/zk_client" }}{{ .Data.data.certificate }}
    {{ .Data.data.private_key }}{{ end }}
    EOT
    }
    template {
      destination = "/tls/client/tls.crt"
      contents = <<EOT
    {{- with secret "secret/data/tls/zk_client" }}{{ .Data.data.certificate }}{{ end }}
    EOT
    }
    template {
      destination = "/tls/client/tls.key"
      contents = <<EOT
    {{- with secret "secret/data/tls/zk_client" }}{{ .Data.data.private_key }}{{ end }}
    EOT
    }

kubectl apply -f zookeeper-cm.yaml

Create Zookeeper Headless service. It is used by pods to build quorum and implementing cluster internal communication.

---
name: "zookeeper-svc"
namespace: "zookeeper"
apiVersion: v1
kind: Service
spec:
  # Not exposing in the cluster
  clusterIP: None
  # Important to start up
  publishNotReadyAddresses: true
  selector:
    app: zookeeper
  ports:
  - port: 2281
    name: client
    targetPort: client
    protocol: TCP
  - port: 2888
    name: server
    targetPort: server
    protocol: TCP
  - port: 3888
    name: election
    targetPort: election
    protocol: TCP

kubectl apply -f zookeeper-svc.yaml

Create Frontend service. It is used by the clients and therefore only includes client port of Zookeeper.

apiVersion: v1
kind: Service
spec:
  clusterIP: None
  ports:
  - name: client
    port: 2281
    protocol: TCP
    targetPort: client
  selector:
    app: zookeeper
  sessionAffinity: None
  type: ClusterIP

kubectl apply -f zookeeper-svc-public.yaml

Create StatefulSet replacing [<VAULT_PUBLIC_ADDR>] with the address of the Vault server. This includes a pod with Vault Agent side container as an init container, Vault Agent side container used continuously in the run cycle of the pod and Zookeeper main container.

apiVersion: apps/v1
kind: StatefulSet
spec:
  podManagementPolicy: Parallel
  replicas: 3
  selector:
    matchLabels:
      app: zookeeper
      component: server
  serviceName: zookeeper-headless
  template:
    metadata:
      labels:
        app: zookeeper
        component: server
    spec:
      containers:

      - args:
        - agent
        - -config=/etc/vault/vault-agent-config.hcl
        - -log-level=debug
        - -exit-after-auth=false
        env:
        - name: VAULT_ADDR
          value: <VAULT_PUBLIC_ADDR>
        image: vault:1.9.0
        name: vault-agent-sidecar
        volumeMounts:
        - mountPath: /etc/vault
          name: vault-agent-config
        - mountPath: /tls
          name: cert-data
        - mountPath: /var/run/secrets/tokens
          name: k8-tokens

      - command:
        - /bin/bash
        - -xec
        - /config-scripts/run
        env:
        - name: ZK_REPLICAS
          value: "3"
        - name: ZOO_PORT
          value: "2181"
        - name: ZOO_STANDALONE_ENABLED
          value: "false"
        - name: ZOO_TICK_TIME
          value: "2000"
        image: zookeeper:3.7.0
        livenessProbe:
          exec:
            command:
            - sh
            - /config-scripts/ok
          failureThreshold: 2
          initialDelaySeconds: 20
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5
        name: zookeeper
        ports:
        - containerPort: 2281
          name: client
          protocol: TCP
        - containerPort: 2888
          name: server
          protocol: TCP
        - containerPort: 3888
          name: election
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - sh
            - /config-scripts/ready
          failureThreshold: 2
          initialDelaySeconds: 20
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 5
        securityContext:
          runAsUser: 1000
        volumeMounts:
        - mountPath: /data
          name: datadir
        - mountPath: /tls
          name: cert-data
        - mountPath: /config-scripts
          name: zookeeper-config
      dnsPolicy: ClusterFirst

      initContainers:
      - args:
        - agent
        - -config=/etc/vault/vault-agent-config.hcl
        - -log-level=debug
        - -exit-after-auth=true
        env:
        - name: VAULT_ADDR
          value: <VAULT_PUBLIC_ADDR>
        image: vault:1.9.0
        name: vault-agent
        volumeMounts:
        - mountPath: /etc/vault
          name: vault-agent-config
        - mountPath: /tls
          name: cert-data
        - mountPath: /var/run/secrets/tokens
          name: k8-tokens
      restartPolicy: Always
      serviceAccount: zookeeper
      serviceAccountName: zookeeper
      terminationGracePeriodSeconds: 1800
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: vault-agent-config.hcl
            path: vault-agent-config.hcl
          name: zookeeper-config
        name: vault-agent-config
      - configMap:
          defaultMode: 365
          name: zookeeper-config
        name: zookeeper-config
      - emptyDir: {}
        name: cert-data
      - name: k8-tokens
        projected:
          defaultMode: 420
          sources:
          - serviceAccountToken:
              expirationSeconds: 7200
              path: vault-token

  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: datadir
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: csi-disk
      volumeMode: Filesystem

kubectl apply -f zookeeper-ss.yaml

With this a production-ready Zookeeper service with enabled TLS has been deployed successfully to the CCE. The Vault Agent takes care of authorizing to HashiCorp Vault using a Kubernetes service account with a short time to live token and fetches required secrets to the file system. In the entire Kubernetes deployment there are no secrets for the application, neither the key to the Vault, nor TLS certificates themselves. Not even using Kubernetes secrets is necessary.

Solution Design​

Establishing trust between CCE and Vault​

Access rules for Vault​

Populating secrets in Vault​

Deploying Zookeeper​

References​

Solution Design

Establishing trust between CCE and Vault

Access rules for Vault

Populating secrets in Vault

Deploying Zookeeper

References