Crit

Crit is a command-line tool for bootstrapping Kubernetes clusters. It handles the initial configuration of the Kubernetes control plane components and the addition of workers to the cluster.

It is designed to be used within automated, non-interactive scripting. Many providers of virtual infrastructure allow user-defined customization via shell script, so Crit composes well with provider provisioning tools (e.g. AWS CloudFormation).
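
For example, a cloud-init user data script can fetch a pre-rendered config and run crit non-interactively. This is only a sketch; the S3 location and config path are placeholders for whatever your provisioning tooling provides:

#!/bin/bash
# hypothetical user-data sketch: fetch a pre-rendered config and bootstrap the node
set -euo pipefail

aws s3 cp s3://my-bucket/crit/config.yaml /etc/crit/config.yaml   # placeholder location
crit up --config /etc/crit/config.yaml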

Installation

Pre-built binaries are available in Releases. Crit is written in Go, so it is also pretty simple to install via go get:

go get -u github.com/criticalstack/crit/cmd/crit

RPM/Debian packages are also available via packagecloud.io.

Requirements

Crit is a standalone binary; however, there are implied requirements that aren't as straightforward. Be sure to check out the Getting Started guide.

Design

Decoupled from Etcd Management

The Kubernetes control plane requires etcd for storage, however, bootstrapping and managing etcd is not a responsibility of Crit. This decreases code complexity and results in more maintainable code. Rather than handle all aspects of installing and managing Kubernetes, Crit is designed to be one tool in the toolbox, specific to bootstrapping Kubernetes components.

Safely handling etcd in a cloud environment is not as easy as it may seem, so we have a separate project, e2d, designed to bootstrap etcd and manage cluster membership.

Lazy Cluster Initialization

Crit leverages the unique features of etcd to handle how brand new clusters are bootstrapped. With other tooling, this is often accomplished by handling cluster initialization separately from all subsequent nodes joining the cluster (even if done so implicitly). The complexity of handling this initial case can be difficult to automate in distributed systems. Instead, the distributed locking capabilities of etcd are used to synchronize nodes and initialize a cluster automatically. All nodes race to acquire the distributed lock, and should the cluster not exist (signified by the absence of shared cluster files), a new cluster is initialized by the node that was first to acquire the lock; otherwise, the node joins the cluster.
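
The same idea can be sketched with etcdctl's lock command. This is only an illustration of the pattern, not what Crit runs internally; the lock name and marker file are placeholders:

# all nodes race for the same lock; the winner initializes only if no cluster exists yet
ETCDCTL_API=3 etcdctl lock crit-bootstrap -- sh -c '
  if [ ! -f /etc/kubernetes/pki/ca.crt ]; then
    echo "no shared cluster files found: initializing a new cluster"
  else
    echo "cluster already exists: joining"
  fi'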

Node Roles

Nodes are distinguished as having only one of two roles, either control plane or worker. All the same configurations for clusters are possible, such as colocating etcd on the control plane, but Crit is only concerned with how it needs to bootstrap the two basic node roles.

Cluster Upgrades

There are several important considerations for upgrading a cluster. Crit itself is only a bootstrapper, in that it takes on the daunting task of ensuring that the cluster components are all configured, but afterwards there is not much left for it to do. However, the most important aspect of the philosophy behind Crit and e2d is to ensure that colocated control planes can:

  1. Have all nodes deployed simultaneously, with crit/e2d ensuring that they are bootstrapped regardless of the order they come up.
  2. Safely perform a rolling upgrade.

Built for Automation

Getting Started

Quick Start

A local Critical Stack cluster can be set up using cinder with one easy command:

$ cinder create cluster

This quickly creates a ready-to-use Kubernetes cluster running completely within a single Docker container.

Cinder, or Crit-in-Docker, can be useful for developing on Critical Stack clusters locally, or simply for learning more about Crit. You can read more about requirements, configuration, etc. over in the Cinder Guide.
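
A few other cinder commands are useful once the cluster is up (see the Command Reference for details):

cinder get nodes            # list the nodes in the cinder cluster
cinder export kubeconfig    # merge the cluster kubeconfig into $HOME/.kube/config
cinder create node          # add a worker node to the running cluster
cinder delete cluster       # tear everything down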

Running in Production

Setting up a production Kubernetes cluster requires quite a bit of planning and configuration, and many considerations influence the way a cluster should be configured. When starting a new cluster or defining a standard cluster configuration, one should consider the following:

  • Where will it be running? (e.g. AWS, GCP, bare-metal, etc)
  • What level of resiliency is required?
    • This concerns how the cluster deals with faults; depending upon factors like colocation of etcd, failure handling can become more complicated.
  • What will provide out-of-band storage for cluster secrets?
    • This applies mostly to the initial cluster secrets, the Kubernetes and Etcd CA cert/key pairs.
  • What kind of applications will run on the cluster?
  • What cost-based factors are there?
  • What discovery mechanisms are available for new nodes?
  • Are there specific performance requirements that affect the infrastructure being used?

The Crit Guide and the accompanying Security Guide exist to help answer these questions and provide general guidance for setting up a typical Kubernetes cluster to meet various use cases.

In particular, the following guides are good places to start planning your Kubernetes cluster:

Crit Guide

This guide will take you through all of the typical configuration use cases that may come up when creating a new Kubernetes cluster.

System Requirements

Exact system requirements depend upon a lot of factors; however, for the most part, any relatively modern Linux operating system will fit the bill.

  • Linux kernel >= 4.9.17
  • systemd
  • iptables (optional)

Newer versions of the kernel enable Cilium's NodePort feature, which replaces the need to deploy kube-proxy (and therefore the need for iptables as well).

Dependencies

  • kubelet >= 1.14.x
  • containerd >= 1.2.6
  • CNI >= 0.7.5

References

  • https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/#cni
  • https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/#support-hostport
  • https://docs.cilium.io/en/v1.6/gettingstarted/cni-chaining-portmap/#portmap-hostport

Installation

Note: some of these methods will not work currently because some of the repos are still private. The helper script also still needs to be set up (probably via Cloudflare Workers).

Install From Packagecloud.io

Debian/Ubuntu:

curl -sL https://packagecloud.io/criticalstack/public/gpgkey | apt-key add -
apt-add-repository https://packagecloud.io/criticalstack/public/ubuntu
apt-get install -y criticalstack-crit e2d

Fedora:

dnf config-manager --add-repo https://packagecloud.io/criticalstack/public/fedora
dnf install -y criticalstack-crit e2d

Install from GH releases

Download a binary release from https://github.com/criticalstack/crit/releases/latest suitable for your system and then install, for example:

curl -sLO https://github.com/criticalstack/crit/releases/download/v0.2.9/crit_0.2.9_Linux_x86_64.tar.gz
tar xzf crit_0.2.9_Linux_x86_64.tar.gz
mv crit /usr/local/bin/

Install from helper script

Run the following in your terminal to download the latest version of crit:

curl -sSf https://crit.sh/install/latest | sh
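
Whichever method you use, verify the installation:

crit version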

Configuration

Configuration is passed to Crit via YAML and is separated into two types: ControlPlaneConfiguration and WorkerConfiguration. The concerns for any given node are split between these two configs, and each contains a NodeConfiguration that specifies node-specific settings for the kubelet, networking, etc.
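
As a rough sketch (the endpoint, token, and version values are placeholders), a control plane node and a worker node might be configured and bootstrapped like this:

cat <<EOT > control-plane.yaml
apiVersion: crit.sh/v1alpha2
kind: ControlPlaneConfiguration
controlPlaneEndpoint: mycluster.domain:6443
node:
  kubernetesVersion: 1.17.3
EOT

cat <<EOT > worker.yaml
apiVersion: crit.sh/v1alpha2
kind: WorkerConfiguration
controlPlaneEndpoint: mycluster.domain:6443
bootstrapToken: abcdef.0123456789abcdef
caCert: /etc/kubernetes/pki/ca.crt
node:
  kubernetesVersion: 1.17.3
EOT

crit up --config control-plane.yaml   # on a control plane node
crit up --config worker.yaml          # on a worker node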

Embedded ComponentConfigs

The ComponentConfigs are part of an ongoing effort to make configuration of Kubernetes components (API server, kubelet, etc.) more dynamic by expressing configuration directly through Kubernetes API types. As components gain support for direct configuration from file via the ComponentConfig API types, Crit embeds them, since they simplify taking user configuration and transforming it into Kubernetes component configuration.

Currently, only the kube-proxy and kubelet ComponentConfigs are ready to be used, but more are currently being worked on and will be adopted by Crit as other components begin supporting configuration from file.

Runtime Defaults

Some configuration defaults are set at the time of running crit up. These mostly include settings that are based upon the host that is running the command, such as the hostname.

If left unset, the controlPlaneEndpoint value will be set to the host's IPv4 address. If there are multiple network interfaces, the first non-loopback interface is used.

The default directory for Kubernetes files is /etc/kubernetes and any paths to manifests, certificates, etc are derived from this.

Etcd is also configured presuming that mTLS is used and that the etcd nodes are colocated with the Kubernetes control plane components, effectively making this the default configuration:

apiVersion: crit.sh/v1alpha2
kind: ControlPlaneConfiguration
etcd:
  endpoints:
  - "https://${controlPlaneEndpoint.Host}:2379"
  caFile: /etc/kubernetes/pki/etcd/ca.crt
  caKey: /etc/kubernetes/pki/etcd/ca.key
  certFile: /etc/kubernetes/pki/etcd/client.crt
  keyFile: /etc/kubernetes/pki/etcd/client.key

The CA certificate required for the worker to validate the cluster it's joining is also derived from the default Kubernetes configuration directory:

apiVersion: crit.sh/v1alpha2
kind: WorkerConfiguration
caCert: /etc/kubernetes/pki/ca.crt

Container Runtimes

If interested in a comprehensive deep-dive into all things container runtime, Capital One has a great blog post going into the history and current state of container runtimes: A Comprehensive Container Runtime Comparison.

Containerd

Containerd is a robust and easy-to-use container runtime. It has a proven track record of reliability, and is the container runtime we use for many Critical Stack installations.

Docker

Docker is more than just a container runtime, and actually utilizes containerd internally.

CRI-O

Running Etcd

Crit requires a connection to etcd to coordinate the bootstrapping process. The etcd cluster does not have to be colocated on the node.

Control Plane Sizing

External Etcd

When dealing with etcd running external to the Kubernetes control plane components, there are not many restrictions on how many control plane nodes one can have. There can be any number of nodes that meets demand and availability needs, and the control plane can even be auto-scaled. With that said, the performance of Kubernetes is tied heavily to the performance of etcd, so more nodes does not mean more performance.

Colocated Etcd

Colocation of etcd, or "stacked etcd" (as it's referred to in the Kubernetes documentation), is the practice of installing etcd alongside the Kubernetes control plane components (kube-apiserver, kube-controller-manager, and kube-scheduler). This has some obvious benefits like reducing cost by reducing the virtual machines needed, but introduces a lot of complexity and restrictions.

Etcd's performance goes down as more nodes are added, because more members are required to vote to commit to the raft log, so there should never be more than 5 voting members in a cluster (unless performing a rolling upgrade). The number of members should also always be odd to help protect against the split-brain problem. This means that the control plane can only safely be made up of 1, 3, or 5 nodes.

Etcd also should not be scaled up or down (at least, not at this time), because the etcd cluster is put at risk each time there is a membership change. This means that the control plane size needs to be selected ahead of time and not altered afterwards.

General Recommendations

In cloud environments, 3 is a good size to balance resiliency and performance. The reasoning here is that cloud environments provide ways to quickly automate replacing failed members, so losing a node does not put etcd in danger of losing quorum for long before a new node replaces the failed one. As etcd moves towards adding more functionality around the learner member type, this will also open the door to having a "hot spare" ready to take the place of a failed member immediately.

For bare-metal, 5 is a good size to ensure that failed nodes have more time to be replaced since a new node might need to be physically allocated.

Generating Certificates

Crit can create a new set of cluster CAs, both Kubernetes and etcd. These should be stored out-of-band and made available to control plane nodes through secrets management of some kind.
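
A minimal sketch of that workflow using the crit certs commands (the parameter names and secrets store are illustrative; use whatever out-of-band storage fits your environment):

# initialize and inspect a new CA (repeat with the etcd pki directory if generating the etcd CA separately)
crit certs init --cert-dir /etc/kubernetes/pki
crit certs list --cert-dir /etc/kubernetes/pki

# store the CA material out-of-band, e.g. AWS SSM Parameter Store
aws ssm put-parameter --name /mycluster/pki/ca.crt --type SecureString --value file:///etc/kubernetes/pki/ca.crt
aws ssm put-parameter --name /mycluster/pki/ca.key --type SecureString --value file:///etc/kubernetes/pki/ca.key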

Bootstrapping a Worker

There are two available options for bootstrapping new worker nodes:

Bootstrap Token

Crit supports a worker bootstrap flow using bootstrap tokens and the cluster CA certificate (e.g. /etc/kubernetes/pki/ca.crt):

apiVersion: crit.sh/v1alpha2
kind: WorkerConfiguration
bootstrapToken: abcdef.0123456789abcdef
caCert: /etc/kubernetes/pki/ca.crt
controlPlaneEndpoint: mycluster.domain
node:
  cloudProvider: aws
  kubernetesVersion: 1.17.3

This method is adapted from the kubeadm join workflow, but uses the full CA certificate instead of using CA pinning. It also does not depend upon clients getting a signed configmap, and therefore does not require anonymous auth to be turned on.
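
A sketch of that flow using the commands from the Command Reference: generate a token, register it on a control plane node, then reference it from a worker configuration like the one above:

# on a control plane node
TOKEN=$(crit generate token)
crit create token "${TOKEN}"

# on the new worker, with bootstrapToken set to the generated value
crit up --config worker.yaml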

Bootstrap Server

Experimental

The bootstrap protocol used by Kubernetes/kubeadm relies on operations that imply manual work to be performed, in particular, the bootstrap token creation and how that is distributed to new worker nodes. Crit introduces a new bootstrap protocol that tries to work better in environments that are completely automated.

A bootstrap-server static pod is created alongside the Kubernetes components that run on each control plane node. It provides a service to new nodes, before they have joined the cluster, that allows them to be authorized and given a bootstrap token. This also has the benefit of keeping the bootstrap token expiration very short, greatly limiting the window in which it can be used.

Configuration

Here is an example of using the Amazon Instance Identity Document with signature verification, while also limiting which accounts bootstrap tokens will be issued for:

apiVersion: crit.sh/v1alpha2
kind: ControlPlaneConfiguration
critBootstrapServer:
  cloudProvider: aws
  extraArgs:
    filters: account-id=${account_id}

Override bootstrap-server default port:

apiVersion: crit.sh/v1alpha2
kind: ControlPlaneConfiguration
critBootstrapServer:
  extraArgs:
    port: 8080

Authorizers

AWS

The AWS authorizer uses Instance Identity Documents and RSA SHA 256 signature verification to confirm the identity of new nodes requesting bootstrap tokens.
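
For reference, the identity document and a signature over it are served to the instance by the EC2 instance metadata service; this is the material a new node can present to the bootstrap-server for verification:

curl -s http://169.254.169.254/latest/dynamic/instance-identity/document
curl -s http://169.254.169.254/latest/dynamic/instance-identity/signature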

Configuring Control Plane Components

Control plane endpoint

apiVersion: crit.sh/v1alpha2
kind: ControlPlaneConfiguration
controlPlaneEndpoint: example.com:6443

Disable/Enable Kubernetes Feature Gates

Setting feature gates is important if you need specific features that are not available by default, or to enable a feature that wasn't enabled by default for a particular version of Kubernetes. For example, CSI-related features were only enabled by default starting with version 1.17, so for older versions of Kubernetes you will need to turn them on manually for the control plane:

apiVersion: crit.sh/v1alpha2
kind: ControlPlaneConfiguration
...
kubeAPIServer:
  featureGates:
    CSINodeInfo: true
    CSIDriverRegistry: true
    CSIBlockVolume: true
    VolumeSnapshotDataSource: true
node:
  kubelet:
    featureGates:
      CSINodeInfo: true
      CSIDriverRegistry: true
      CSIBlockVolume: true

and for the workers:

apiVersion: crit.sh/v1alpha2
kind: WorkerConfiguration
...
node:
  kubelet:
    featureGates:
      CSINodeInfo: true
      CSIDriverRegistry: true
      CSIBlockVolume: true

Installing a CNI

Install cilium via helm

helm repo add criticalstack https://charts.cscr.io/criticalstack
helm install cilium criticalstack/cilium --namespace kube-system \
	--version 1.7.1

Installing a Storage Driver

helm repo add criticalstack https://charts.cscr.io/criticalstack
kubectl create namespace local-path-storage
helm install local-path-storage criticalstack/local-path-provisioner \
	--namespace local-path-storage \
	--set nameOverride=local-path-storage \
	--set storageClass.defaultClass=true

Install the AWS CSI driver via helm

https://github.com/kubernetes-sigs/aws-ebs-csi-driver

helm repo add criticalstack https://charts.cscr.io/criticalstack
helm install aws-ebs-csi-driver criticalstack/aws-ebs-csi-driver \
	--set enableVolumeScheduling=true \
	--set enableVolumeResizing=true \
	--set enableVolumeSnapshot=true \
	--version 0.3.0

Setting a Default StorageClass

kubectl apply -f - <<EOT
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ebs-sc
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  csi.storage.k8s.io/fstype: xfs
  type: io1
  iopsPerGB: "50"
  encrypted: "true"
EOT

Configuring Authentication

Configure the Kubernetes API Server

The Kubernetes API server can be configured with OpenID Connect to use an existing OpenID identity provider. It can only trust a single issuer, and until the API server can be configured with ComponentConfigs, the settings must be specified in the Crit config as command-line arguments:

apiVersion: crit.sh/v1alpha2
kind: ControlPlaneConfiguration
kubeAPIServer:
  extraArgs:
    oidc-issuer-url: "https://accounts.google.com"
    oidc-client-id: critical-stack
    oidc-username-claim: email
    oidc-groups-claim: groups

The above configuration will allow the API server to use Google as its identity provider, but with some major limitations:

  • Kubernetes does not act as a client for the issuer
  • It does not provide a way to manage the lifecycle of OpenID Connect tokens

This is best understood by looking at the Kubernetes authentication documentation for OpenID Connect Tokens. The process of getting a token happens completely outside of the context of the Kubernetes cluster, and the token is passed as an argument to kubectl commands.
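
For example, once a user has obtained an ID token from the provider out-of-band, it can be passed to kubectl directly or stored as a kubeconfig credential (the values below are placeholders):

# pass the token for a single command
kubectl --token="${ID_TOKEN}" get pods

# or store it in the kubeconfig
kubectl config set-credentials oidc-user \
  --auth-provider=oidc \
  --auth-provider-arg=idp-issuer-url=https://accounts.google.com \
  --auth-provider-arg=client-id=critical-stack \
  --auth-provider-arg=id-token="${ID_TOKEN}" \
  --auth-provider-arg=refresh-token="${REFRESH_TOKEN}"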

Using an In-cluster Identity Provider

Given the limitations mentioned above, many run their own identity providers inside of the cluster to provide additional auth features to the cluster. This complicates configuration, however, since the API server will either have to be reconfigured and restarted, or will need to be configured with an issuer that is not yet running.

So what if you want to provide a web interface that leverages this authentication? Given the limitations mentioned above, you would have to write authentication logic for the specific upstream identity provider into your application, and should the upstream identity provider change, so does the authentication logic AND the API server configuration. This is where identity providers such as Dex come in. Dex uses OpenID Connect to provide authentication for other applications by acting as a shim between the client app and the upstream provider. When using Dex, the oidc-issuer-url argument needs to target the expected address of Dex running in the cluster, something like:

oidc-issuer-url: "https://dex.kube-system.svc.cluster.local:5556"

It is OK that Dex isn't running yet; the API server will function as normal until the issuer is available.

The auth-proxy CA

The API server uses the host's root CAs by default, but in cases where an application might not be using a CA-signed certificate, like during development or automated testing, Crit generates an additional CA that is already available in the API server certs volume. This helps with the chicken-and-egg problem of needing to specify a CA file when bootstrapping a new cluster, before the application has been deployed. To use this auth-proxy CA, just add this to the API server configuration:

oidc-ca-file: /etc/kubernetes/pki/auth-proxy-ca.crt

Please note that this assumes the default Kubernetes directory (/etc/kubernetes) is being used. From here there are many options for making use of the auth-proxy CA. For example, cert-manager can be installed and the auth-proxy CA set up as a ClusterIssuer:

# install cert-manager
kubectl create namespace cert-manager
kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.14.0/cert-manager.yaml

# add auth-proxy-ca secret to be used as ClusterIssuer
kubectl -n cert-manager create secret generic auth-proxy-ca --from-file=tls.crt=/etc/kubernetes/pki/auth-proxy-ca.crt --from-file=tls.key=/etc/kubernetes/pki/auth-proxy-ca.key

# wait for cert-manager-webhook readiness
while [[ $(kubectl -n cert-manager get pods -l app=webhook -o 'jsonpath={..status.conditions[?(@.type=="Ready")].status}') != "True" ]]; do echo "waiting for pod" && sleep 1; done

kubectl apply -f - <<EOT
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: auth-proxy-ca
  namespace: cert-manager
spec:
  ca:
    secretName: auth-proxy-ca
EOT

Then applications can create cert-manager certificates for their application to use:

apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: myapp-example
spec:
  secretName: myapp-certs
  duration: 8760h # 365d
  renewBefore: 360h # 15d
  organization:
  -  Internet Widgits Pty Ltd
  isCA: false
  keySize: 2048
  keyAlgorithm: rsa
  keyEncoding: pkcs1
  usages:
    - server auth
    - client auth
  dnsNames:
  - myapp.example.com
  issuerRef:
    name: auth-proxy-ca
    kind: ClusterIssuer

Of course, this is just one possible way to approach authentication, and configuration will vary greatly depending upon the needs of the application(s) running on the cluster.

Kubelet Settings

Disable swap for Linux-based Operating Systems

Swap must be disabled for the kubelet to work (see here). This is a helpful systemd unit to ensure that swap is disabled on a system:

[Unit]
After=local-fs.target

[Service]
ExecStart=/sbin/swapoff -a

[Install]
WantedBy=multi-user.target
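
Assuming the unit above is saved as a standalone service (the name swapoff.service is just an example), enable it with:

systemctl daemon-reload
systemctl enable --now swapoff.service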

Reserving Resources

Reserving some resources for the system to use is often very helpful to ensure that resource-hungry pods don't take the node down by causing it to run out of memory.

...
node:
  kubelet:
    kubeReserved:
      cpu: 128m
      memory: 64Mi
    kubeReservedCgroup: /podruntime.slice
    kubeletCgroups: /podruntime.slice
    systemReserved:
      cpu: 128m
      memory: 192Mi
    systemReservedCgroup: /system.slice

# /etc/systemd/system/kubelet.service.d/10-cgroup.conf
# Sets the cgroup for the kubelet service
[Service]
CPUAccounting=true
MemoryAccounting=true
Slice=podruntime.slice

# /etc/systemd/system/containers.slice
# Creates a cgroup for containers
[Unit]
Description=Grouping resources slice for containers
Documentation=man:systemd.special(7)
DefaultDependencies=no
Before=slices.target
Requires=-.slice
After=-.slice

# /etc/systemd/system/podruntime.slice
# Creates a cgroup for the kubelet
[Unit]
Description=Limited resources slice for Kubelet service
Documentation=man:systemd.special(7)
DefaultDependencies=no
Before=slices.target
Requires=-.slice
After=-.slice

Exposing Cluster DNS

Replace Systemd-resolved With Dnsmasq

Sometimes systemd-resolved, the default stub resolver on many Linux systems, needs to be replaced with dnsmasq. This dnsmasq systemd drop-in is useful to ensure that systemd-resolved is not running when the dnsmasq service is started:

# /etc/systemd/system/dnsmasq.service.d/10-resolved-fix.conf
[Unit]
After=systemd-resolved.service

[Service]
ExecStartPre=/bin/systemctl stop systemd-resolved.service
ExecStartPost=/bin/systemctl start systemd-resolved.service

# /etc/dnsmasq.d/kube.conf
server=/cluster.local/10.254.0.10

Security Guide

This guide will take you through configuring security features of Kubernetes, as well as features specific to Crit. It also includes general helpful information and gotchas to look out for when creating a new cluster.

Encrypting Kubernetes Secrets

EncryptionProviderConfig

To encrypt secrets within the cluster you must create an EncryptionConfiguration manifest and pass it to the API server.

touch /etc/kubernetes/encryption-config.yaml
chmod 600 /etc/kubernetes/encryption-config.yaml
cat <<-EOT > /etc/kubernetes/encryption-config.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
    - secrets
    providers:
    - aescbc:
        keys:
        - name: key1
          secret: $(cat /etc/kubernetes/pki/etcd/ca.key | md5sum | cut -f 1 -d ' ' | head -c -1 | base64)
    - identity: {}
EOT

This EncryptionConfiguration uses the aescbc provider for encrypting secrets. Details on other providers, including third-party key management systems, can be found in the official Kubernetes documentation.

The file then needs to be mounted into the kube-apiserver static pod and the API server pointed at it via the encryption-provider-config flag:

apiVersion: crit.sh/v1alpha2
kind: ControlPlaneConfiguration
kubeAPIServer:
  extraArgs:
    encryption-provider-config: /etc/kubernetes/encryption-config.yaml
  extraVolumes:
  - name: encryption-config
    hostPath: /etc/kubernetes/encryption-config.yaml
    mountPath: /etc/kubernetes/encryption-config.yaml
    readOnly: true

Once the API server is available, verify that new secrets are encrypted.
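
One way to verify, adapted from the Kubernetes documentation, is to create a secret and read it back directly from etcd; values written by the aescbc provider are stored with a k8s:enc:aescbc:v1: prefix. The client certificate paths below match the defaults shown earlier in this guide:

kubectl create secret generic test-secret -n default --from-literal=mykey=mydata

ETCDCTL_API=3 etcdctl get /registry/secrets/default/test-secret \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/client.crt \
  --key /etc/kubernetes/pki/etcd/client.key | hexdump -C | head

# the stored value should begin with k8s:enc:aescbc:v1: rather than plaintext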

Enabling Pod Security Policies

What is a Pod Security Policy

Pod Security Policies are in-cluster Kubernetes resources that provide ways of securing pods. The Pod Security Policy page of the official Kubernetes docs provides a great deal of helpful information and a walkthrough of how to use them, and is highly recommended reading. For the purposes of this documentation, we really just want to focus on getting them running on your Crit cluster.

Configuration

The API server has quite a few admission plugins enabled by default; however, the PodSecurityPolicy plugin must be enabled explicitly when configuring the API server with the enable-admission-plugins option:

apiVersion: crit.sh/v1alpha2
kind: ControlPlaneConfiguration
kubeAPIServer:
  extraArgs:
    enable-admission-plugins: PodSecurityPolicy

enable-admission-plugins can be provided a comma-delimited list of admission plugins to enable (e.g. NodeRestriction,PodSecurityPolicy). While the order in which admission plugins run does matter, it does not matter for this particular option, as it simply enables the plugins.

The admission plugin SecurityContextDeny must NOT be enabled along with PodSecurityPolicy. In the case that PodSecurityPolicy is enabled, the usage completely supplants the functionality provided by SecurityContextDeny.

Pod Security Policy Examples

Crit embeds two Pod Security Policies that provide a good starting place for configuring PSPs in your cluster. They were adapted from the examples provided in the Kubernetes docs and can be found on GitHub, or they can be printed to the console by using crit template on the desired file:

$ crit template psp-privileged.yaml

Privileged Pod Security Policy

# psp-privileged.yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: privileged
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
spec:
  privileged: true
  allowPrivilegeEscalation: true
  allowedCapabilities:
  - '*'
  volumes:
  - '*'
  hostNetwork: true
  hostPorts:
  - min: 0
    max: 65535
  hostIPC: true
  hostPID: true
  runAsUser:
    rule: 'RunAsAny'
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: psp:privileged
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs:     ['use']
  resourceNames:
  - privileged
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psp:privileged
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp:privileged
subjects:
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:serviceaccounts:kube-system
- kind: Group
  name: system:serviceaccounts:kube-node-lease
  apiGroup: rbac.authorization.k8s.io
- kind: Group
  name: system:serviceaccounts:kube-public
  apiGroup: rbac.authorization.k8s.io
- kind: Group
  name: system:serviceaccounts:default
  apiGroup: rbac.authorization.k8s.io
- kind: Group
  name: system:nodes
  apiGroup: rbac.authorization.k8s.io
- kind: User
  apiGroup: rbac.authorization.k8s.io
  # Legacy node ID
  name: kubelet

Restricted Pod Security Policy

# psp-restricted.yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: default-cluster-restricted
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'docker/default,runtime/default'
    apparmor.security.beta.kubernetes.io/allowedProfileNames: 'runtime/default'
    seccomp.security.alpha.kubernetes.io/defaultProfileName:  'runtime/default'
    apparmor.security.beta.kubernetes.io/defaultProfileName:  'runtime/default'
spec:
  privileged: false
  # Required to prevent escalations to root.
  allowPrivilegeEscalation: false
  # This is redundant with non-root + disallow privilege escalation,
  # but we can provide it for defense in depth.
  requiredDropCapabilities:
    - ALL
  # Allow core volume types.
  volumes:
    - 'configMap'
    - 'emptyDir'
    - 'projected'
    - 'secret'
    - 'downwardAPI'
    # Assume that persistentVolumes set up by the cluster admin are safe to use.
    - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    # Require the container to run without root privileges.
    rule: 'MustRunAsNonRoot'
  seLinux:
    # This policy assumes the nodes are using AppArmor rather than SELinux.
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'MustRunAs'
    ranges:
      # Forbid adding the root group.
      - min: 1
        max: 65535
  fsGroup:
    rule: 'MustRunAs'
    ranges:
      # Forbid adding the root group.
      - min: 1
        max: 65535
  readOnlyRootFilesystem: false
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: psp:restricted
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs:     ['use']
  resourceNames:
  - default-cluster-restricted
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psp:restricted
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp:restricted
subjects:
# Authorize all service accounts in a namespace:
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:serviceaccounts
# Or equivalently, all authenticated users in a namespace:
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:authenticated
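
Since crit template renders these manifests to stdout, one possible workflow is to pipe them straight to kubectl after enabling the admission plugin:

crit template psp-privileged.yaml | kubectl apply -f -
crit template psp-restricted.yaml | kubectl apply -f -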

Audit Policy Logging

apiVersion: crit.sh/v1alpha2
kind: ControlPlaneConfiguration
kubeAPIServer:
  extraArgs:
    audit-policy-file: "/etc/kubernetes/audit-policy.yaml"
    audit-log-path: "/var/log/kubernetes/kube-apiserver-audit.log"
    audit-log-maxage: "30"
    audit-log-maxbackup: "10"
    audit-log-maxsize: "100"
  extraVolumes:
  - name: apiserver-logs
    hostPath: /var/log/kubernetes
    mountPath: /var/log/kubernetes
    readOnly: false
    hostPathType: directory
  - name: apiserver-audit-config
    hostPath: /etc/kubernetes/audit-policy.yaml
    mountPath: /etc/kubernetes/audit-policy.yaml
    readOnly: true
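
The configuration above expects an audit policy at /etc/kubernetes/audit-policy.yaml. A minimal policy that logs request metadata for everything might look like the following (tune the rules to your needs):

cat <<EOT > /etc/kubernetes/audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # log request metadata (user, verb, resource, timestamp) for all requests
  - level: Metadata
EOT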

Disabling Anonymous Authentication

apiVersion: crit.sh/v1alpha2
kind: ControlPlaneConfiguration
kubeAPIServer:
  extraArgs:
    anonymous-auth: "false"

Kubelet Server Certificate

Encrypting Shared Cluster Files

Cinder Guide

This guide will take you through installing and using Cinder.

What is Cinder

Cinder, or Crit-in-Docker, is very similar to kind. In fact, it uses many packages from kind under the hood, along with the base container image that makes it all work. Think of cinder as a flavor of kind (and kind is quite good, to say the least). Just like kind, cinder won't work on all platforms; right now it only supports amd64 architectures running macOS and Linux, and it requires a running Docker daemon.

Cinder bootstraps each node with Crit and installs several helpful additional components, such as the machine-api and machine-api-provider-docker.

Installation

go get -u github.com/criticalstack/crit/cmd/cinder

Configuration

We've started working on a cinder-specific configuration, but it is currently up in the air what that will look like, so we would love feedback on what features users would like to see here.

Adding Files

apiVersion: cinder.crit.sh/v1alpha1
kind: ClusterConfiguration
files:
  - path: "/etc/kubernetes/auth-proxy-ca.yaml"
    owner: "root:root"
    permissions: "0644"
    content: |
      apiVersion: cert-manager.io/v1alpha2
      kind: ClusterIssuer
      metadata:
        name: auth-proxy-ca
        namespace: cert-manager
      spec:
        ca:
          secretName: auth-proxy-ca

HostPath

apiVersion: cinder.crit.sh/v1alpha1
kind: ClusterConfiguration
controlPlaneConfiguration:
  kubeAPIServer:
    extraArgs:
      audit-policy-file: "/etc/kubernetes/audit-policy.yaml"
      audit-log-path: "/var/log/kubernetes/kube-apiserver-audit.log"
      audit-log-maxage: "30"
      audit-log-maxbackup: "10"
      audit-log-maxsize: "100"
    extraVolumes:
    - name: apiserver-logs
      hostPath: /var/log/kubernetes
      mountPath: /var/log/kubernetes
      readOnly: false
      hostPathType: Directory
    - name: apiserver-audit-config
      hostPath: /etc/kubernetes/audit-policy.yaml
      mountPath: /etc/kubernetes/audit-policy.yaml
      readOnly: true
files:
  - path: "/etc/kubernetes/audit-policy.yaml"
    owner: "root:root"
    permissions: "0644"
    encoding: hostpath
    content: audit-policy.yaml

Running Additional Commands

apiVersion: cinder.crit.sh/v1alpha1
kind: ClusterConfiguration
preCritCommands:
  - crit version
postCritCommands:
  - |
    helm repo add jetstack https://charts.jetstack.io
    helm install cert-manager jetstack/cert-manager \
      --namespace cert-manager \
      --version v0.15.1 \
      --set tolerations[0].effect=NoSchedule \
      --set tolerations[0].key="node.kubernetes.io/not-ready" \
      --set tolerations[0].operator=Exists \
      --set installCRDs=true
    kubectl rollout status -n cert-manager deployment/cert-manager-webhook -w && sleep 1

Updating the Containerd Configuration

apiVersion: cinder.crit.sh/v1alpha1
kind: ClusterConfiguration
files:
  - path: "/etc/containerd/config.toml"
    owner: "root:root"
    permissions: "0644"
    content: |
      # explicitly use v2 config format
      version = 2

      # set default runtime handler to v2, which has a per-pod shim
      [plugins."io.containerd.grpc.v1.cri".containerd]
        default_runtime_name = "runc"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
        runtime_type = "io.containerd.runc.v2"

      # Setup a runtime with the magic name ("test-handler") used for Kubernetes
      # runtime class tests ...
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.test-handler]
        runtime_type = "io.containerd.runc.v2"

      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
          endpoint = ["https://docker.io"]

Crit Development

apiVersion: cinder.crit.sh/v1alpha1
kind: ClusterConfiguration
files:
  - path: "/usr/bin/crit"
    owner: "root:root"
    permissions: "0755"
    encoding: hostpath
    content: bin/crit

Local Registry

apiVersion: cinder.crit.sh/v1alpha1
kind: ClusterConfiguration
featureGates:
  LocalRegistry: true

When the LocalRegistry feature gate is enabled, cinder publishes the local registry details in the local-registry-hosting ConfigMap (see the KEP links below):

apiVersion: v1
kind: ConfigMap
metadata:
  name: local-registry-hosting
  namespace: kube-public
data:
  localRegistryHosting.v1: |
    host: "localhost:{{ .LocalRegistryPort }}"
    hostFromContainerRuntime: "{{ .LocalRegistryName }}:{{ .LocalRegistryPort }}"
    hostFromClusterNetwork: "{{ .LocalRegistryName }}:{{ .LocalRegistryPort }}"
    help: "https://docs.crit.sh/cinder-guide/local-registry.html"

https://github.com/kubernetes/enhancements/pull/1757

https://github.com/kubernetes/enhancements/blob/c5b6b632811c21ababa9e3565766b2d70614feec/keps/sig-cluster-lifecycle/generic/1755-communicating-a-local-registry/README.md#design-details

Registry Mirrors

apiVersion: cinder.crit.sh/v1alpha1
kind: ClusterConfiguration
registryMirrors:
  docker.io: "https://docker.io"

Krustlet

apiVersion: cinder.crit.sh/v1alpha1
kind: ClusterConfiguration
featureGates:
  Krustlet: true
controlPlaneConfiguration:
  kubeProxy:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
              - key: "kubernetes.io/arch"
                operator: NotIn
                values: ["wasm32-wasi", "wasm32-wascc"]

https://github.com/deislabs/krustlet

FAQ

Can e2d scale up (or down) after cluster initialization?

The short answer is No, because it is unsafe to scale etcd and any solution that scales etcd is increasing the chance of cluster failure. This is a feature that will be supported in the future, but it relies on new features and fixes to etcd. Some context will be necessary to explain why:

A common misconception about etcd is that it is scalable. While etcd is a distributed key/value store, the reason it is distributed is to provide distributed consensus, NOT to scale in/out for performance (or flexibility). In fact, an etcd cluster performs best when it has only 1 member, and performance goes down as more members are added. In etcd v3.4, a new type of member called a learner was introduced. Learners can receive raft log updates, but are not part of the quorum voting process. This will be an important feature for many reasons, like stability/safety and faster recovery from faults, but will also potentially[1] enable etcd clusters of arbitrary sizes.

So why not scale within the recommended cluster sizes if the only concern is performance? Previously, etcd clusters have been vulnerable to corruption during membership changes due to the way etcd implemented raft. This has only recently been addressed by incredible work from CockroachDB, and it is worth reading about the issue and the solution in this blog post: Availability and Region Failure: Joint Consensus in CockroachDB.

The last couple features needed to safely scale have been roadmapped for v3.5 and are highlighted in the etcd learner design doc:

Make learner state only and default: Defaulting a new member state to learner will greatly improve membership reconfiguration safety, because learner does not change the size of quorum. Misconfiguration will always be reversible without losing the quorum.

Make voting-member promotion fully automatic: Once a learner catches up to leader’s logs, a cluster can automatically promote the learner. etcd requires certain thresholds to be defined by the user, and once the requirements are satisfied, learner promotes itself to a voting member. From a user’s perspective, “member add” command would work the same way as today but with greater safety provided by learner feature.

Since we want to implement this feature as safely and reliably as possible, we are waiting for this confluence of features to become stable before finally implementing scaling into e2d.

[1] Only potentially, because the maximum is currently set to allow only 1 learner. There is a concern that too many learners could have a negative impact on the leader which is discussed briefly here. It is also worth noting that other features may also fulfill the same need like some kind of follower replication: etcd#11357.

Command Reference

crit

bootstrap Critical Stack clusters

Synopsis

bootstrap Critical Stack clusters

Options

  -h, --help            help for crit
  -v, --verbose count   log output verbosity

SEE ALSO

General Commands

crit template

Render embedded assets

Synopsis

Render embedded assets

crit template [path] [flags]

Options

  -c, --config string   config file (default "config.yaml")
  -h, --help            help for template

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

  • crit - bootstrap Critical Stack clusters

crit up

Bootstraps a new node

Synopsis

Bootstraps a new node

crit up [flags]

Options

  -c, --config string              config file (default "config.yaml")
  -h, --help                       help for up
      --kubelet-timeout duration   timeout for Kubelet to become healthy (default 15s)
      --timeout duration            (default 20m0s)

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

  • crit - bootstrap Critical Stack clusters

crit version

Print the version info

Synopsis

Print the version info

crit version [flags]

Options

  -h, --help   help for version

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

  • crit - bootstrap Critical Stack clusters

crit certs

Handle Kubernetes certificates

Synopsis

Handle Kubernetes certificates

Options

  -h, --help   help for certs

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

crit certs init

initialize a new CA

Synopsis

initialize a new CA

crit certs init [flags]

Options

      --cert-dir string   
  -h, --help              help for init

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

crit certs list

list cluster certificates

Synopsis

list cluster certificates

crit certs list [flags]

Options

      --cert-dir string    (default "/etc/kubernetes/pki")
  -h, --help              help for list

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

crit certs renew

renew cluster certificates

Synopsis

renew cluster certificates

crit certs renew [flags]

Options

      --cert-dir string    (default "/etc/kubernetes/pki")
      --dry-run           
  -h, --help              help for renew

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

crit config

Handle Kubernetes config files

Synopsis

Handle Kubernetes config files

Options

  -h, --help   help for config

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

crit config import

import a kubeconfig

Synopsis

import a kubeconfig

crit config import [kubeconfig] [flags]

Options

  -h, --help   help for import

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

crit create

Create Kubernetes resources

Synopsis

Create Kubernetes resources

Options

  -h, --help   help for create

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

crit create token

creates a bootstrap token resource

Synopsis

creates a bootstrap token resource

crit create token [token] [flags]

Options

  -h, --help   help for token

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

crit generate

Utilities for generating values

Synopsis

Utilities for generating values

Options

  -h, --help   help for generate

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

crit generate hash

Synopsis

crit generate hash [ca-cert-path] [flags]

Options

  -h, --help   help for hash

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

crit generate kubeconfig

generates a kubeconfig

Synopsis

generates a kubeconfig

crit generate kubeconfig [filename] [flags]

Options

      --CN string           (default "kubernetes-admin")
      --O string            (default "system:masters")
      --cert-dir string     (default ".")
      --cert-name string    (default "ca")
  -h, --help               help for kubeconfig
      --name string         (default "crit")
      --server string      

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

crit generate token

generates a bootstrap token

Synopsis

generates a bootstrap token

crit generate token [token] [flags]

Options

  -h, --help   help for token

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

cinder

Create local Kubernetes clusters

Synopsis

Cinder is a tool for creating and managing local Kubernetes clusters using containers as nodes. It builds upon kind, but using Crit and Cilium to configure a Critical Stack cluster locally.

Options

  -h, --help            help for cinder
  -v, --verbose count   log output verbosity

SEE ALSO

General Commands

cinder load

Load container images from host

Synopsis

Load container images from host

cinder load [flags]

Options

  -h, --help          help for load
      --name string   cluster name (default "cinder")

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

  • cinder - Create local Kubernetes clusters

cinder version

Print the version info

Synopsis

Print the version info

cinder version [flags]

Options

  -h, --help   help for version

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

  • cinder - Create local Kubernetes clusters

cinder create

Create cinder resources

Synopsis

Create cinder resources, such as new cinder clusters, or add nodes to existing clusters.

cinder create [flags]

Options

  -h, --help   help for create

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

cinder create cluster

Creates a new cinder cluster

Synopsis

Creates a new cinder cluster

cinder create cluster [flags]

Options

  -c, --config string       cinder configuration file
  -h, --help                help for cluster
      --image string        node image (default "criticalstack/cinder:v1.0.0-beta.1")
      --kubeconfig string   sets kubeconfig path instead of $KUBECONFIG or $HOME/.kube/config
      --name string         cluster name (default "cinder")

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

cinder create node

Creates a new cinder worker

Synopsis

Creates a new cinder worker

cinder create node [flags]

Options

  -c, --config string       cinder configuration file
  -h, --help                help for node
      --image string        node image (default "criticalstack/cinder:v1.0.0-beta.1")
      --kubeconfig string   sets kubeconfig path instead of $KUBECONFIG or $HOME/.kube/config
      --name string         cluster name (default "cinder")

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

cinder delete

Delete cinder resources

Synopsis

Delete cinder resources

cinder delete [flags]

Options

  -h, --help   help for delete

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

cinder delete cluster

Deletes a cinder cluster

Synopsis

Deletes a cinder cluster

cinder delete cluster [flags]

Options

  -h, --help          help for cluster
      --name string   cluster name (default "cinder")

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

cinder delete node

Deletes a cinder node

Synopsis

Deletes a cinder node

cinder delete node [flags]

Options

  -h, --help          help for node
      --name string   cluster name (default "cinder")

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

cinder export

Export from local cluster

Synopsis

Export from local cluster

cinder export [flags]

Options

  -h, --help   help for export

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

cinder export kubeconfig

Export kubeconfig from cinder cluster and merge with $HOME/.kube/config

Synopsis

Export kubeconfig from cinder cluster and merge with $HOME/.kube/config

cinder export kubeconfig [flags]

Options

  -h, --help                help for kubeconfig
      --kubeconfig string   sets kubeconfig path instead of $KUBECONFIG or $HOME/.kube/config
      --name string         cluster name (default "cinder")

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

cinder get

Get cinder resources

Synopsis

Get cinder resources

cinder get [flags]

Options

  -h, --help   help for get

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

cinder get clusters

Get running cluster

Synopsis

Get running cluster

cinder get clusters [flags]

Options

  -h, --help                help for clusters
      --kubeconfig string   sets kubeconfig path instead of $KUBECONFIG or $HOME/.kube/config
      --name string         cluster name (default "cinder")

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

cinder get images

List all containers images used by cinder

Synopsis

List all containers images used by cinder

cinder get images [flags]

Options

  -h, --help   help for images

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

cinder get ip

Get node IP address

Synopsis

Get node IP address

cinder get ip [flags]

Options

  -h, --help          help for ip
      --name string   cluster name (default "cinder")

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

cinder get kubeconfigs

Get kubeconfig from cinder cluster

Synopsis

Get kubeconfig from cinder cluster

cinder get kubeconfigs [flags]

Options

  -h, --help                help for kubeconfigs
      --kubeconfig string   sets kubeconfig path instead of $KUBECONFIG or $HOME/.kube/config
      --name string         cluster name (default "cinder")

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO

cinder get nodes

List cinder cluster nodes

Synopsis

List cinder cluster nodes

cinder get nodes [flags]

Options

  -h, --help          help for nodes
      --name string   cluster name (default "cinder")

Options inherited from parent commands

  -v, --verbose count   log output verbosity

SEE ALSO