HERE BE DRAGONS
Kubernetes is defined as a “container-orchestration system” and a “portable, extensible platform”. In this part we’ll focus on how and why it’s built and how to leverage its extensibility. After this part you will be able to
Create your own Custom Resource Definitions (CRDs)
List and compare platform options
Set up a serverless platform (Knative) and deploy a simple workload
Instead of thinking about Kubernetes as something completely new, I’ve found that comparing it to an operating system helps. I’m not an expert in operating systems, but we’ve all used them.
Kubernetes is a layer on top of which we run our applications. It takes the resources that are accessible from the layers below, manages our applications and resources, and provides services, such as DNS, for the applications. With this OS mindset we can also try to go the other way: you may have used cron (or Windows’ Task Scheduler) for saving long-term backups of some applications. Here’s the same thing in Kubernetes with CronJobs.
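As a minimal sketch of how that looks (the name, schedule and backup image below are made up for illustration - only the CronJob structure itself is the point), a nightly backup could be declared like this:

apiVersion: batch/v1    # use batch/v1beta1 on clusters older than 1.21
kind: CronJob
metadata:
  name: backup-cronjob            # hypothetical name
spec:
  schedule: "0 2 * * *"           # every night at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: my-backup-image:latest   # hypothetical backup image
          restartPolicy: OnFailure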
Now that we start talking about the internals, we’ll gain new insight into Kubernetes and will be able to prevent and solve problems that may result from its nature.
Since this section is mostly a reiteration of the Kubernetes documentation, I will include various links to the official documentation - we will not set up our own Kubernetes cluster manually. If you want to go hands-on and learn to set up your own cluster, you should read and complete Kubernetes the Hard Way by Kelsey Hightower. If you have any leftover credits from part 3, this is a great way to spend some of them.
Controllers and Eventual Consistency
Controllers watch the state of your cluster and then try to move the current state of the cluster closer to the desired state. When you declare X replicas of a Pod in your deployment.yaml, the ReplicaSet controller makes sure that this stays true. There are a number of controllers for different responsibilities.
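As a quick illustration of this reconciliation (a sketch with a hypothetical deployment labeled app=example that declares 3 replicas), you can delete one of its pods and watch the controller bring the cluster back to the desired state:

$ kubectl get pods -l app=example
# delete one of the three pods (the pod name below is made up)
$ kubectl delete pod example-dep-7b9787f8c9-abcde
# the controller notices only 2 of the desired 3 replicas exist
# and creates a replacement almost immediately
$ kubectl get pods -l app=example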
Kubernetes Control Plane
The Kubernetes Control Plane consists of
- etcd: a key-value storage that Kubernetes uses to save all cluster data
- kube-scheduler: decides on which node a Pod should be run
- kube-controller-manager: is responsible for and runs all of the controllers
- kube-apiserver: exposes the Kubernetes Control Plane through an API
There’s also cloud-controller-manager, which lets you link your cluster to a cloud provider’s API. If you wanted to build your own cluster on Hetzner, for example, you could install hcloud-cloud-controller-manager in a cluster running on their VMs.
Every node has a number of components that maintain the running pods.
- kubelet: makes sure containers are running in a Pod
- kube-proxy: a network proxy that maintains the network rules. It enables connections from outside and inside of the cluster, and makes Services work as we’ve been using them
And also the Container Runtime. We’ve been using Docker for this course.
In addition to all of the previously mentioned components, Kubernetes has Addons which use the same Kubernetes resources we’ve been using and extend Kubernetes. You can view which resources the addons have created in the kube-system namespace:
$ kubectl -n kube-system get all
NAME                                                            READY   STATUS    RESTARTS   AGE
pod/event-exporter-v0.2.5-599d65f456-vh4st                      2/2     Running   0          5h42m
pod/fluentd-gcp-scaler-bfd6cf8dd-kmk2x                          1/1     Running   0          5h42m
pod/fluentd-gcp-v3.1.1-9sl8g                                    2/2     Running   0          5h41m
pod/fluentd-gcp-v3.1.1-9wpqh                                    2/2     Running   0          5h41m
pod/fluentd-gcp-v3.1.1-fr48m                                    2/2     Running   0          5h41m
pod/heapster-gke-9588c9855-pc4wr                                3/3     Running   0          5h41m
pod/kube-dns-5995c95f64-m7k4j                                   4/4     Running   0          5h41m
pod/kube-dns-5995c95f64-rrjpx                                   4/4     Running   0          5h42m
pod/kube-dns-autoscaler-8687c64fc-xv6p6                         1/1     Running   0          5h41m
pod/kube-proxy-gke-dwk-cluster-default-pool-700eba89-j735       1/1     Running   0          5h41m
pod/kube-proxy-gke-dwk-cluster-default-pool-700eba89-mlht       1/1     Running   0          5h41m
pod/kube-proxy-gke-dwk-cluster-default-pool-700eba89-xss7       1/1     Running   0          5h41m
pod/l7-default-backend-8f479dd9-jbv9l                           1/1     Running   0          5h42m
pod/metrics-server-v0.3.1-5c6fbf777-lz2zh                       2/2     Running   0          5h41m
pod/prometheus-to-sd-jw9rs                                      2/2     Running   0          5h41m
pod/prometheus-to-sd-qkxvd                                      2/2     Running   0          5h41m
pod/prometheus-to-sd-z4ssv                                      2/2     Running   0          5h41m
pod/stackdriver-metadata-agent-cluster-level-5d8cd7b6bf-rfd8d   2/2     Running   0          5h41m

NAME                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
service/default-http-backend   NodePort    10.31.251.116   <none>        80:31581/TCP    5h42m
service/heapster               ClusterIP   10.31.247.145   <none>        80/TCP          5h42m
service/kube-dns               ClusterIP   10.31.240.10    <none>        53/UDP,53/TCP   5h42m
service/metrics-server         ClusterIP   10.31.249.74    <none>        443/TCP         5h42m

NAME                                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                                              AGE
daemonset.apps/fluentd-gcp-v3.1.1           3         3         3       3            3           beta.kubernetes.io/fluentd-ds-ready=true,beta.kubernetes.io/os=linux       5h42m
daemonset.apps/metadata-proxy-v0.1          0         0         0       0            0           beta.kubernetes.io/metadata-proxy-ready=true,beta.kubernetes.io/os=linux   5h42m
daemonset.apps/nvidia-gpu-device-plugin     0         0         0       0            0           <none>                                                                     5h42m
daemonset.apps/prometheus-to-sd             3         3         3       3            3           beta.kubernetes.io/os=linux                                                5h42m

NAME                                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/event-exporter-v0.2.5                      1/1     1            1           5h42m
deployment.apps/fluentd-gcp-scaler                         1/1     1            1           5h42m
deployment.apps/heapster-gke                               1/1     1            1           5h42m
deployment.apps/kube-dns                                   2/2     2            2           5h42m
deployment.apps/kube-dns-autoscaler                        1/1     1            1           5h42m
deployment.apps/l7-default-backend                         1/1     1            1           5h42m
deployment.apps/metrics-server-v0.3.1                      1/1     1            1           5h42m
deployment.apps/stackdriver-metadata-agent-cluster-level   1/1     1            1           5h42m

NAME                                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/event-exporter-v0.2.5-599d65f456                      1         1         1       5h42m
replicaset.apps/fluentd-gcp-scaler-bfd6cf8dd                          1         1         1       5h42m
replicaset.apps/heapster-gke-58bf4cb5f5                               0         0         0       5h42m
replicaset.apps/heapster-gke-9588c9855                                1         1         1       5h41m
replicaset.apps/kube-dns-5995c95f64                                   2         2         2       5h42m
replicaset.apps/kube-dns-autoscaler-8687c64fc                         1         1         1       5h42m
replicaset.apps/l7-default-backend-8f479dd9                           1         1         1       5h42m
replicaset.apps/metrics-server-v0.3.1-5c6fbf777                       1         1         1       5h41m
replicaset.apps/metrics-server-v0.3.1-8559697b9c                      0         0         0       5h42m
replicaset.apps/stackdriver-metadata-agent-cluster-level-5d8cd7b6bf   1         1         1       5h41m
replicaset.apps/stackdriver-metadata-agent-cluster-level-7bd5ddd849   0         0         0       5h42m
To get a complete picture of how each part communicates with the others, “what happens when k8s” explores what happens when you do

kubectl run nginx --image=nginx --replicas=3

shedding some more light on the magic that happens behind the scenes.
Custom Resource Definitions
We’ve used a number of CRDs, Custom Resource Definitions, previously. They are a way to extend Kubernetes with our own Resources. So let us do just that and extend Kubernetes! There’s another option, API Aggregation, but that’s left outside of the course.
We’ll want a resource that counts down to 0. So let’s start by defining a resource called “Countdown” - as a template I’ll use one provided by the docs.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # name must match the spec fields below, and be in the form: <plural>.<group>
  name: countdowns.stable.dwk
spec:
  # group name to use for REST API: /apis/<group>/<version>
  group: stable.dwk
  # either Namespaced or Cluster
  scope: Namespaced
  names:
    # kind is normally the CamelCased singular type. Your resource manifests use this.
    kind: Countdown
    # plural name to be used in the URL: /apis/<group>/<version>/<plural>
    plural: countdowns
    # singular name to be used as an alias on the CLI and for display
    singular: countdown
    # shortNames allow shorter string to match your resource on the CLI
    shortNames:
      - cd
  # list of versions supported by this CustomResourceDefinition
  versions:
    - name: v1
      # Each version can be enabled/disabled by Served flag.
      served: true
      # One and only one version must be marked as the storage version.
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                length:
                  type: integer
                delay:
                  type: integer
                image:
                  type: string
      additionalPrinterColumns:
        - name: Length
          type: integer
          description: The length of the countdown
          jsonPath: .spec.length
        - name: Delay
          type: integer
          description: The length of time (ms) between executions
          jsonPath: .spec.delay
Now we can create our own Countdown:
apiVersion: stable.dwk/v1
kind: Countdown
metadata:
  name: doomsday
spec:
  length: 20
  delay: 1200
  image: jakousa/dwk-app10:sha-84d581d
$ kubectl apply -f countdown.yaml
countdown.stable.dwk/doomsday created

$ kubectl get cd
NAME       LENGTH   DELAY
doomsday   20       1200
Now we have a new resource. Next let’s create a new custom controller that’ll start a pod that runs a container from the image and makes sure countdowns are destroyed. This will require some coding.
For the implementation I decided to use a Kubernetes resource called a Job. Jobs create a Pod just like the Deployments we’re now familiar with, but Pods created by a Job are intended to run once, until completion. However, completed Jobs are not removed automatically, and neither are the Pods they created; the Pods are preserved so that the execution logs can be reviewed after the job has finished. Excellent use cases for Jobs are, for example, backup operations.
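As a minimal sketch of what a Job manifest looks like (the name and image here are made up for illustration), a Job that runs a single container to completion could be defined like this:

apiVersion: batch/v1
kind: Job
metadata:
  name: backup-job                      # hypothetical name
spec:
  template:
    spec:
      containers:
        - name: backup
          image: my-backup-image:latest # hypothetical image
      restartPolicy: Never              # Jobs require Never or OnFailure
  backoffLimit: 2                       # retry a couple of times on failure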
So our controller has to do 3 things:
- Create Job from a Countdown
- Reschedule Jobs until the number of executions defined in the Countdown has been completed
- Clean all Jobs and Pods after execution
By listening to the Kubernetes API at
/apis/stable.dwk/v1/countdowns?watch=true we will receive an ADDED event for every Countdown object in the cluster. Then creating a job is just parsing the data from the message and POSTing a valid payload to /apis/batch/v1/namespaces/<namespace>/jobs.
For jobs we’ll listen to
/apis/batch/v1/jobs?watch=true and wait for a MODIFIED event where the success state is set to true, and then update the labels for the jobs to store the status. To delete a job and its pod we can send DELETE requests to /apis/batch/v1/namespaces/<namespace>/jobs/<job_name> and /api/v1/namespaces/<namespace>/pods/<pod_name>.
And finally, to delete the countdown, a DELETE request to /apis/stable.dwk/v1/namespaces/<namespace>/countdowns/<countdown_name>.
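If you want to poke at these endpoints by hand before writing any controller code (a quick manual check, not part of the controller itself), you can open an authenticated proxy to the API server and curl the watch endpoint:

# open a local proxy to the Kubernetes API server
$ kubectl proxy --port=8001

# in another terminal: applying or deleting a Countdown now shows up
# as ADDED / DELETED events in this JSON stream
$ curl "http://localhost:8001/apis/stable.dwk/v1/countdowns?watch=true"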
A version of this controller has been implemented here:
jakousa/dwk-app10-controller:sha-4256579. But we cannot simply deploy it as it won’t have access to the APIs. For this we will need to define suitable access.
RBAC (Role-based access control) is an authorization method that allows us to define access for individual users, service accounts or groups by giving them roles. For our use case we will define a ServiceAccount resource.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: countdown-controller-account
and then specify the serviceAccountName for the deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: countdown-controller-dep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: countdown-controller
  template:
    metadata:
      labels:
        app: countdown-controller
    spec:
      serviceAccountName: countdown-controller-account
      containers:
        - name: countdown-controller
          image: jakousa/dwk-app10-controller:sha-4256579
Next is defining the role and its rules. There are two types of roles: ClusterRole and Role. Roles are namespace specific whereas ClusterRoles can access all of the namespaces - in our case the controller will access all countdowns in all namespaces so a ClusterRole will be required.
The rules are defined with the apiGroup, resource and verbs. For example, the jobs endpoint was /apis/batch/v1/jobs?watch=true, so it’s in the apiGroup “batch” and resource “jobs”; for the available verbs, see the documentation. The core API group is an empty string “”, as in the case of Pods.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: countdown-controller-role
rules:
  - apiGroups: [""]
    # at the HTTP level, the name of the resource for accessing Pod
    # objects is "pods"
    resources: ["pods"]
    verbs: ["get", "list", "delete"]
  - apiGroups: ["batch"]
    # at the HTTP level, the name of the resource for accessing Job
    # objects is "jobs"
    resources: ["jobs"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: ["stable.dwk"]
    resources: ["countdowns"]
    verbs: ["get", "list", "watch", "create", "delete"]
And finally bind the ServiceAccount and the role. There are two types of bindings as well: ClusterRoleBinding and RoleBinding. If we used a RoleBinding with a ClusterRole, we would be able to restrict access to a single namespace. For example, if permission to access secrets is defined in a ClusterRole and we granted it via a RoleBinding in a namespace called “test”, the subject would only be able to access secrets in the namespace “test” - even though the role is a “ClusterRole”.
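As a minimal sketch of that scenario (the role, account and namespace names here are hypothetical), a RoleBinding that grants a ClusterRole only inside the namespace “test” could look like this:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: secret-reader-binding   # hypothetical name
  namespace: test               # the binding only applies in this namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole             # a cluster-wide role definition...
  name: secret-reader           # ...hypothetically allowing reads on secrets
subjects:
  - kind: ServiceAccount
    name: some-account          # hypothetical account
    namespace: test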
In our case a ClusterRoleBinding is required, since we want the controller to access all of the namespaces from the namespace it’s deployed in, in this case the namespace “default”.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: countdown-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: countdown-controller-role
subjects:
  - kind: ServiceAccount
    name: countdown-controller-account
    namespace: default
After deploying all of that, we can check the logs after applying a countdown. (You may have to delete the controller pod to restart it, in case it got stuck without access.)
$ kubectl logs countdown-controller-dep-7ff598ffbf-q2rp5
> email@example.com start /usr/src/app
> node index.js

Scheduling new job number 20 for countdown doomsday to namespace default
Scheduling new job number 19 for countdown doomsday to namespace default
...
Countdown ended. Removing countdown.
Doing cleanup
This exercise doesn’t rely on previous exercises. You may again choose whichever technologies you want for the implementation.
We need a DummySite resource that can be used to create an HTML page from any URL.
Create a “DummySite” resource that has a string property called “website_url”.
Create a controller that creates all of the resources that are required for the functionality.
Refer to https://kubernetes.io/docs/reference/using-api/client-libraries/ for information about client libraries.
The API docs for the apiGroups and example requests are here: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18
Test that creating a DummySite resource with website_url “https://example.com/” creates a copy of the website.
The controller doesn’t have to work perfectly in all circumstances. The following workflow should succeed:
1. Apply role, account and binding
2. Apply deployment
3. Apply DummySite
Finally, as Kubernetes is a platform, we’ll go over a few popular building blocks that use Kubernetes.
OpenShift is an “enterprise” Kubernetes (Red Hat OpenShift Overview). Claiming that you don’t have Kubernetes because you have OpenShift would be equal to claiming “I don’t have an engine. I have a car!”. For other production-ready Kubernetes options see Rancher, which you might have seen before in this URL https://github.com/rancher/k3d, and Anthos GKE, which might also sound familiar. They are all options when you’re making the crucial decision of which Kubernetes distribution you want, or whether you’d rather use a managed service.
TODO: Exercise for comparing platforms
Serverless has gained a lot of popularity, and it’s easy to see why. Be it Google Cloud Run, Knative, OpenFaaS, OpenWhisk, Fission or Kubeless, they’re running on top of Kubernetes, or at least capable of doing so. The older the serverless platform, the more likely it isn’t running on Kubernetes. In this light, a discussion about whether Kubernetes is competing with serverless doesn’t make much sense.
As this isn’t a serverless course we won’t go into depth about it, but serverless sounds pretty dope. So next let’s set up a serverless platform on our k3d, because that’s something we can do. For this let’s choose Knative, for no particular reason other than that it sounds great.
We will follow this guide to install the “Serving” component of Knative. There’s probably a Helm chart for this, but we’ll meet another tool called Istio along the way. For Istio to work locally in k3d we’ll need to create our cluster without the Traefik ingress.
$ k3d cluster create --port '8082:30080@agent' -p 8081:80@loadbalancer --agents 2 --k3s-server-arg '--no-deploy=traefik'
Now the Knative CRDs and core:
$ kubectl apply -f https://github.com/knative/serving/releases/download/v0.16.0/serving-crds.yaml
...

$ kubectl apply -f https://github.com/knative/serving/releases/download/v0.16.0/serving-core.yaml
...
Next we’ll install Istio.
Surprisingly, we haven’t met Istio before now. Istio is a service mesh. A service mesh works as a layer facilitating communication between services. This means load balancing, monitoring, encryption and traffic control. The full set of its features can be found on their website under “What is Istio”.
As Istio handles traffic control, it could’ve handled the canary rollouts introduced in part 4.
Istio requires its own command-line tool. Let’s install istioctl and then install the minimal operator, following the instructions from the Knative guide, with the following YAML:
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      proxy:
        autoInject: disabled
      useMCP: false
      # The third-party-jwt is not enabled on all k8s.
      # See: https://istio.io/docs/ops/best-practices/security/#configure-third-party-service-account-tokens
      jwtPolicy: first-party-jwt
  addonComponents:
    pilot:
      enabled: true
    prometheus:
      enabled: false
  components:
    ingressGateways:
      - name: istio-ingressgateway
        enabled: true
      - name: cluster-local-gateway
        enabled: true
        label:
          istio: cluster-local-gateway
          app: cluster-local-gateway
        k8s:
          service:
            type: ClusterIP
            ports:
              - port: 15020
                name: status-port
              - port: 80
                name: http2
              - port: 443
                name: https
And install Istio with
$ istioctl install -f istio-minimal-operator.yaml
✔ Istio core installed
✔ Istiod installed
✔ Ingress gateways installed
✔ Addons installed
✔ Installation complete
Next we can install the Knative Istio controller:
$ kubectl apply -f https://github.com/knative/net-istio/releases/download/v0.16.0/release.yaml
...
And that’s it. We’ll leave out step 4, configuring DNS.
Hello Serverless World
For testing purposes let’s do a hello world from the Knative samples. In Knative there’s another new resource called Service, not to be confused with the Kubernetes resource Service. These Services are used to manage the core Kubernetes resources:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      name: helloworld-go-dwk-message-v1
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
          env:
            - name: TARGET
              value: "DwK"
$ kubectl apply -f knative-service.yaml
service.serving.knative.dev/helloworld-go created
As previously mentioned, we don’t have DNS, so accessing the application isn’t as easy. We’ll have to set the Host header for our requests. Find out the host from:
$ kubectl get ksvc
NAME            URL                                         LATESTCREATED                  LATESTREADY                    READY   REASON
helloworld-go   http://helloworld-go.default.example.com    helloworld-go-dwk-message-v1   helloworld-go-dwk-message-v1   True
We’ll need the URL field. Note also LATESTCREATED and LATESTREADY; they’re revisions of the application. If we alter knative-service.yaml, it’ll create new revisions, which we can then switch between.
Now we can check that there are no pods running. There may be one at first, as Knative spins up a pod during the creation of the service; wait until no helloworld-go pods are found.
$ kubectl get po
No resources found in default namespace.
and when we send a request to the application
$ curl -H "Host: helloworld-go.default.example.com" http://localhost:8081 Hello DwK! $ kubectl get po NAME READY STATUS RESTARTS AGE helloworld-go-dwk-message-v1-deployment-6664bc858f-jqlv6 1/2 Running 0 6s
it works, and pods are ready almost instantly.
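If you stop sending requests and keep watching the pods (the exact idle timeout depends on the Knative autoscaler configuration), the application is scaled back down to zero:

# after the idle period the helloworld-go pod is terminated again
$ kubectl get po --watch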
Let’s test the revisions by changing the contents of the yaml and applying it.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      name: helloworld-go-dwk-message-v2 # v2
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
          env:
            - name: TARGET
              value: "DwK-but-better" # Changed content
  traffic: # traffic enables us to split traffic between multiple revisions!
    - revisionName: helloworld-go-dwk-message-v1
      percent: 100
    - revisionName: helloworld-go-dwk-message-v2
      percent: 0
This created a new revision and edited the route. We can view the Revision and Route resources:
$ kubectl get revisions,routes
NAME                                                        CONFIG NAME     K8S SERVICE NAME               GENERATION   READY   REASON
revision.serving.knative.dev/helloworld-go-dwk-message-v1   helloworld-go   helloworld-go-dwk-message-v1   1            True
revision.serving.knative.dev/helloworld-go-dwk-message-v2   helloworld-go   helloworld-go-dwk-message-v2   2            True

NAME                                      URL                                         READY   REASON
route.serving.knative.dev/helloworld-go   http://helloworld-go.default.example.com    True
So now when we send a request it’s still the old message!
$ curl -H "Host: helloworld-go.default.example.com" http://localhost:8081 Hello DwK!
Let’s split the traffic 50% - 50% between v1 and v2, and create a new revision with the best message yet, which we’ll expose at another host!
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      name: helloworld-go-dwk-message-v3 # v3
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
          env:
            - name: TARGET
              value: "DwK-but-extreme" # Changed content
  traffic: # traffic enables us to split traffic between multiple revisions!
    - revisionName: helloworld-go-dwk-message-v1
      percent: 50
    - revisionName: helloworld-go-dwk-message-v2
      percent: 50
Now curling will result in a 50% - 50% split between the v1 and v2 messages. But access to v3 is currently disabled. Let’s add routing to v3 by defining a Route ourselves.
apiVersion: serving.knative.dev/v1
kind: Route
metadata:
  name: tester-route
spec:
  traffic:
    - revisionName: helloworld-go-dwk-message-v3
      percent: 100
$ kubectl apply -f route.yaml
route.serving.knative.dev/tester-route created

$ kubectl get routes
NAME            URL                                         READY   REASON
helloworld-go   http://helloworld-go.default.example.com    True
tester-route    http://tester-route.default.example.com     True

$ curl -H "Host: tester-route.default.example.com" http://localhost:8081
Hello DwK-but-extreme!
Let’s test serverless by making a part of the pingpong application serverless.