StatefulSets and Jobs
In part 1 we learned how volumes are used with PersistentVolumes and PersistentVolumeClaims. We used them with a Deployment, and everything worked well enough for our testing purposes. The problem is that a Deployment creates and scales pods that are replicas: each one is a new copy of the same thing. With PersistentVolumeClaims, the method through which a pod reserves persistent storage, this can have an undesired effect, because the claims are not pod-specific. The claim is shared by all pods in that Deployment.
StatefulSets are like Deployments, except that they make sure that if a pod dies, its replacement is identical, with the same network identity and name. In addition, when a StatefulSet is scaled, each new copy gets its own storage. StatefulSets are for stateful applications: you could use them to scale video game servers that require state, such as a Minecraft server, or to run a database. For data safety, deleting a StatefulSet does not delete the volumes associated with it.
Let's run Redis and save some information there. We're going to need a PersistentVolume as well as an application that uses Redis. In part 1 we jumped through a few hoops to get ourselves storage, but k3s includes a helpful StorageClass, local-path, that streamlines local testing.
You can apply the StatefulSet from
apiVersion: v1 # Includes the Service for laziness
kind: Service
metadata:
  name: redis-svc
  labels:
    app: redis
spec:
  ports:
    - port: 6379
      name: web
  clusterIP: None
  selector:
    app: redisapp # must match the pod labels in the StatefulSet template
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-ss
spec:
  serviceName: redis-svc
  replicas: 2
  selector:
    matchLabels:
      app: redisapp
  template:
    metadata:
      labels:
        app: redisapp
    spec:
      containers:
        - name: redisfiller
          image: jakousa/dwk-app5:54203329200143875187753026f4e93a1305ae26
        - name: redis
          image: redis:5.0
          ports:
            - name: web
              containerPort: 6379
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: local-path
        resources:
          requests:
            storage: 100Mi
It looks a lot like a Deployment but uses volumeClaimTemplates to claim a volume for each pod. StatefulSets require a "Headless Service" to be responsible for the network identity. We define a "Headless Service" with clusterIP: None, which instructs Kubernetes not to do proxying or load balancing and instead to allow access straight to the Pods.
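To make "network identity" concrete, here is a minimal sketch of another pod reaching one specific Redis replica by its stable DNS name. It assumes the namespace is default, the cluster domain is cluster.local, and the headless Service's selector matches the StatefulSet's pod labels (app: redisapp). The pod name redis-client is hypothetical; the redis:5.0 image is reused only because it ships with redis-cli.

```yaml
# Hypothetical one-off pod: the name "redis-client" is illustrative only.
# Assumes namespace "default" and cluster domain "cluster.local".
apiVersion: v1
kind: Pod
metadata:
  name: redis-client
spec:
  containers:
    - name: client
      image: redis:5.0 # used here only for its bundled redis-cli
      # StatefulSet pod DNS name format:
      # <pod name>.<headless service name>.<namespace>.svc.cluster.local
      command: ["redis-cli", "-h", "redis-ss-0.redis-svc.default.svc.cluster.local", "ping"]
  restartPolicy: Never # run the ping once and stop
```

If the first replica is reachable, kubectl logs redis-client should show PONG. Because a replacement pod keeps the name redis-ss-0, this DNS name stays valid even after that pod is deleted and recreated.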
You can now open two terminals and, in each, follow the logs of the redisfiller container in one of the pods:

$ kubectl logs -f redis-ss-X redisfiller

where X is 0 or 1. To confirm it's working, we can delete a pod: it will be recreated and continue right where it left off. In addition, we can delete the StatefulSet: the volume will stay, and it will bind back when we apply the StatefulSet again.
The Job resource is used to run a container that has an end state, that is, one that runs once to completion. The status of a Job is saved so that it can be monitored after the execution has ended. Jobs can be configured to run multiple instances of the same task concurrently or sequentially, until a set number of successful completions has been achieved.
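As a sketch of those options, the Job below uses the completions and parallelism fields. The name parallel-example, the busybox image and its command are illustrative assumptions, not part of the backup example that follows.

```yaml
# Hypothetical Job: name, image and command are illustrative only.
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-example
spec:
  completions: 6 # the Job is complete once 6 pods have succeeded
  parallelism: 2 # at most 2 pods run at the same time (1 would make it sequential)
  template:
    spec:
      containers:
        - name: worker
          image: busybox
          command: ["sh", "-c", "echo working on a task; sleep 5"]
      restartPolicy: Never
```

With these values Kubernetes keeps up to two pods running at once until six of them have finished successfully; kubectl get jobs then reports the progress in the COMPLETIONS column.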
An example use case for Jobs would be creating backups of a database. Our Job will use the environment variable URL as the URL from which the dump is created, and pass the dump along to a storage server. Our database will be PostgreSQL and the tool for creating a backup is pg_dump. Now we just need to do the coding. A simple bash script should be enough.
#!/bin/bash
# Create a dump only if the URL environment variable is set
if [ -n "$URL" ]
then
  pg_dump -v "$URL" > /usr/src/app/backup.sql
  echo "Not sending the dump actually anywhere"
  # curl -F 'data=@/usr/src/app/backup.sql' https://somewhere
fi
I have the above script packaged as an image in jakousa/simple-backup-example. Since we don't have any Postgres available to us yet, let's deploy one first:
apiVersion: v1
kind: Service
metadata:
  name: postgres-svc
  labels:
    app: postgres
spec:
  ports:
    - port: 5432
      name: web
  clusterIP: None
  selector:
    app: postgres
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-ss
spec:
  serviceName: postgres-svc # must match the headless Service's name
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:13.0
          ports:
            - name: postgres
              containerPort: 5432
          env:
            - name: POSTGRES_PASSWORD
              value: "example"
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: local-path
        resources:
          requests:
            storage: 100Mi
Apply the above and check that it's running:
$ kubectl get po
NAME            READY   STATUS    RESTARTS   AGE
postgres-ss-0   1/1     Running   0          65s
Now we can apply the following Job, which uses that image:
apiVersion: batch/v1
kind: Job
metadata:
  name: backup
spec:
  template:
    spec:
      containers:
        - name: backup
          image: jakousa/simple-backup-example
          env:
            - name: URL
              value: "postgres://postgres:example@postgres-svc:5432/postgres"
      restartPolicy: Never # This time we'll run it only once; don't restart the container in place
Once the Job has run, we can check its status and read the logs of the pod it created (the pod name suffix is generated, so yours will differ):

$ kubectl get jobs
NAME     COMPLETIONS   DURATION   AGE
backup   1/1           7s         35s

$ kubectl logs backup-wj9r5
...
pg_dump: saving encoding = UTF8
pg_dump: saving standard_conforming_strings = on
pg_dump: saving search_path =
pg_dump: implied data-only restore
Not sending the dump actually anywhere

Jobs have a few other available configurations. For example, we can force a Job to retry a number of times by defining backoffLimit.
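As a sketch of the retry option, a Job can set spec.backoffLimit to cap how many times Kubernetes retries a failing pod before marking the whole Job as failed (the default is 6). The Job name backup-retry is hypothetical, chosen so it doesn't clash with the existing backup Job; the connection URL assumes the password "example" and the postgres-svc Service from the PostgreSQL StatefulSet above.

```yaml
# Hypothetical variant of the backup Job: the name "backup-retry" is illustrative.
apiVersion: batch/v1
kind: Job
metadata:
  name: backup-retry
spec:
  backoffLimit: 3 # give up after 3 retries (Kubernetes defaults to 6)
  template:
    spec:
      containers:
        - name: backup
          image: jakousa/simple-backup-example
          env:
            - name: URL
              # Assumes the password and headless Service from the PostgreSQL manifest
              value: "postgres://postgres:example@postgres-svc:5432/postgres"
      restartPolicy: Never # failed pods are replaced by new pods, counted against backoffLimit
```

With restartPolicy: Never, each retry shows up as a new pod, so kubectl get pods lets you see every failed attempt.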
CronJobs run a Job on a schedule. If you have used cron before, these are essentially the same.
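A minimal sketch of running the backup as a CronJob, assuming a cluster on Kubernetes 1.21 or newer, where CronJob is available as batch/v1 (older clusters used batch/v1beta1). The name backup-cron and the schedule are illustrative, and the URL makes the same assumptions as the backup Job above. The jobTemplate field simply wraps an ordinary Job spec.

```yaml
# Hypothetical CronJob: name and schedule are illustrative only.
# batch/v1 requires Kubernetes 1.21+; older clusters used batch/v1beta1.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-cron
spec:
  schedule: "0 3 * * *" # standard cron syntax: every day at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: jakousa/simple-backup-example
              env:
                - name: URL
                  value: "postgres://postgres:example@postgres-svc:5432/postgres"
          restartPolicy: Never
```

Each time the schedule fires, the CronJob creates a new Job, so the created Jobs and their pods appear in kubectl get jobs just like the one-off backup did.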