Build a Kubernetes Cluster You Own

FOLIO CVIII 2026-06-21 · 23 MIN · LONG-FORM

The vendor-neutral substrate for a self-hosted platform: kubeadm, containerd, and Cilium on generic VMs

Diagram · folio cviii

flowchart TB
  subgraph VMS[Generic VMs, any provider or bare metal]
    CP["control-plane<br>kubeadm init"]
    W1["worker 1"]
    W2["worker 2"]
  end
  CP <-->|private network| W1
  CP <-->|private network| W2
  CIL["Cilium CNI<br>eBPF networking<br>WireGuard encryption"]
  CIL -.-> CP
  CIL -.-> W1
  CIL -.-> W2

Every platform in this series sits on one foundation: a Kubernetes cluster you own outright. Not a managed control plane you rent from a cloud and cannot pick up and move, but a cluster you stood up yourself on plain virtual machines, one that runs the same on any provider or on bare metal in a rack. That portability is the entire point. It is what keeps you out of lock-in, and it is the layer everything else bolts onto. This post builds it, current for 2026.

This series rebuilds my 2020 Apress book, Advanced Platform Development with Kubernetes, for 2026. The approach behind it comes from building and running data platforms in production for more than twenty years.

§Custom, Not Managed, on Purpose

GKE, EKS, and AKS are real and they are fine. If you want a managed control plane and you are comfortable being a tenant of one cloud, use one. But the whole argument of this series is that you can build more than a managed platform offers, at a fraction of the cost, with nobody able to change the terms on you. That starts with refusing to rent the bottom layer.

Think about what a managed control plane actually is. You hand the provider the one component that defines the cluster, the API server and etcd, and in return you get a console toggle and a bill. The control plane is the cheap part to run and the expensive part to be locked into. Your nodes still cost the same per core whether the provider babysits the API server or you do. What you are really paying for is the convenience, and the price of that convenience is that the cluster is no longer portable. The IAM integration, the load balancer annotations, the storage classes, the logging sink, all of it is wired to one vendor. Moving becomes a migration project instead of a copy.

This is the same story the rest of the series tells about software. The years from 2020 to 2026 were a steady run of companies relicensing the open tools everyone had built on: Elastic, HashiCorp, Redis, MinIO. The lesson was not “open source failed,” it was “the terms can change on anything you do not control.” A managed control plane is the infrastructure version of the same exposure. The terms, the pricing, the deprecation schedule, and the regions all sit on the other side of a contract you do not write.

A self-managed cluster on generic virtual machines is portable by construction. The same handful of commands brings it up on DigitalOcean, Vultr, Hetzner, Linode, Scaleway, or a stack of bare-metal boxes. Nothing in it is specific to a provider, so nothing in it locks you to one. When a provider raises prices or you outgrow them, you stand the same cluster up somewhere else and move the workloads. The substrate is a commodity, which is exactly what you want underneath everything else.

It also leaves the whole machine in reach, and that matters more now than it used to. An agent operating a cluster you own can touch the kubelet, the container runtime, the network, the host. It can read the containerd config, adjust a sysctl, pull a node out and rejoin it, inspect why a pod will not schedule at the level where the answer actually lives. A managed control plane fences off exactly the parts you would most want an agent working on, and hands you a support ticket instead. The economic case for self-hosting in 2026 is that the labor which used to require a team of experts is now an agent with the right context. That case gets stronger the more of the machine the agent is allowed to see.

None of this means managed is wrong for everyone. It means that if your goal is the most capability for the least cost with no one able to move the floor under you, you own the floor. The rest of this post is how.

§The Shape of What We Are Building

Three virtual machines. One becomes the control plane, running the API server, scheduler, controller manager, and etcd. The other two are workers that run your actual workloads. Pod traffic and the API server stay on a private network between the nodes. There is no cloud load balancer in front of anything, by design, because a cloud load balancer is a provider-specific dependency and we are avoiding those at this layer.

The container runtime is containerd, directly under the kubelet, with Docker nowhere in the picture. The network is Cilium, an eBPF data plane that also replaces kube-proxy and encrypts node-to-node traffic with WireGuard. That single choice folds in three separate pieces of manual work the 2020 edition did by hand: the CNI, the service proxy, and the inter-node encryption overlay.

This is a development-grade topology: a single control plane is a single point of failure for the API, though your running workloads keep running even if the control plane blinks. That tradeoff is right for a platform you are building and iterating on, and the last section covers exactly what to add when you want the control plane itself to be highly available. Everything else here is the same procedure you would use in production.

§What You Need

Three virtual machines running a current Ubuntu LTS (24.04, or 26.04 if you want the newest). 2 vCPUs and 4 GB of RAM each is comfortable for a development platform. Enable private networking between them so cluster traffic stays off the public interface. A cluster like this runs well under fifty dollars a month on most providers, and scales by adding nodes when you need them.

The provider does not matter, which is the entire point, but it helps to see it concretely once. On Hetzner Cloud, for example, you create a Network with a private range like 10.0.0.0/16, add a subnet, and attach three servers (a CX22 or larger) to it so they share that private space. On DigitalOcean you enable VPC networking in one region and create three Droplets on it. On bare metal it is whatever switch the boxes already share. In every case the result is identical: three machines that reach each other on a private interface the public internet cannot. The cluster does not know or care which of these you picked, and neither will anything you build on it.

Give the nodes stable names and note both their public and private addresses. The examples use:

platform-cp   control plane   10.0.0.10 (private)
platform-w1   worker          10.0.0.11 (private)
platform-w2   worker          10.0.0.12 (private)

Set the hostnames so the cluster reports something meaningful, and so an agent reading kubectl get nodes later sees names with intent in them rather than a provider’s random string.

# on each node, with its own name
hostnamectl set-hostname platform-cp

Everything below runs as root over SSH. Run the preparation, container runtime, and Kubernetes package steps on all three nodes. The init step runs only on the control plane, and the join step only on the workers.

§Open the Right Ports

Before any of this works, the nodes have to be able to talk to each other on the ports Kubernetes and Cilium use. On most cloud providers you do this with a security group or firewall rule scoped to the private network; on bare metal it is your host firewall. Open these between the nodes, not to the public internet.

Control plane, inbound:

Port	Protocol	Purpose
6443	TCP	Kubernetes API server
2379-2380	TCP	etcd client and peer
10250	TCP	kubelet API
10257	TCP	kube-controller-manager
10259	TCP	kube-scheduler

Workers, inbound:

Port	Protocol	Purpose
10250	TCP	kubelet API
30000-32767	TCP	NodePort service range

Cilium, on all nodes:

Port	Protocol	Purpose
4240	TCP	health checks
4244-4245	TCP	Hubble server and relay
8472	UDP	VXLAN overlay
51871	UDP	WireGuard encryption

If the cluster comes up but pods on different nodes cannot reach each other, or nodes flap between Ready and NotReady, the firewall is the first suspect. The VXLAN and WireGuard UDP ports are the ones people forget, because nothing complains loudly when they are closed; traffic just silently does not arrive.
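
On a host firewall, the three tables above translate into rules mechanically. What follows is a minimal sketch, assuming ufw is the host firewall and 10.0.0.0/16 is the private range from the earlier Hetzner example. The function name print_ufw_rules is hypothetical, and it only prints the commands so you can read them before running anything.

```shell
#!/usr/bin/env bash
# Print (do not run) ufw rules that open the cluster ports to the private
# range only. Assumption: ufw is the host firewall; adapt the output for
# nftables or a provider firewall. Role is "control-plane" or "worker".
print_ufw_rules() {
  local role="$1" cidr="${2:-10.0.0.0/16}"
  local rules=()
  case "$role" in
    control-plane) rules=(6443/tcp 2379:2380/tcp 10250/tcp 10257/tcp 10259/tcp) ;;
    worker)        rules=(10250/tcp 30000:32767/tcp) ;;
    *) echo "usage: print_ufw_rules control-plane|worker [cidr]" >&2; return 1 ;;
  esac
  # The Cilium ports are the same on every node.
  rules+=(4240/tcp 4244:4245/tcp 8472/udp 51871/udp)
  local r
  for r in "${rules[@]}"; do
    # ${r%/*} is the port or port range, ${r#*/} is the protocol
    echo "ufw allow from ${cidr} to any port ${r%/*} proto ${r#*/}"
  done
}

print_ufw_rules worker
```

Printing instead of executing is deliberate: the output is something you can diff against the tables, and pipe to a shell only once it matches.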

§Prepare Each Node

Kubernetes wants swap off and a couple of kernel modules and sysctls in place so pod networking works.

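# swap off, now and on reboot
swapoff -a
# comment out any active swap entry, whether fstab uses spaces or tabs
sed -i -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' /etc/fstab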

# kernel modules
cat <<EOF >/etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter

# let bridged traffic reach iptables, enable forwarding
cat <<EOF >/etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sysctl --system

The overlay module is what containerd’s overlayfs storage driver needs. br_netfilter plus the two bridge sysctls are what let the kernel apply iptables rules to bridged packets, which is how pod traffic gets filtered and routed. ip_forward lets the node route between interfaces at all. kubeadm’s preflight checks flag swap, and by default the kubelet refuses to start while it is on, because the scheduler’s memory accounting assumes pages do not silently move to disk.
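
Before moving on, it is worth confirming that the preparation actually took on each node. This is a read-only sketch under a few assumptions: the function names are hypothetical, the file paths are parameters that default to the usual /proc locations, and it assumes overlay and br_netfilter are loadable modules (as on stock Ubuntu) rather than built into the kernel.

```shell
#!/usr/bin/env bash
# Read-only checks for the node preparation above. Paths are parameters so
# the functions can also be pointed at sample files.

# Succeeds when no swap is active: /proc/swaps holds only its header line.
swap_is_off() {
  local swaps="${1:-/proc/swaps}"
  [ "$(tail -n +2 "$swaps" | grep -c .)" -eq 0 ]
}

# Succeeds when the module appears in a /proc/modules-style listing.
module_loaded() {
  local name="$1" modules="${2:-/proc/modules}"
  grep -q "^${name} " "$modules"
}

# Succeeds when a sysctl file such as /proc/sys/net/ipv4/ip_forward holds 1.
sysctl_enabled() {
  [ "$(cat "$1" 2>/dev/null)" = "1" ]
}

# Run every check against the live system and report what is missing.
check_node_prep() {
  local failed=0
  swap_is_off || { echo "swap is still on"; failed=1; }
  module_loaded overlay || { echo "overlay module not loaded"; failed=1; }
  module_loaded br_netfilter || { echo "br_netfilter module not loaded"; failed=1; }
  local f
  for f in /proc/sys/net/bridge/bridge-nf-call-iptables \
           /proc/sys/net/bridge/bridge-nf-call-ip6tables \
           /proc/sys/net/ipv4/ip_forward; do
    sysctl_enabled "$f" || { echo "$f is not 1"; failed=1; }
  done
  return "$failed"
}
```

Running check_node_prep on a prepared node should print nothing and return success; anything it prints names the step to repeat.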

One more thing while you are on every node: make sure the clock is synced. etcd, the key-value store under the control plane, is unforgiving about clock skew between members, and certificate validation across the cluster assumes the nodes agree on the time. Ubuntu runs a time synchronization service by default (systemd-timesyncd on 24.04), so this is usually just a confirmation.

timedatectl
# System clock synchronized: yes
#               NTP service: active

If it is not active, timedatectl set-ntp true turns it on, or install chrony for a more capable NTP client.

§Containerd, Not Docker

The 2020 edition installed Docker. That is no longer how this works. Kubernetes removed the Docker shim in 1.24, and the runtime under the kubelet is containerd directly. You do not need Docker on a cluster node at all. The kubelet speaks the Container Runtime Interface, containerd implements it, and Docker was only ever a layer on top of containerd to begin with.

apt-get update
apt-get install -y containerd

mkdir -p /etc/containerd
containerd config default >/etc/containerd/config.toml

Two edits to the generated config matter. The first is the cgroup driver. The kubelet and containerd both have to agree on how they talk to the kernel’s cgroup hierarchy, and on a modern systemd host that means the systemd driver on both sides.

sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml

That SystemdCgroup = true line is the one that quietly ruins your day if you skip it. The kubelet and containerd end up on different cgroup drivers and kubeadm init fails, or worse, the node comes up and then pods restart in ways that do not obviously point at the cause.
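
Because a missed cgroup edit fails so quietly, a direct check is cheap insurance. A small sketch with hypothetical function names: the first reads the containerd config, the second reads the kubelet config that kubeadm writes to /var/lib/kubelet/config.yaml once the node has been initialized or joined.

```shell
#!/usr/bin/env bash
# Succeeds only if the containerd config enables the systemd cgroup driver.
containerd_uses_systemd_cgroup() {
  local cfg="${1:-/etc/containerd/config.toml}"
  grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$cfg"
}

# Print the cgroupDriver value from a kubelet config file.
kubelet_cgroup_driver() {
  local cfg="${1:-/var/lib/kubelet/config.yaml}"
  awk '$1 == "cgroupDriver:" { print $2; exit }' "$cfg"
}

# Example, on a node:
#   containerd_uses_systemd_cgroup && echo "containerd: systemd"
#   kubelet_cgroup_driver          # should print: systemd
```

If the first check fails, rerun the sed edit and restart containerd before going any further.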

The second edit is the sandbox image, and this one bites people who never think to look. Every pod has a tiny “pause” container that holds its network namespace open. containerd ships with a default pause image pin, and kubeadm has its own idea of which pause image the cluster should use. When those disagree, you get pods stuck in a pull loop or odd permission errors on the pause container. Pin containerd’s sandbox image to whatever the kubeadm version expects.

# ask kubeadm which pause image it wants
kubeadm config images list | grep pause
# registry.k8s.io/pause:3.10

# set that exact value in containerd
sed -i 's#sandbox_image = .*#sandbox_image = "registry.k8s.io/pause:3.10"#' \
  /etc/containerd/config.toml

(You install kubeadm in the next step; run the pause edit after that, or come back to it. The version string moves with the Kubernetes minor version, so read it rather than copying mine.)
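
The comparison itself is easy to script, which saves you from eyeballing two version strings. This sketch assumes the sandbox_image key that the sed edit above targets; if your containerd release names the setting differently, adjust the pattern. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the sandbox (pause) image pinned in a containerd config file.
# Assumption: the setting is written as: sandbox_image = "..."
containerd_sandbox_image() {
  local cfg="${1:-/etc/containerd/config.toml}"
  sed -n 's/^[[:space:]]*sandbox_image[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$cfg" \
    | head -n 1
}

# Succeeds when the pinned image equals the expected one.
# $1 = expected image, $2 = config path (optional)
pause_image_matches() {
  local expected="$1" cfg="${2:-/etc/containerd/config.toml}"
  [ "$(containerd_sandbox_image "$cfg")" = "$expected" ]
}

# Example, once kubeadm is installed:
#   pause_image_matches "$(kubeadm config images list | grep pause)" \
#     || echo "sandbox image differs from what kubeadm expects"
```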

Then restart and enable containerd so it survives reboots.

systemctl restart containerd
systemctl enable containerd

§Install kubeadm, kubelet, and kubectl

The old packages.cloud.google.com apt repository is gone. Packages live at pkgs.k8s.io now, in a per-minor-version path. This uses the 1.33 line; bump the version in both URLs to move the whole cluster to a newer minor later.

apt-get install -y apt-transport-https ca-certificates curl gpg
mkdir -p /etc/apt/keyrings

curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.33/deb/Release.key \
  | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.33/deb/ /' \
  >/etc/apt/sources.list.d/kubernetes.list

apt-get update
apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl

The apt-mark hold keeps an apt upgrade from moving the cluster out from under you. You upgrade Kubernetes deliberately, a minor version at a time, following the documented kubeadm upgrade path, never by accident in the middle of an unrelated package update. This is one of those small disciplines that separates a cluster you can reason about from one that surprises you.
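
A hold is only useful if it is still in place months later, so it is worth checking before any routine apt upgrade. A minimal sketch: the hypothetical function unheld_kube_packages reads the output of apt-mark showhold (one package name per line) and prints whichever of the three packages is missing from it.

```shell
#!/usr/bin/env bash
# Print any of kubelet/kubeadm/kubectl that are NOT in the held-package list
# supplied on stdin. Prints nothing when all three are held.
unheld_kube_packages() {
  local held pkg
  held="$(cat)"
  for pkg in kubelet kubeadm kubectl; do
    grep -qx "$pkg" <<<"$held" || echo "$pkg"
  done
}

# Example, on a node:
#   apt-mark showhold | unheld_kube_packages
```

Empty output means an unrelated package update cannot move the cluster version underneath you.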

§A Declarative Control Plane

You can bring up a control plane with kubeadm init and a fistful of flags, and most tutorials do. We are not going to, because the rest of this series is manifest-first for a reason: a declarative file is something you can read, diff, commit, and hand to an agent as the source of truth. A line of flags in someone’s shell history is none of those things.

Write a kubeadm-config.yaml on the control-plane node. This sets three things deliberately: the API server advertises on the private address, we pin the cgroup driver to systemd to match containerd, and we tell kubeadm to skip installing kube-proxy entirely, because Cilium is going to replace it.

# kubeadm-config.yaml (control plane only)
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: "10.0.0.10"   # control-plane private IP
  bindPort: 6443
skipPhases:
  - addon/kube-proxy
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: "v1.33.2"
controlPlaneEndpoint: "10.0.0.10:6443"
networking:
  # pod CIDR; Kubernetes allocates a per-node range from it, and Cilium's
  # IPAM has to be configured to use those ranges
  podSubnet: "10.244.0.0/16"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd

Skipping kube-proxy at init is the cleaner path than installing it and tearing it out. We tell Cilium to take over service routing, and there is never a kube-proxy to conflict with it. Setting controlPlaneEndpoint even for a single control plane is deliberate too: it means that if you later put a load balancer in front and add control-plane nodes, the existing nodes already point at a stable endpoint instead of one node’s IP.
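
One cheap check before init is that the advertise address in the file really is an address on this machine, since a typo there produces a control plane nothing can reach. A sketch with hypothetical function names, assuming an IPv4 address written the way the config above writes it.

```shell
#!/usr/bin/env bash
# Print the advertiseAddress value from a kubeadm config file.
kubeadm_advertise_address() {
  sed -n 's/^[[:space:]]*advertiseAddress:[[:space:]]*"\{0,1\}\([0-9a-fA-F.:]*\)"\{0,1\}.*/\1/p' "$1" \
    | head -n 1
}

# Succeeds if the IPv4 address appears in `ip -o -4 addr show` output on stdin.
address_is_local() {
  grep -Fq "inet ${1}/"
}

# Example, on the control-plane node:
#   ip -o -4 addr show \
#     | address_is_local "$(kubeadm_advertise_address kubeadm-config.yaml)" \
#     || echo "advertiseAddress is not configured on this host"
```

kubeadm init also accepts --dry-run alongside --config if you want to see what it would do before it does it.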

Initialize the control plane from the file. Run this only on the control-plane node.

kubeadm init --config kubeadm-config.yaml

It runs preflight checks, pulls the control-plane images, writes the static pod manifests, brings up etcd and the API server, and finishes by printing a kubeadm join command with a token and a CA hash:

Your Kubernetes control-plane has initialized successfully!

...

You can now join any number of worker nodes by running the following on each as root:

kubeadm join 10.0.0.10:6443 --token abcdef.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:1a2b3c...

Copy that join command somewhere; the workers need it in a minute. Then set up your kubeconfig on the control plane so kubectl works:

mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

If you run kubectl get nodes now, the control plane shows up as NotReady. That is expected. There is no pod network yet, so the node cannot host pods, and the kubelet reports itself not ready until a CNI is in place. Cilium is next.

§Cilium: Networking, kube-proxy Replacement, and Encryption

The cluster needs a CNI to give pods addresses and route traffic between them. In the 2020 edition that was a manual choice, and a popular one back then, Weave Net, reached end of life when Weaveworks shut down in 2024. That is its own small lesson about building on a single company’s project, and it is the reason the 2026 answer is Cilium: a CNCF graduated project with an eBPF data plane that is fast, deeply observable, replaces kube-proxy, and is the foundation the Gateway API work in a later post builds on.

Install the Cilium CLI on the control plane. Cilium installs through its own CLI rather than raw Helm, which is the idiomatic path for it.

CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
curl -L --fail --remote-name-all \
  https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz
tar xzvf cilium-linux-amd64.tar.gz -C /usr/local/bin
rm cilium-linux-amd64.tar.gz

Now install Cilium into the cluster in a single command that turns on everything we want. Because we skipped kube-proxy, Cilium needs to know how to reach the API server itself, so we pass the control-plane address and port explicitly.

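# ipam.mode=kubernetes makes Cilium hand out pod IPs from the per-node ranges
# Kubernetes allocates out of podSubnet; Cilium's default pool is 10.0.0.0/8,
# which would overlap the 10.0.0.0/16 node network used in these examples
cilium install \
  --set ipam.mode=kubernetes \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=10.0.0.10 \
  --set k8sServicePort=6443 \
  --set encryption.enabled=true \
  --set encryption.type=wireguard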

cilium status --wait

Three things are happening in that one command, and each replaced a section of manual work in the old edition.

kubeProxyReplacement=true hands all of Kubernetes Service routing to Cilium’s eBPF programs. There is no kube-proxy DaemonSet writing thousands of iptables rules; service load balancing happens in the kernel’s eBPF layer, which is faster and far easier to observe.

The encryption flags fold in what used to be an entire chapter. Chapter 3 of the book walked through installing WireGuard on every node and wiring up an overlay by hand to encrypt traffic between them, because private networking on a shared cloud is not something to fully trust. Cilium does that for you. Every pod-to-pod packet between nodes now rides an automatically keyed WireGuard tunnel. The agent on each node creates its own key pair and publishes the public half through a Kubernetes resource. No keys to generate, no config to distribute, no overlay to maintain. This is the kind of thing that used to be a section of careful manual work and is now two flags.

Once cilium status reports everything green, confirm encryption is actually live:

cilium encrypt status

Encryption: Wireguard
Interface: cilium_wg0
  Keys: 3
  ...

Optionally turn on Hubble, Cilium’s observability layer, so you and any agent operating this cluster can see flows between services later.

cilium hubble enable --ui

§Join the Workers

On each worker node, run the kubeadm join command that the control-plane init printed, over the private address.

kubeadm join 10.0.0.10:6443 --token abcdef.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:1a2b3c...

The default token expires after 24 hours. If you lost the command or it aged out, regenerate it from the control plane:

kubeadm token create --print-join-command

Give it a moment for Cilium to schedule its agent onto the new nodes, then check from the control plane:

kubectl get nodes

NAME          STATUS   ROLES           AGE     VERSION
platform-cp   Ready    control-plane   6m      v1.33.2
platform-w1   Ready    <none>          2m      v1.33.2
platform-w2   Ready    <none>          2m      v1.33.2

Three Ready nodes. You own a Kubernetes cluster.
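
If you are scripting the bring-up, or handing it to an agent, "wait a moment and look" needs to become a check. A minimal sketch: the hypothetical function not_ready_nodes reads kubectl get nodes --no-headers on stdin and prints any node whose STATUS is not exactly Ready.

```shell
#!/usr/bin/env bash
# Print the names of nodes whose STATUS column is not exactly "Ready".
# A cordoned node ("Ready,SchedulingDisabled") is reported too, on purpose.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Example, from the control plane:
#   kubectl get nodes --no-headers | not_ready_nodes
# Or block until every node reports Ready:
#   kubectl wait --for=condition=Ready nodes --all --timeout=300s
```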

§Verify the Cluster Actually Works

Ready means the kubelet is happy. It does not prove that a pod on one node can reach a pod on another, that services resolve, and that the encryption you turned on is carrying real traffic. Cilium ships a connectivity test that checks all of it by deploying a set of pods across your nodes and exercising the paths between them.

cilium connectivity test

It spins up its test namespace, runs dozens of checks across nodes, and reports a summary. This takes a few minutes and is worth every second; it is the difference between believing the network works and knowing it does.

✅ All 42 tests (XXX actions) successful, 0 tests skipped, 0 scenarios skipped.

For a quick manual confirmation, run a throwaway pod and resolve the in-cluster API service through CoreDNS, which only works if pod networking and service routing are both functioning:

kubectl run netcheck --rm -it --image=nicolaka/netshoot --restart=Never -- \
  nslookup kubernetes.default

A clean answer means DNS, the pod network, and Cilium’s service routing are all doing their jobs.

Finally, prove it runs a real workload, not just test pods. Deploy something with a couple of replicas behind a Service and confirm the replicas land on your workers.

# nginx-smoke.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: smoke
spec:
  replicas: 2
  selector:
    matchLabels: { app: smoke }
  template:
    metadata:
      labels: { app: smoke }
    spec:
      containers:
        - name: nginx
          image: nginx:stable
          ports: [{ containerPort: 80 }]
---
apiVersion: v1
kind: Service
metadata:
  name: smoke
spec:
  selector: { app: smoke }
  ports: [{ port: 80, targetPort: 80 }]

kubectl apply -f nginx-smoke.yaml
kubectl get pods -o wide

NAME                     READY   STATUS    NODE
smoke-7d9c8b4f9-4k2lp    1/1     Running   platform-w1
smoke-7d9c8b4f9-q8xrt    1/1     Running   platform-w2

Both replicas scheduled onto workers, never onto the control plane. That is not luck. kubeadm init taints the control-plane node node-role.kubernetes.io/control-plane:NoSchedule, so ordinary workloads stay off it and it keeps its resources for running the cluster. On a small dev cluster you can remove that taint to reclaim the node as capacity, but leaving it in place is the cleaner default. When you are satisfied, tear the smoke test down with kubectl delete -f nginx-smoke.yaml.

Clean up the Cilium test namespace when you are done:

cilium connectivity test --cleanup

§Reach It From Your Workstation

You do not want to live on the control plane over SSH. Copy its admin kubeconfig to your workstation and point kubectl at it. The file is at /etc/kubernetes/admin.conf; if you are coming in from outside the private network, edit the server: field to the control plane’s reachable address.

scp root@<CONTROL_PLANE_PUBLIC_IP>:/etc/kubernetes/admin.conf ~/.kube/platform.conf
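
Editing the server: field by hand is fine once, but it is a one-line substitution if you do it often. A sketch assuming an IPv4 address or hostname; the function name set_kubeconfig_server is hypothetical and 203.0.113.10 is a placeholder from the documentation address range, not a real control plane.

```shell
#!/usr/bin/env bash
# Point a copied kubeconfig at a different API server host, keeping the port.
# Edits the file in place and leaves the original beside it as <file>.bak.
# Assumption: the host is an IPv4 address or a hostname, not bracketed IPv6.
set_kubeconfig_server() {
  local file="$1" host="$2"
  sed -i.bak -E \
    "s#^([[:space:]]*server:[[:space:]]*https://)[^:/]+(:[0-9]+)#\1${host}\2#" \
    "$file"
}

# Example, on your workstation:
#   set_kubeconfig_server ~/.kube/platform.conf 203.0.113.10
```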

Give the context a real name rather than the default kubernetes-admin@kubernetes, so when you have several clusters you can tell them apart at a glance:

KUBECONFIG=~/.kube/platform.conf kubectl config rename-context \
  kubernetes-admin@kubernetes platform

export KUBECONFIG=~/.kube/platform.conf
kubectl config use-context platform
kubectl get nodes

For actually developing against the services you will run on this cluster, databases, search, object storage, you will want to reach them by name from your laptop as if you were inside the cluster. That is what kubefwd is for, and it shows up throughout the rest of this series. One command forwards a whole namespace of services to your workstation under their real names, so code on your laptop connects to postgres:5432 or opensearch:9200 exactly as a pod would.

§When Something Is Wrong

A few failure modes account for most of the trouble standing this up, and they all look more mysterious than they are.

The control plane will not initialize. Almost always the cgroup driver. Confirm SystemdCgroup = true is actually in the containerd config and that you restarted containerd after editing it. kubeadm init runs preflight checks that catch swap and missing modules, but a cgroup mismatch fails later and less clearly.

Pods stick in ContainerCreating or the pause container loops. The sandbox image mismatch from the containerd section. Run kubeadm config images list | grep pause, compare it to sandbox_image in /etc/containerd/config.toml, make them match, restart containerd.

Nodes flap between Ready and NotReady, or cross-node pods cannot talk. The firewall. The VXLAN port 8472/UDP and the WireGuard port 51871/UDP are the usual culprits, because a closed UDP port produces silence rather than an error. Confirm the private-network rules from the ports section are actually applied.

A worker will not join. The token expired (24 hours), or the worker cannot reach 10.0.0.10:6443. Regenerate the token with kubeadm token create --print-join-command and confirm the worker can curl -k https://10.0.0.10:6443/healthz over the private network.

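kubectl from your workstation hangs or refuses the connection. The server: field in your copied kubeconfig still points at the private IP, which your laptop cannot route to. Edit it to the control plane’s public address and make sure 6443 is reachable from where you are. If the connection then fails with an x509 certificate error, the API server certificate does not list that address; kubeadm only includes extra names and addresses you give it under apiServer.certSANs in the ClusterConfiguration.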

§Owning the Cluster Means Operating It

A managed control plane quietly does a handful of things for you: it backs up etcd, it replaces nodes, it runs your version upgrades. Own the cluster and those become yours too. That is not a burden so much as the rest of the deal, and each one is a known procedure rather than a mystery. It is also exactly the kind of routine an agent with this post in front of it can run on a schedule. Three of them matter day to day.

§Back Up etcd

etcd is the one piece of irreplaceable state on the control plane. Every object in the cluster, every Secret, every bit of config lives there. Lose it without a backup and you are rebuilding from memory. kubeadm runs etcd as a static pod on the control-plane node, and you snapshot it with etcdctl pointed at the local endpoint, using the certificates kubeadm already generated.

apt-get install -y etcd-client

ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /root/etcd-$(date +%Y%m%d-%H%M).db

Snapshot saved at /root/etcd-20260621-1430.db

Copy that file off the node, ideally to the object storage you stand up later in this series, and run the whole thing from a cron job so it happens nightly without you. Restoring is the documented reverse, etcdctl snapshot restore into a fresh data directory (etcdutl snapshot restore on newer etcd releases, where the etcdctl form is deprecated) followed by pointing etcd at it, and it is a deliberate, eyes-open operation you do when you need it, not something to automate blindly. The discipline that matters day to day is simpler: take the snapshots, and make sure they leave the node.
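
A nightly snapshot also needs a retention rule, or the disk fills up on its own schedule. A minimal sketch, assuming the etcd-YYYYMMDD-HHMM.db naming from the command above, which sorts oldest first. The function name prune_etcd_snapshots and the default of seven snapshots are illustrative choices, not anything kubeadm or etcd prescribes.

```shell
#!/usr/bin/env bash
# Keep only the newest $2 snapshots (default 7) in directory $1.
# Relies on the etcd-YYYYMMDD-HHMM.db names sorting in time order.
prune_etcd_snapshots() {
  local dir="$1" keep="${2:-7}"
  local files=() f i
  for f in "$dir"/etcd-*.db; do
    [ -e "$f" ] && files+=("$f")   # skip the unexpanded glob when none exist
  done
  local excess=$(( ${#files[@]} - keep ))
  for (( i = 0; i < excess; i++ )); do
    rm -f -- "${files[$i]}"        # globs expand sorted, so index 0 is oldest
  done
  return 0
}

# Example, at the end of the nightly backup script:
#   prune_etcd_snapshots /root 7
```

Prune only after the new snapshot has been copied off the node, so a failed upload never costs you the last good local copy.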

§Add and Remove Nodes

Scaling this cluster is the worker procedure you already ran, repeated. Prepare a new machine with the same node-prep, containerd, and Kubernetes-package steps, then pull a fresh join command from the control plane and run it on the new node.

# on the control plane
kubeadm token create --print-join-command

Removing a node is the reverse, done gracefully so nothing is lost. Drain it to evict its pods onto the remaining nodes, delete it from the API, then reset the machine itself.

# from your workstation
kubectl drain platform-w2 --ignore-daemonsets --delete-emptydir-data
kubectl delete node platform-w2

# on the node being removed
kubeadm reset

The same drain is what you run before any disruptive maintenance on a node, a kernel update or a resize. It cordons the node so nothing new schedules there and moves the running pods elsewhere first. kubectl uncordon platform-w2 puts it back into rotation when you are done.

§Upgrade on Your Schedule

This is the one the apt-mark hold earlier was protecting. You upgrade Kubernetes deliberately, one minor version at a time, never skipping a minor, and never as a side effect of an unrelated apt upgrade. Because the package repository is per-minor-version, the first step is pointing apt at the new minor.

# move the repo from v1.33 to v1.34 on the node first
sed -i 's#/v1.33/#/v1.34/#' /etc/apt/sources.list.d/kubernetes.list
apt-get update

On the control plane, upgrade kubeadm, let it plan and apply, then bring the node’s own components up to match.

apt-mark unhold kubeadm
apt-get install -y kubeadm=1.34.0-1.1
apt-mark hold kubeadm

kubeadm upgrade plan
kubeadm upgrade apply v1.34.0

kubectl drain platform-cp --ignore-daemonsets
apt-mark unhold kubelet kubectl
apt-get install -y kubelet=1.34.0-1.1 kubectl=1.34.0-1.1
apt-mark hold kubelet kubectl
systemctl daemon-reload && systemctl restart kubelet
kubectl uncordon platform-cp

On each worker the dance is shorter: point apt at the new minor and upgrade the kubeadm package as above, run kubeadm upgrade node, then drain, upgrade the kubelet and kubectl the same way, restart the kubelet, and uncordon. One node at a time, so the cluster stays up throughout. The whole thing is a procedure you schedule, which is the difference between a cluster you control and one that controls you.
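
Spelled out, the worker sequence looks like the following. This is a sketch that prints the commands in order and executes nothing; the function name print_worker_upgrade is hypothetical, and the -1.1 package suffix is assumed to match the control-plane example above, so check what apt actually offers before pinning it.

```shell
#!/usr/bin/env bash
# Print, in order, the commands to upgrade one worker. Nothing is executed.
# $1 = node name, $2 = target version such as 1.34.0
print_worker_upgrade() {
  local node="$1" version="$2"
  local minor="${version%.*}"   # 1.34.0 -> 1.34
  local pkg="${version}-1.1"    # assumed package revision suffix
  cat <<EOF
# on ${node}: point apt at the v${minor} repo and upgrade kubeadm
sed -i -E 's#/v[0-9]+\.[0-9]+/#/v${minor}/#' /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-mark unhold kubeadm
apt-get install -y kubeadm=${pkg}
apt-mark hold kubeadm
kubeadm upgrade node
# from your workstation: move workloads off the node
kubectl drain ${node} --ignore-daemonsets
# on ${node}: upgrade and restart the kubelet
apt-mark unhold kubelet kubectl
apt-get install -y kubelet=${pkg} kubectl=${pkg}
apt-mark hold kubelet kubectl
systemctl daemon-reload && systemctl restart kubelet
# from your workstation: put the node back into rotation
kubectl uncordon ${node}
EOF
}

print_worker_upgrade platform-w1 1.34.0
```

Finish one worker and confirm it is Ready before starting the next.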

§One Control Plane, and When to Add More

This cluster has a single control-plane node, which is the right call for a development platform: simple, cheap, and your workloads keep running even if the control plane restarts. What you lose is API availability while that one node is down, and the etcd data lives on one disk.

When you want the control plane itself to be highly available, the shape changes in three ways, and kubeadm supports all of it natively. You run three control-plane nodes instead of one, so etcd has the quorum it needs to tolerate losing a member. You put a load balancer in front of the API servers and point controlPlaneEndpoint at it, which is why we set that field even now. And you join the additional control-plane nodes with kubeadm join --control-plane rather than as workers. For the load balancer itself, keep it vendor-neutral with something like kube-vip running inside the cluster, rather than reaching for a cloud load balancer and reintroducing exactly the dependency this whole approach avoids. Three control-plane nodes, an odd number for etcd quorum, is the standard production floor.
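
The odd-number rule falls straight out of the quorum arithmetic: a cluster of n members needs a majority of floor(n/2) + 1 to accept writes, so it tolerates floor((n-1)/2) failures. A small sketch of that calculation, with hypothetical function names:

```shell
#!/usr/bin/env bash
# Majority needed for an etcd cluster of $1 members.
etcd_quorum() { echo $(( $1 / 2 + 1 )); }

# Member failures a cluster of $1 members can survive and still have quorum.
etcd_fault_tolerance() { echo $(( ($1 - 1) / 2 )); }

for n in 1 2 3 4 5; do
  printf 'members=%d quorum=%d tolerates=%d\n' \
    "$n" "$(etcd_quorum "$n")" "$(etcd_fault_tolerance "$n")"
done
```

Two members tolerate no more failures than one, and four no more than three, which is why the step up from a single control plane is three and the next is five.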

You do not need any of that to build and run the platform in this series. You need to know the door is there, and that walking through it is a documented procedure, not a rebuild.

§What Is Boring About This, and Why That Is Good

Look back at what just happened. A handful of apt commands, one declarative kubeadm init, one cilium install, two kubeadm joins, and a connectivity test to prove it. The shape of this has barely changed in years. The runtime moved from Docker to containerd, the package repo moved, the CNI got better and absorbed the service proxy and the encryption overlay, and the manual WireGuard chapter collapsed into two flags. The core, stand up nodes, init a control plane, add a network, join workers, is the same dependable procedure it has been for a long time.

That is the point. This is the boring, settled layer you want under everything else. It is portable, it owes nothing to any provider, it is yours, and an agent with this post in front of it can stand the whole thing up, verify it, and tend it without you babysitting each step. Boring is not a complaint here. Boring is what a foundation is supposed to be.

Next we give the cluster somewhere to keep data: persistent storage with Rook and Ceph, so the stateful services in the rest of the series, the databases, the object store, the search cluster, have a durable home that travels with the platform instead of with the provider.

Craig Johnston · 2026-06-21
