How to Simulate a Multi-Node Kubernetes Cluster Using Kubemark

HungWei Chiu
Sep 8, 2024

Preface
Kubemark is an official Kubernetes tool that lets cluster administrators scale the number of nodes in a Kubernetes (K8s) cluster through simulation. It can generate hundreds to thousands of simulated nodes, helping developers understand the performance and limitations of their applications.

At its core, Kubemark packages a hollow Kubelet into a container image. By deploying many such Pods and pointing them at a target Kubernetes cluster, each Pod registers itself as a node, so a large number of nodes can be simulated.

Prepare Kubemark
Because Kubemark simulates nodes of a specific Kubernetes version, it is best to build a Kubemark image that matches each Kubernetes release you want to test. The simplest way to pin the version is to build the binary from the corresponding tag of the official Kubernetes source.

Below, we’ll compile two versions for K8s 1.30 and 1.31, containerize them, and upload them to DockerHub for future use.

The steps are as follows:

  1. Download the relevant code from the official Kubernetes repository.
  2. Switch to the corresponding branch.
  3. Build Kubemark.
  4. Package Kubemark into a container.
  5. Upload it to DockerHub.

First, all the code is hosted in the official Kubernetes GitHub repository. Clone it and check out the target tag with git checkout.

git clone https://github.com/kubernetes/kubernetes.git
cd kubernetes
git checkout v1.31.0

Ensure that golang and the make command are available in your environment.
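
A quick way to confirm both are present (exact versions will vary with your environment):

go version
make --version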

make WHAT='cmd/kubemark/'

After the build finishes, the compiled kubemark binary will be placed in _output/local/bin/linux/amd64/. Copy it into the Kubemark image directory (cluster/images/kubemark/), then build and push the image, replacing the REGISTRY value with your own Docker Hub account.
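
Before copying, you can confirm that the binary was produced:

ls -lh _output/local/bin/linux/amd64/kubemark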

cp _output/local/bin/linux/amd64/kubemark cluster/images/kubemark/
cd cluster/images/kubemark/
make REGISTRY=hwchiu IMAGE_TAG=v1.31.0 push

Repeat the same process for v1.30.4 so that you end up with the following two images.

(Screenshot: the two Kubemark images, tagged v1.31.0 and v1.30.4, on Docker Hub)
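
The repeat for v1.30.4 follows the same pattern; a sketch, starting again from the repository root and substituting your own REGISTRY:

git checkout v1.30.4
make WHAT='cmd/kubemark/'
cp _output/local/bin/linux/amd64/kubemark cluster/images/kubemark/
cd cluster/images/kubemark/
make REGISTRY=hwchiu IMAGE_TAG=v1.30.4 push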

Additionally, in the current Kubemark implementation, the CRI (container runtime) endpoint is hardcoded to "unix:///run/containerd/containerd.sock":

func GetHollowKubeletConfig(opt *HollowKubeletOptions) (*options.KubeletFlags, *kubeletconfig.KubeletConfiguration) {
    ...
    c, err := options.NewKubeletConfiguration()
    if err != nil {
        panic(err)
    }
    c.ImageServiceEndpoint = "unix:///run/containerd/containerd.sock"
    ...

This means the nodes hosting the hollow Pods must expose a containerd Unix socket; if they run cri-o instead, it won't work.
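
To see which CRI socket your nodes actually expose, a quick check on one of the nodes hosting the hollow Pods:

ls -l /run/containerd/containerd.sock /run/crio/crio.sock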

Therefore, I also modified the code to support cri-o and built an additional version for testing purposes.

Environment

To keep the test environment simple and clean, two Kubernetes clusters (version v1.31.0) are deployed, with the following roles:

  • Cluster 1 (Node K8s Cluster): This cluster is used to deploy Kubemark and is typically referred to as the “Hollow Node Cluster.” All the Kubemark Pods in this cluster will be registered as Kubernetes nodes in the other cluster.
  • Cluster 2 (Testing K8s Cluster): This is the target cluster where the Kubemark nodes register. Ultimately, it should show a large number of simulated Kubernetes nodes.

The architecture is as follows:

A Hollow-Node is a regular Pod that contains two containers, one playing the role of the Kubelet and the other the Kube-Proxy.

Installation

To register a node, the hollow Kubelet must be able to communicate with the target cluster's API Server. The simplest way to achieve this is to obtain a Kubeconfig for the target cluster, store it in the Node K8s Cluster as a Secret, and mount it into each Hollow-Node Pod.

Assuming the Kubeconfig is named cluster_config, run the following commands:

kubectl create ns kubemark
kubectl create secret generic kubeconfig --type=Opaque --namespace=kubemark --from-file=kubelet.kubeconfig=cluster_config --from-file=kubeproxy.kubeconfig=cluster_config
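
You can confirm that the Secret contains both keys (kubelet.kubeconfig and kubeproxy.kubeconfig):

kubectl -n kubemark describe secret kubeconfig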

Next, prepare a YAML file to deploy the Hollow Nodes. This file will deploy multiple Kubemark Pods, each containing:

  • An init container that raises the node's fs.inotify.max_user_instances limit, likely needed for the Kubelet/Kube-Proxy log files.
  • Two main containers, one acting as the Kubelet and the other as the Kube-Proxy.

The example uses a ReplicationController, but you can switch it to a Deployment if desired.

apiVersion: v1
kind: ReplicationController
metadata:
  name: hollow-node
  namespace: kubemark
  labels:
    name: hollow-node
spec:
  replicas: 100
  selector:
    name: hollow-node
  template:
    metadata:
      labels:
        name: hollow-node
    spec:
      initContainers:
      - name: init-inotify-limit
        image: busybox:1.32
        command: ['sysctl', '-w', 'fs.inotify.max_user_instances=1000']
        securityContext:
          privileged: true
      volumes:
      - name: kubeconfig-volume
        secret:
          secretName: kubeconfig
      - name: kernelmonitorconfig-volume
        configMap:
          name: node-configmap
      - name: logs-volume
        hostPath:
          path: /var/log
      - name: containerd
        hostPath:
          path: /run/crio
      - name: no-serviceaccount-access-to-real-master
        emptyDir: {}
      containers:
      - name: hollow-kubelet
        image: hwchiu/kubemark:v1.31.0-crio-dev
        ports:
        - containerPort: 4194
        - containerPort: 10250
        - containerPort: 10255
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command: [
          "/go-runner",
          "-log-file=/var/log/kubelet-$(NODE_NAME).log",
          "/kubemark",
          "--morph=kubelet",
          "--containerd=/run/crio/crio.sock",
          "--name=$(NODE_NAME)",
          "--node-labels=hollow-node=true",
          "--kubeconfig=/kubeconfig/kubelet.kubeconfig",
        ]
        volumeMounts:
        - name: kubeconfig-volume
          mountPath: /kubeconfig
          readOnly: true
        - name: logs-volume
          mountPath: /var/log
        - name: containerd
          mountPath: /run/crio
        resources:
          requests:
            cpu: 40m
            memory: 100Mi
        securityContext:
          privileged: true
      - name: hollow-proxy
        image: hwchiu/kubemark:v1.31.0-crio-dev
        imagePullPolicy: Always
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command: [
          "/go-runner",
          "-log-file=/var/log/kubeproxy-$(NODE_NAME).log",
          "/kubemark",
          "--morph=proxy",
          "--name=$(NODE_NAME)",
          "--v=9",
          "--kubeconfig=/kubeconfig/kubeproxy.kubeconfig",
        ]
        volumeMounts:
        - name: kubeconfig-volume
          mountPath: /kubeconfig
          readOnly: true
        - name: logs-volume
          mountPath: /var/log
        resources:
          requests:
            cpu: 40m
            memory: 100Mi

The image used in the YAML file is "v1.31.0-crio-dev", which is my custom-built version. I'll explain the modifications in detail later.

Let’s take a quick look at the relevant configurations for the hollow-kubelet in the YAML file:

  • replicas is set to 100, meaning 100 Pods are created, which simulates 100 nodes.
  • Kubemark uses the --morph flag to determine which mode (kubelet or proxy) to run.
  • The -log-file flag writes each simulated node's logs onto the underlying K8s node.
  • The --containerd setting is configured for cAdvisor, not for the Kubelet.
  • The --name parameter takes the Pod's name via the Downward API and uses it as the node name.
  • Node labels are defined with --node-labels, which makes filtering and deleting the nodes easier later on.
  • The --kubeconfig option points to the Kubeconfig from the previously created Secret.

For hollow-proxy:

  • Set to proxy mode via --morph=proxy.
  • Logs are configured with the -log-file flag, similar to the kubelet setup.
  • The rest of the configuration is quite similar.

Save the manifest and deploy it to the Node K8s Cluster.
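
A minimal apply, assuming the manifest above is saved as hollow-node.yaml (a filename of my choosing):

kubectl apply -f hollow-node.yaml

Once the Pods are created, they should appear in the kubemark namespace of the Node K8s Cluster: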

root@node-k8s-1:~# kubectl -n kubemark get pods
NAME READY STATUS RESTARTS AGE
hollow-node-26m9k 2/2 Running 2 18h
hollow-node-28bwz 2/2 Running 2 18h
hollow-node-2pmns 2/2 Running 2 18h
hollow-node-2vgjr 2/2 Running 2 18h
hollow-node-2zpr2 2/2 Running 2 18h
hollow-node-4xjkq 2/2 Running 2 18h
hollow-node-55blr 2/2 Running 2 18h
hollow-node-56r85 2/2 Running 2 18h
hollow-node-5dvqd 2/2 Running 2 18h
hollow-node-5jpkw 0/2 Pending 0 18h
hollow-node-5stkv 2/2 Running 2 18h
hollow-node-6h447 2/2 Running 2 18h
hollow-node-6nbcp 2/2 Running 2 18h
hollow-node-78z6f 2/2 Running 2 18h
hollow-node-7d5qg 2/2 Running 2 18h
hollow-node-7kr98 2/2 Running 2 18h
hollow-node-84wn9 2/2 Running 2 18h
hollow-node-87xm4 2/2 Running 2 18h
hollow-node-8d74t 2/2 Running 2 18h
hollow-node-94crg 2/2 Running 2 18h
hollow-node-9frdz 2/2 Running 2 18h
hollow-node-9kjzg 2/2 Running 2 18h
hollow-node-9zr8t 2/2 Running 2 18h
hollow-node-b8v6d 2/2 Running 2 18h
hollow-node-bhsx6 2/2 Running 2 18h
hollow-node-bnpjw 2/2 Running 2 18h
hollow-node-bzb6n 2/2 Running 2 18h
hollow-node-c799k 2/2 Running 2 18h
hollow-node-c9cph 2/2 Running 2 18h
hollow-node-c9lfr 2/2 Running 2 18h
hollow-node-cb4rp 2/2 Running 2 18h
hollow-node-cqhtj 2/2 Running 2 18h

Once all the Hollow-Node Pods are successfully running, the target cluster (Testing K8s Cluster) will show numerous simulated nodes registered.

root@test-cluster-1:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
hollow-node-26m9k Ready <none> 18h v1.31.0-dirty
hollow-node-28bwz Ready <none> 18h v1.31.0-dirty
hollow-node-2pmns Ready <none> 18h v1.31.0-dirty
hollow-node-2vgjr Ready <none> 18h v1.31.0-dirty
hollow-node-2zpr2 Ready <none> 18h v1.31.0-dirty
hollow-node-4xjkq Ready <none> 18h v1.31.0-dirty
hollow-node-55blr Ready <none> 18h v1.31.0-dirty
hollow-node-56r85 Ready <none> 18h v1.31.0-dirty
hollow-node-5dvqd Ready <none> 18h v1.31.0-dirty
hollow-node-5stkv Ready <none> 18h v1.31.0-dirty
hollow-node-6h447 Ready <none> 18h v1.31.0-dirty
hollow-node-6nbcp Ready <none> 18h v1.31.0-dirty
hollow-node-78z6f Ready <none> 18h v1.31.0-dirty
hollow-node-7d5qg Ready <none> 18h v1.31.0-dirty
hollow-node-7kr98 Ready <none> 18h v1.31.0-dirty
hollow-node-84wn9 Ready <none> 18h v1.31.0-dirty
hollow-node-87xm4 Ready <none> 18h v1.31.0-dirty
hollow-node-8d74t Ready <none> 18h v1.31.0-dirty
hollow-node-94crg Ready <none> 18h v1.31.0-dirty
hollow-node-9frdz Ready <none> 18h v1.31.0-dirty
hollow-node-9kjzg Ready <none> 18h v1.31.0-dirty
hollow-node-9zr8t Ready <none> 18h v1.31.0-dirty
hollow-node-b8v6d Ready <none> 18h v1.31.0-dirty
hollow-node-bhsx6 Ready <none> 18h v1.31.0-dirty
hollow-node-bnpjw Ready <none> 18h v1.31.0-dirty
hollow-node-bzb6n Ready <none> 18h v1.31.0-dirty
hollow-node-c799k Ready <none> 18h v1.31.0-dirty
hollow-node-c9cph Ready <none> 18h v1.31.0-dirty
hollow-node-c9lfr Ready <none> 18h v1.31.0-dirty
hollow-node-cb4rp Ready <none> 18h v1.31.0-dirty
hollow-node-cqhtj Ready <none> 18h v1.31.0-dirty
hollow-node-csbrn Ready <none> 18h v1.31.0-dirty
hollow-node-czqcr Ready <none> 18h v1.31.0-dirty
hollow-node-d5xh2 Ready <none> 18h v1.31.0-dirty
hollow-node-d68nk Ready <none> 18h v1.31.0-dirty
hollow-node-dg5dj Ready <none> 18h v1.31.0-dirty
hollow-node-fbc45 Ready <none> 18h v1.31.0-dirty
hollow-node-g7ht9 Ready <none> 18h v1.31.0-dirty
hollow-node-g9g5p Ready <none> 18h v1.31.0-dirty
hollow-node-gdb68 Ready <none> 18h v1.31.0-dirty
hollow-node-gdxkn Ready <none> 18h v1.31.0-dirty
hollow-node-gm4nh Ready <none> 18h v1.31.0-dirty
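
Since every simulated node carries the hollow-node=true label set earlier via --node-labels, you can also count how many have registered on the Testing K8s Cluster:

kubectl get nodes -l hollow-node=true --no-headers | wc -l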

At this point, if you inspect the nodes of the Node K8s Cluster where these Pods are scheduled, you can find the per-node log files.

root@node-cluster-2:/var/log# ls /var/log/kube*
/var/log/kubelet-hollow-node-26m9k.log /var/log/kubelet-hollow-node-bzb6n.log /var/log/kubelet-hollow-node-ksw8w.log /var/log/kubelet-hollow-node-rshs2.log /var/log/kubeproxy-hollow-node-28bwz.log /var/log/kubeproxy-hollow-node-c799k.log /var/log/kubeproxy-hollow-node-l522t.log /var/log/kubeproxy-hollow-node-s25wv.log
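
To follow a single simulated node, tail its log file; the name below is just one Pod from my run:

tail -f /var/log/kubelet-hollow-node-26m9k.log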

If you remove all the Hollow-Node Pods, the simulated nodes lose their (hollow) Kubelet and eventually enter the NotReady state. The Node objects are not cleaned up automatically; you have to delete them with kubectl delete nodes.
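
Removing the Pods means deleting the ReplicationController in the Node K8s Cluster, for example:

kubectl -n kubemark delete rc hollow-node

Shortly afterwards, the simulated nodes in the Testing K8s Cluster go NotReady: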

~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
hollow-node-26m9k NotReady <none> 18h v1.31.0-dirty
hollow-node-28bwz NotReady <none> 18h v1.31.0-dirty
hollow-node-2pmns NotReady <none> 18h v1.31.0-dirty
hollow-node-2vgjr NotReady <none> 18h v1.31.0-dirty
hollow-node-2zpr2 NotReady <none> 18h v1.31.0-dirty
hollow-node-4xjkq NotReady <none> 18h v1.31.0-dirty
hollow-node-55blr NotReady <none> 18h v1.31.0-dirty
hollow-node-56r85 NotReady <none> 18h v1.31.0-dirty
hollow-node-5dvqd NotReady <none> 18h v1.31.0-dirty
hollow-node-5stkv NotReady <none> 18h v1.31.0-dirty
hollow-node-6h447 NotReady <none> 18h v1.31.0-dirty
hollow-node-6nbcp NotReady <none> 18h v1.31.0-dirty
hollow-node-78z6f NotReady <none> 18h v1.31.0-dirty
hollow-node-7d5qg NotReady <none> 18h v1.31.0-dirty
hollow-node-7kr98 NotReady <none> 18h v1.31.0-dirty
hollow-node-84wn9 NotReady <none> 18h v1.31.0-dirty
hollow-node-87xm4 NotReady <none> 18h v1.31.0-dirty
hollow-node-8d74t NotReady <none> 18h v1.31.0-dirty
hollow-node-94crg NotReady <none> 18h v1.31.0-dirty
hollow-node-9frdz NotReady <none> 18h v1.31.0-dirty

The labels mentioned earlier can simplify the node removal process.

kubectl delete nodes -l hollow-node=true

Issues

When running the original Kubemark, two primary issues arise:

  1. containerd Unix socket issue: This can be resolved by modifying pkg/kubemark/hollow_kubelet.go to point to the crio socket instead. If you're already using containerd, this modification is unnecessary.
--- a/pkg/kubemark/hollow_kubelet.go
+++ b/pkg/kubemark/hollow_kubelet.go
@@ -168,7 +168,7 @@ func GetHollowKubeletConfig(opt *HollowKubeletOptions) (*options.KubeletFlags, *
 		panic(err)
 	}
 
-	c.ImageServiceEndpoint = "unix:///run/containerd/containerd.sock"
+	c.ImageServiceEndpoint = "unix:///run/crio/crio.sock"
 	c.StaticPodURL = ""
 	c.EnableServer = true
 	c.Address = "0.0.0.0" /* bind address */

2. After fixing the above issue, the hollow Kubelet runs properly, but the hollow kube-proxy hits a nil pointer dereference and cannot run. You can observe the panic in the logs at /var/log/kubeproxy-xxxx.log.

E0907 12:36:03.288062       6 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 46 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x366f260, 0x6154720})
k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000826380?})
k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x366f260?, 0x6154720?})
runtime/panic.go:770 +0x132
k8s.io/kubernetes/pkg/proxy/healthcheck.(*ProxierHealthServer).SyncNode(0x0, 0xc000ae2308)
k8s.io/kubernetes/pkg/proxy/healthcheck/proxier_health.go:142 +0x6d
k8s.io/kubernetes/pkg/proxy.(*NodeEligibleHandler).OnNodeAdd(0xc00008b0d8, 0xc000ae2308)
k8s.io/kubernetes/pkg/proxy/node.go:102 +0x57
k8s.io/kubernetes/pkg/proxy/config.(*NodeConfig).handleAddNode(0xc000665440, {0x3da4c60?, 0xc000ae2308?})
k8s.io/kubernetes/pkg/proxy/config/config.go:339 +0x12a
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(...)
k8s.io/client-go/tools/cache/controller.go:239
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
k8s.io/client-go/tools/cache/shared_informer.go:978 +0x13e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0004edf70, {0x4371540, 0xc000904000}, 0x1, 0xc000902000)
k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000881770, 0x3b9aca00, 0x0, 0x1, 0xc000902000)
k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
k8s.io/apimachinery/pkg/util/wait/backoff.go:161
k8s.io/client-go/tools/cache.(*processorListener).run(0xc0006dc630)
k8s.io/client-go/tools/cache/shared_informer.go:972 +0x69
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x52
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 187
k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73
I0907 12:36:03.288125 6 config.go:120] "Calling handler.OnEndpointSliceAdd" endpoints="default/kubernetes"
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x30fea8d]

The following are my personal findings and hypotheses, which have not yet been confirmed.

By examining the code, it appears that when kube-proxy runs, it registers a node event handler whose HealthServer field is never initialized in the Kubemark path (it is nil). As a result, when a node event arrives and the handler is called, the nil pointer dereference is triggered.

if utilfeature.DefaultFeatureGate.Enabled(features.KubeProxyDrainingTerminatingNodes) {
    nodeConfig.RegisterEventHandler(&proxy.NodeEligibleHandler{
        HealthServer: s.HealthzServer,
    })
}

Normally, when the real kube-proxy starts, this field is initialized as long as a healthz bind address is configured.

if len(config.HealthzBindAddress) > 0 {
    s.HealthzServer = healthcheck.NewProxierHealthServer(config.HealthzBindAddress, 2*config.SyncPeriod.Duration)
}

In Kubemark, however, this initialization is skipped: the HollowProxy constructs the ProxyServer directly, without a HealthzServer.

return &HollowProxy{
    ProxyServer: &proxyapp.ProxyServer{
        Config: &proxyconfigapi.KubeProxyConfiguration{
            Mode:             proxyconfigapi.ProxyMode("fake"),
            ConfigSyncPeriod: metav1.Duration{Duration: 30 * time.Second},
            Linux: proxyconfigapi.KubeProxyLinuxConfiguration{
                OOMScoreAdj: ptr.To[int32](0),
            },
        },

        Client:      client,
        Proxier:     &FakeProxier{},
        Broadcaster: broadcaster,
        Recorder:    recorder,
        NodeRef: &v1.ObjectReference{
            Kind:      "Node",
            Name:      nodeName,
            UID:       types.UID(nodeName),
            Namespace: "",
        },
    },
}

From the event registration code, this path only executes when the KubeProxyDrainingTerminatingNodes feature gate is enabled. This feature gate was previously alpha, was promoted to beta in version 1.30, and is now enabled by default. There are two ways to work around the issue:

  1. Disable the feature gate to prevent unnecessary registration.
  2. Ignore the registration process.

Since my test environment already required the crio modification, I opted for option 2 and commented out the related code. The resulting custom image, v1.31.0-crio-dev, includes both modifications.

--- a/cmd/kube-proxy/app/server.go
+++ b/cmd/kube-proxy/app/server.go
@@ -564,11 +564,14 @@ func (s *ProxyServer) Run(ctx context.Context) error {
 	if s.Config.DetectLocalMode == kubeproxyconfig.LocalModeNodeCIDR {
 		nodeConfig.RegisterEventHandler(proxy.NewNodePodCIDRHandler(ctx, s.podCIDRs))
 	}
+
+	/*
 	if utilfeature.DefaultFeatureGate.Enabled(features.KubeProxyDrainingTerminatingNodes) {
 		nodeConfig.RegisterEventHandler(&proxy.NodeEligibleHandler{
 			HealthServer: s.HealthzServer,
 		})
 	}
+	*/

From my understanding, this issue should not occur with version 1.29.0, where the feature gate is not yet enabled by default, though I have not verified this.

Summary

  • Kubemark is a tool that allows Kubernetes Pods to register as Kubernetes nodes.
  • Each Pod runs two containers: one for the Kubelet and another for the kube-proxy.
  • Kubemark is intended to verify the scalability of controllers and other related tools. Since the nodes are simulated, overly complex CNIs are likely to encounter many issues, making it unsuitable for testing network performance and applications.
  • Kubemark does not have an official image, so users must download and compile it themselves.
