Preventing Container Image Deletion by Kubelet GC
This article outlines how to preserve container images on nodes, bypassing the Kubelet image garbage collection (GC) phase.
Use Cases
- In air-gapped deployment environments, images cannot be fetched over the network, so container images must be preloaded onto the nodes. Containers are deployed with imagePullPolicy: Never, which requires the related images to stay on the nodes without being reclaimed by Kubelet GC.
- Developers use KIND to set up Kubernetes nodes for testing. Container images are loaded onto the nodes directly instead of being fetched from the internet. For long-term use, specific images need to be retained and excluded from GC.
Kubelet GC
As described in the official Kubernetes documentation on Garbage Collection, Kubernetes performs image GC through the image manager inside Kubelet. The execution of image GC is controlled by two parameters, HighThresholdPercent and LowThresholdPercent. When the system's disk usage exceeds HighThresholdPercent, GC is triggered. It gradually deletes images based on their last usage time until the system usage falls below LowThresholdPercent.
The default values for these two parameters are 85 and 80, respectively, as shown in the following code snippet.
// imageGCHighThresholdPercent is the percent of disk usage after which
// image garbage collection is always run. The percent is calculated by
// dividing this field value by 100, so this field must be between 0 and
// 100, inclusive. When specified, the value must be greater than
// imageGCLowThresholdPercent.
// Default: 85
// +optional
ImageGCHighThresholdPercent *int32 `json:"imageGCHighThresholdPercent,omitempty"`
// imageGCLowThresholdPercent is the percent of disk usage before which
// image garbage collection is never run. Lowest disk usage to garbage
// collect to. The percent is calculated by dividing this field value by 100,
// so the field value must be between 0 and 100, inclusive. When specified, the
// value must be less than imageGCHighThresholdPercent.
// Default: 80
// +optional
ImageGCLowThresholdPercent *int32 `json:"imageGCLowThresholdPercent,omitempty"`
However, the documentation does not explain the process in detail, and some implementation details are only visible in the code. Therefore, we will work through the overall process by reading the code.
Implementation
As mentioned in the documentation, the ImageManager inside Kubelet is responsible for handling the GC process. The Kubelet code shows that the GarbageCollect function of ImageManager is called every five minutes to run GC.
const (
...
// ImageGCPeriod is the period for performing image garbage collection.
ImageGCPeriod = 5 * time.Minute
...
)
...
go wait.Until(func() {
ctx := context.Background()
if err := kl.imageManager.GarbageCollect(ctx); err != nil {
if prevImageGCFailed {
klog.ErrorS(err, "Image garbage collection failed multiple times in a row")
// Only create an event for repeated failures
kl.recorder.Eventf(kl.nodeRef, v1.EventTypeWarning, events.ImageGCFailed, err.Error())
} else {
klog.ErrorS(err, "Image garbage collection failed once. Stats initialization may not have completed yet")
}
prevImageGCFailed = true
} else {
var vLevel klog.Level = 4
if prevImageGCFailed {
vLevel = 1
prevImageGCFailed = false
}
klog.V(vLevel).InfoS("Image garbage collection succeeded")
}
}, ImageGCPeriod, wait.NeverStop)
...
Inside the GarbageCollect function, the code checks the current usage against HighThresholdPercent. When usage reaches or exceeds this threshold, it calculates how much space must be freed to get down to LowThresholdPercent and calls the internal freeSpace function to reclaim that space.
...
usagePercent := 100 - int(available*100/capacity)
if usagePercent >= im.policy.HighThresholdPercent {
amountToFree := capacity*int64(100-im.policy.LowThresholdPercent)/100 - available
klog.InfoS("Disk usage on image filesystem is over the high threshold, trying to free bytes down to the low threshold", "usage", usagePercent, "highThreshold", im.policy.HighThresholdPercent, "amountToFree", amountToFree, "lowThreshold", im.policy.LowThresholdPercent)
freed, err := im.freeSpace(ctx, amountToFree, time.Now())
if err != nil {
return err
}
if freed < amountToFree {
err := fmt.Errorf("Failed to garbage collect required amount of images. Attempted to free %d bytes, but only found %d bytes eligible to free.", amountToFree, freed)
im.recorder.Eventf(im.nodeRef, v1.EventTypeWarning, events.FreeDiskSpaceFailed, err.Error())
return err
}
}
...
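The arithmetic in this excerpt can be tried out in isolation. The following is a minimal sketch, not the kubelet's code: the function name bytesToFree and the 30 GiB / 3 GiB figures are assumptions chosen for illustration, while the two formulas are copied from the excerpt above.

```go
package main

import "fmt"

// bytesToFree mirrors the arithmetic in the excerpt above. It returns the
// number of bytes GC would try to reclaim and whether GC triggers at all.
// The function name is hypothetical; only the formulas come from the kubelet.
func bytesToFree(capacity, available int64, highPct, lowPct int) (int64, bool) {
	usagePercent := 100 - int(available*100/capacity)
	if usagePercent < highPct {
		// Below the high threshold: nothing to do in this GC round.
		return 0, false
	}
	// Free enough that available space reaches (100 - lowPct)% of capacity.
	return capacity*int64(100-lowPct)/100 - available, true
}

func main() {
	const gib = int64(1) << 30
	// Assumed numbers: a 30 GiB image filesystem with 3 GiB still available,
	// evaluated against the default 85/80 thresholds.
	amount, triggered := bytesToFree(30*gib, 3*gib, 85, 80)
	fmt.Printf("triggered=%v amountToFree=%d bytes\n", triggered, amount)
}
```

With these assumed numbers, usage is 90%, which is above the high threshold, so GC tries to free the difference between 20% of capacity (6 GiB) and the 3 GiB currently available.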
freeSpace first calls detectImages to collect the images currently in use on the node. It then loops over the image records and skips the images that must not be removed. Finally, it sorts the remaining images by last usage time and removes them one by one until enough space has been freed.
func (im *realImageGCManager) freeSpace(ctx context.Context, bytesToFree int64, freeTime time.Time) (int64, error) {
imagesInUse, err := im.detectImages(ctx, freeTime)
if err != nil {
return 0, err
}
// Get all images in eviction order.
images := make([]evictionInfo, 0, len(im.imageRecords))
for image, record := range im.imageRecords {
if isImageUsed(image, imagesInUse) {
klog.V(5).InfoS("Image ID is being used", "imageID", image)
continue
}
// Check if image is pinned, prevent garbage collection
if record.pinned {
klog.V(5).InfoS("Image is pinned, skipping garbage collection", "imageID", image)
continue
}
images = append(images, evictionInfo{
id: image,
imageRecord: *record,
})
}
sort.Sort(byLastUsedAndDetected(images))
...
The filtering criteria in this loop provide additional details not explained in the documentation, showing which images are ignored by GC:
- Images currently in use.
- Images with the pinned attribute set.
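The selection logic can be sketched outside the kubelet as follows. This is a simplified illustration under stated assumptions: the image type, selectForEviction, and sampleImages are hypothetical names, the sizes are made up, and the real implementation tracks more state (for example, when an image was first detected) and actually deletes images instead of just listing them.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// image is a simplified, hypothetical record; the kubelet keeps more fields.
type image struct {
	ID       string
	Size     int64
	LastUsed time.Time
	InUse    bool
	Pinned   bool
}

// selectForEviction skips in-use and pinned images, orders the rest from
// least to most recently used, and picks images until bytesToFree is covered.
func selectForEviction(images []image, bytesToFree int64) ([]string, int64) {
	candidates := make([]image, 0, len(images))
	for _, img := range images {
		if img.InUse || img.Pinned {
			continue // never eligible for GC
		}
		candidates = append(candidates, img)
	}
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].LastUsed.Before(candidates[j].LastUsed)
	})

	var evicted []string
	var freed int64
	for _, img := range candidates {
		if freed >= bytesToFree {
			break
		}
		evicted = append(evicted, img.ID)
		freed += img.Size
	}
	return evicted, freed
}

// sampleImages returns made-up records covering each case.
func sampleImages() []image {
	now := time.Now()
	return []image{
		{ID: "pause", Size: 1, LastUsed: now.Add(-72 * time.Hour), Pinned: true},
		{ID: "running", Size: 400, LastUsed: now, InUse: true},
		{ID: "old", Size: 300, LastUsed: now.Add(-48 * time.Hour)},
		{ID: "recent", Size: 500, LastUsed: now.Add(-1 * time.Hour)},
	}
}

func main() {
	evicted, freed := selectForEviction(sampleImages(), 250)
	fmt.Println(evicted, freed)
}
```

In this sample, the pinned image is the least recently used one, yet it is never considered; only the unpinned, unused images compete for eviction.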
The pinned concept is described in the sig-node/2040-kubelet-cri document, which explains:
Introduce field in the Image message to indicate an image should not be garbage collected.
This functionality was implemented in this PR around 2021.
Additionally, related implementations can be seen in the Containerd repository’s Issue. This feature was eventually released in Containerd 1.7. Users can pin images using the following commands:
sudo ctr -n k8s.io images label docker.io/library/jenkins:2.60.1 io.cri-containerd.pinned=pinned
sudo ctr -n k8s.io images pull --label=io.cri-containerd.pinned=pinned docker.io/library/jenkins:2.60.1
Environment
With the above concepts in mind, the next step is to set up a Kubernetes environment to validate the aforementioned concepts.
$ kubectl version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.2
$ ctr --version
ctr github.com/containerd/containerd v1.7.6
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 23.04
Release: 23.04
Codename: lunar
Experiments
First, interact with containerd using the ctr command and observe the related image statuses. Kubernetes defaults to using the "k8s.io" namespace, so the command needs to include "-n k8s.io".
Using images ls to inspect all image statuses, it can be observed that the pause images carry the "io.cri-containerd.pinned=pinned" label by default, while other images do not.
$ sudo ctr -n k8s.io image ls
...
registry.k8s.io/pause:3.6 application/vnd.docker.distribution.manifest.list.v2+json sha256:3d380ca8864549e74af4b29c10f9cb0956236dfb01c40ca076fb6c37253234db 294.7 KiB linux/amd64,linux/arm/v7,linux/arm64,linux/ppc64le,linux/s390x,windows/amd64 io.cri-containerd.image=managed,io.cri-containerd.pinned=pinned
registry.k8s.io/pause@sha256:3d380ca8864549e74af4b29c10f9cb0956236dfb01c40ca076fb6c37253234db application/vnd.docker.distribution.manifest.list.v2+json sha256:3d380ca8864549e74af4b29c10f9cb0956236dfb01c40ca076fb6c37253234db 294.7 KiB linux/amd64,linux/arm/v7,linux/arm64,linux/ppc64le,linux/s390x,windows/amd64 io.cri-containerd.image=managed,io.cri-containerd.pinned=pinned
docker.io/calico/cni:v3.26.1 application/vnd.docker.distribution.manifest.list.v2+json sha256:3be3c67ddba17004c292eafec98cc49368ac273b40b27c8a6621be4471d348d6 89.0 MiB linux/amd64,linux/arm/v7,linux/arm64,linux/ppc64le,linux/s390x io.cri-containerd.image=managed
docker.io/calico/cni@sha256:3be3c67ddba17004c292eafec98cc49368ac273b40b27c8a6621be4471d348d6 application/vnd.docker.distribution.manifest.list.v2+json sha256:3be3c67ddba17004c292eafec98cc49368ac273b40b27c8a6621be4471d348d6 89.0 MiB linux/amd64,linux/arm/v7,linux/arm64,linux/ppc64le,linux/s390x io.cri-containerd.image=managed
...
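When many images are listed, the label column can be checked programmatically. The sketch below assumes the comma-separated key=value label format seen in the listing above; hasPinnedLabel is a hypothetical helper, not part of ctr or containerd.

```go
package main

import (
	"fmt"
	"strings"
)

// pinnedLabel is the label key shown on the pause image in the listing above.
const pinnedLabel = "io.cri-containerd.pinned"

// hasPinnedLabel reports whether a comma-separated label column, as printed
// by `ctr images ls`, contains the pinned label with the value "pinned".
func hasPinnedLabel(labels string) bool {
	for _, kv := range strings.Split(labels, ",") {
		key, value, ok := strings.Cut(strings.TrimSpace(kv), "=")
		if ok && key == pinnedLabel && value == "pinned" {
			return true
		}
	}
	return false
}

func main() {
	// Label columns copied from the pause and calico/cni rows above.
	fmt.Println(hasPinnedLabel("io.cri-containerd.image=managed,io.cri-containerd.pinned=pinned"))
	fmt.Println(hasPinnedLabel("io.cri-containerd.image=managed"))
}
```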
Next, attempt to pin an image using the ctr command.
ctr -n k8s.io images label xxxxxxxx io.cri-containerd.pinned=pinned
Because the system has limited disk space (30GB), a script is used to pull a variety of images until disk usage exceeds the HighThresholdPercent (85%) and image GC is triggered.
kubectl run a --image=docker.io/library/node:bullseye
kubectl run a1 --image=docker.io/library/node:current-bookworm
kubectl run a2 --image=docker.io/library/node:bookworm
kubectl run a3 --image=docker.io/library/node:current
kubectl run a4 --image=docker.io/library/node:20.8.0-bullseye
kubectl run a5 --image=docker.io/library/openjdk:22-oracle
kubectl run a6 --image=docker.io/library/jenkins:2.60.1
kubectl run a7 --image=docker.io/library/jenkins:2.60.2
kubectl run a8 --image=docker.io/library/jenkins:2.60.3
kubectl run a9 --image=docker.io/library/jenkins:2.46.2
kubectl run a10 --image=docker.io/library/jenkins:2.46.1
kubectl run a11 --image=docker.io/pytorch/pytorch:latest
kubectl run a12 --image=docker.io/pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel
Additionally, in the deployed environment, the kubelet's configuration file is customized to raise its log verbosity so that more operational logs are emitted.
Next, follow the kubelet logs and inspect the relevant entries once disk usage exceeds the HighThresholdPercent.
$ sudo journalctl -f -u kubelet | grep image_gc
Below are the relevant log excerpts after removing unnecessary information for readability.
image_gc_manager.go:340] "Attempting to delete unused images"
...
image_gc_manager.go:255] "Adding image ID to currentImages" imageID="sha256:112170efb091e6c02eac19703986e3c59ce11e86b826c1d70a4a4a73a333339b"
image_gc_manager.go:272] "Image ID has size" imageID="sha256:112170efb091e6c02eac19703986e3c59ce11e86b826c1d70a4a4a73a333339b" size=366064122
image_gc_manager.go:275] "Image ID is pinned" imageID="sha256:112170efb091e6c02eac19703986e3c59ce11e86b826c1d70a4a4a73a333339b" pinned=true
...
image_gc_manager.go:364] "Image ID is being used" imageID="sha256:c62308471249574d567c4fff9a927451ac999f50fe9190ceb50e9949922762ef"
image_gc_manager.go:364] "Image ID is being used" imageID="sha256:677ad13d73108d775aec52e9bd38c33042ad14bb3a780b67613b8eb7be5de5b2"
image_gc_manager.go:369] "Image is pinned, skipping garbage collection" imageID="sha256:6270bb605e12e581514ada5fd5b3216f727db55dc87d5889c790e4c760683fee"
image_gc_manager.go:364] "Image ID is being used" imageID="sha256:8065b798a4d6729605e3706c202db657bfbcb8109127ece6af5bfb6da106adb7"
From the above logs, GC appears to be working correctly. However, closer inspection shows that none of the manually pinned images were detected with "pinned=true". Only the default pause image was detected.
Inspecting the image with the crictl command shows that Kubernetes does not recognize the image as marked "pinned".
$ sudo crictl inspecti docker.io/library/jenkins:2.60.2
{
"status": {
"id": "sha256:112170efb091e6c02eac19703986e3c59ce11e86b826c1d70a4a4a73a333339b",
"repoTags": [
"docker.io/library/jenkins:2.60.2"
],
"repoDigests": [
"docker.io/library/jenkins@sha256:5d628badc50487581da2b4cb95a7589fe1d39922391e128f6a031273ad351b71"
],
"size": "366064122",
"uid": null,
"username": "jenkins",
"spec": null,
"pinned": false
},
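To check the pinned field for several images without reading the JSON by eye, the output can be decoded. The following is a minimal sketch: the inspectOutput type and isPinned function are hypothetical, and only the fields that appear in the output above are modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// inspectOutput models only the fields used here from `crictl inspecti`.
type inspectOutput struct {
	Status struct {
		ID       string   `json:"id"`
		RepoTags []string `json:"repoTags"`
		Pinned   bool     `json:"pinned"`
	} `json:"status"`
}

// isPinned decodes `crictl inspecti` JSON and returns status.pinned.
func isPinned(raw string) (bool, error) {
	var out inspectOutput
	if err := json.Unmarshal([]byte(raw), &out); err != nil {
		return false, err
	}
	return out.Status.Pinned, nil
}

func main() {
	// Trimmed-down sample based on the output shown above.
	sample := `{
	  "status": {
	    "id": "sha256:112170efb091e6c02eac19703986e3c59ce11e86b826c1d70a4a4a73a333339b",
	    "repoTags": ["docker.io/library/jenkins:2.60.2"],
	    "pinned": false
	  }
	}`
	pinned, err := isPinned(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("pinned:", pinned)
}
```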
After repeated experiments, it was observed that labels added using ctr image label do not seem to be recognized as pinned. Only labels added through ctr image pull are recognized.
Additionally, the issue where kubelet cannot recognize pinned images is an implementation bug. This bug has been fixed in the PR Pass Pinned field to kubecontainer.Image, and the fix is expected to be released in v1.29.
To validate this, an attempt was made to download version v1.29.0-alpha.1 of kubelet and replace the existing version. In the end, the entire functionality worked as expected, successfully skipping pinned images.
Summary
Finally, the entire process is summarized in the following diagram: