Kubernetes Network Troubleshooting Approach
Introduction
The design and interfaces of Kubernetes make it easy for users to deploy applications across multi-node environments. In particular, networking is simplified through an easy-to-use abstraction layer that hides the underlying packet flow and operations, making it easier and faster for users to reach applications deployed on Kubernetes.
This article will briefly explain the network traffic within Kubernetes and discuss what approach to take when facing network issues. We will tackle each problem step by step to find the root cause of the issue.
Kubernetes Networking
Kubernetes is a multi-node cluster system, and from the cluster's perspective its packet flow can generally be divided into north-south and east-west directions.
North-South Traffic
North-south traffic refers to traffic that enters and exits the cluster, with one end of the packet’s source or destination not belonging to the cluster.
There may be several types of traffic, including:
External clients accessing services inside the cluster, via:
- Ingress
- API Gateway
- Load Balancer
- … and so on
Services inside the cluster accessing external services, via:
- NAT
- Internet Gateway
- … and so on.
The following diagram shows a simple representation of north-south traffic.
While this type of diagram can only provide a simple representation of packet flow and give users a basic understanding of the flow within the entire cluster, it is not sufficient for debugging. Therefore, when troubleshooting network issues, it is necessary to be able to describe the components involved in more detail, as shown in the following diagram.
For example, if a Kubernetes cluster is configured with an external load balancer, the load balancer sends packets to the nodes, which use a NodePort Service to forward them to the target Pod.
Outbound packets from the Pod follow the routing table to the NAT gateway, which performs SNAT and forwards them to the external network.
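As a quick sanity check for this kind of setup, you can confirm the NodePort the external load balancer targets and the node's default route toward the NAT gateway. This is only a sketch: my-app is a placeholder Service name, and the KUBE-NODEPORTS chain assumes kube-proxy is running in iptables mode.

```bash
# Confirm the Service exposes a NodePort the external load balancer can reach.
kubectl get svc my-app -o wide

# On a node: the NodePort DNAT rules programmed by kube-proxy (iptables mode).
sudo iptables -t nat -L KUBE-NODEPORTS -n | grep my-app

# On a node: confirm outbound traffic is routed toward the NAT/Internet gateway.
ip route show default
```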
In addition, the following diagram shows another type of underlying implementation.
In this example, the load balancer and the Kubernetes Pods share a flat network, as with the AWS VPC CNI or Azure CNI. The load balancer can therefore reach a Pod directly without going through a Service (LoadBalancer/NodePort).
Each node handles SNAT itself and sends outbound traffic directly to the external network.
In this architecture, the external L4 Load Balancer may direct all traffic to the Kubernetes Ingress Controller, allowing the Ingress to handle L7 processing and forwarding.
At the same time, the environment includes both internal and public networks, and each node uses its routing table to decide where to send a packet based on its destination.
All three examples can achieve the effect of the initial simple diagram, but their underlying implementations are vastly different. Therefore, the first step in network debugging is to have the ability to systematically describe the components involved in the network packet flow, and to understand the process and related components before proceeding with further troubleshooting.
East-West Traffic
East-west traffic refers to packet flow whose source and destination both belong to the cluster, such as traffic between Pods or between the nodes themselves.
Access direction:
- Pod ↔ Service
- Pod ↔ Pod
- Pod ↔ Node
Access range:
- Within the same node (source and destination are on the same node)
- Across nodes (source and destination are on different nodes)
For east-west traffic, the simplest scenario is access between Pods.
However, most applications use a Deployment and a Service for easier management, as shown in the following diagram.
With a Kubernetes Service, all packets sent to the Service rely on the kube-proxy configuration (iptables or ipvs) to make the load-balancing decision.
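To see how a given Service is actually being load balanced, it helps to check which mode kube-proxy is running in and then look at the rules it generated. A rough sketch, assuming a kubeadm-style cluster where kube-proxy reads its mode from a ConfigMap; adjust for your own distribution.

```bash
# Which proxy mode is configured (iptables or ipvs)?
kubectl -n kube-system get configmap kube-proxy -o yaml | grep mode

# iptables mode: dump the NAT rules kube-proxy generated for Services.
sudo iptables-save -t nat | grep KUBE-SVC

# ipvs mode: list virtual servers and their backend Pod endpoints.
sudo ipvsadm -Ln
```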
From the above discussion, it can be understood that there is no universal network architecture diagram in the network world. Different environments and scenarios will have different network flows. Therefore, the basic principles of troubleshooting network issues are:
- Clarify who the sender is and who the receiver is.
- Clarify where the sender and receiver sit in relation to Kubernetes.
- Clarify all the components involved in the packet flow.
Kubernetes Network Components
I believe the K8s network architecture can be divided into four aspects, which are integrated with each other to provide comprehensive network functionality. However, if any one of them goes wrong, the entire network may not function as expected. These aspects are:
- Underlying infrastructure
- Built-in network functionality of Kubernetes
- CNI
- Integration of third-party solutions.
Underlying Infrastructure
For cloud users, this part of the configuration is handled by the cloud provider; users simply pay for and provision resources such as:
- VPC
- Subnet
- Firewall
- Routing
- NAT/Internet GW
However, for on-premises environments, these are not resources that can be created with a single click or a Terraform run. They require physically deploying machines, switches, and power, and managing the whole lab, for example:
- Network connections between nodes, using L2 switches, VLANs, etc.
- IP allocation, whether it is a static IP or obtained dynamically via DHCP
- Deployment and management of DNS servers and certificates.
- Switches/Routers across multiple racks
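Before suspecting Kubernetes itself, a few basic checks at this layer can rule out the infrastructure. A minimal sketch, where <other-node-ip> is a placeholder and the 1472-byte ping assumes a standard 1500-byte MTU path.

```bash
# Can the nodes reach each other at all, and with the expected MTU?
ping -c 3 <other-node-ip>
ping -c 3 -M do -s 1472 <other-node-ip>   # don't-fragment ping, assumes 1500 MTU

# Are the node's addresses, routes, and DNS servers what you expect?
ip addr
ip route
cat /etc/resolv.conf
```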
The architecture may look like the following.
Built-in Network Functionality from Kubernetes
Kubernetes has built-in network resources, including:
- Kubernetes Service: This part mainly depends on the implementation of kube-proxy, which can use the default iptables or change to ipvs. The implementation of load balancing algorithms is also different.
- Kubernetes Ingress: Kubernetes only provides a simple interface, and the implementation depends on which Ingress Controller is installed. Different implementations have different details, such as Nginx, Kong, Traefik, etc.
- CoreDNS: Handles basic DNS requests; all internal Kubernetes Service DNS names are resolved by CoreDNS. Some environments also integrate it with an external DNS.
- Network Policy: A firewall rule for Pods. This is also a simple interface, and the implementation is completed by the underlying CNI.
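Each of these built-in pieces can be checked with kubectl before digging deeper. A rough checklist sketch; it assumes CoreDNS uses the standard k8s-app=kube-dns label and kube-dns Service name.

```bash
# CoreDNS: is it running, and what is its Service ClusterIP?
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system get svc kube-dns

# Ingress: which controller implementations and Ingress resources exist?
kubectl get ingressclass
kubectl get ingress -A

# Network Policy: are there policies that could be filtering traffic?
kubectl get networkpolicy -A
```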
After integrating the above concepts into the previous figure, the possible architecture may look like the following.
CNI (Container Network Interface)
CNI is mainly used to handle:
- IP allocation for Pods (IPAM)
- Packet handling between Pods across nodes.
Put simply: how does a private Pod IP on one node reach a private Pod IP on another node, and how is that connectivity made to work?
Currently, different CNIs adopt different networking technologies, such as:
- Calico (BGP/IPIP)
- Flannel (VXLAN)
- Cilium (eBPF)
- OVS (OpenFlow)
- Cloud-provider specific (AWS/Azure)
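Which CNI is installed, and what encapsulation it uses, can usually be read from the node itself. A sketch only: the paths below are the conventional CNI locations, and .spec.podCIDR may be empty if the CNI does its own IPAM.

```bash
# The CNI configuration installed on each node.
cat /etc/cni/net.d/*.conf*

# The Pod CIDR assigned to each node, as seen from the API.
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR

# Encapsulation devices created by the CNI, e.g. flannel.1 (VXLAN) or tunl0 (IPIP).
ip -d link show type vxlan
ip -d link show type ipip

# Routes toward other nodes' Pod CIDRs.
ip route
```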
Third-Party Integration
The third-party integration solutions include additional features such as Service Mesh and Cluster Federation.
These features build on top of a “fully operational” Kubernetes system and provide advanced network processing.
However, adding advanced network functionality also means a more complex architecture.
Without a grasp of these concepts and principles, it is difficult for a “YAML engineer” who just follows the README.md to customize, troubleshoot, and adjust the architecture to their needs.
For example, after building Cluster Federation, it could become as follows:
Debugging Mindset for Kubernetes Networking
As explained by the basic concepts above, networking may seem simple but in reality, there are many components involved, especially when more and more network features are installed in the environment. Therefore, the recommended approach when encountering network issues is:
- Clarify the direction: is the problem in the north-south or the east-west path?
- Identify where the problem occurs and which layer it belongs to: is there an issue with the underlying infrastructure? Are the built-in Kubernetes features misconfigured? Is there a problem with the CNI or with a third-party integration?
It is extremely important not to debug network problems purely in your head. Everyone has different networking concepts and background knowledge, so it is often hard to reach a common understanding just by talking. The best approach is to draw a diagram to clarify the flow and narrow down where the problem occurs.
To effectively implement the above approach, a method can be used:
- Draw the entire system architecture diagram
- Mark your network situation, who is the sender, who is the receiver?
- Imagine yourself as a packet and explain how it flows through the entire architecture diagram. If there is any part you cannot explain, you are not yet familiar enough with the network architecture and should keep studying it.
- Based on this process, start debugging to narrow down the possible range, then examine the components within that range that could be causing the problem. Repeat the whole process until the point where the problem occurs is identified.
Here is an example of “My Pod cannot access the target Pod through the Service”:
A simple diagram can be drawn as follows:
However, this diagram only provides a basic description of the packet flow, and there are still some unclear areas for troubleshooting. At this point, if the diagram can be expanded with more technical details, the following diagram can be obtained.
- The Pod attempts to access the service via DNS lookup.
- The Pod checks its /etc/resolv.conf file to find the DNS server IP.
- That DNS IP is actually the ClusterIP of the CoreDNS Service.
- The DNS request is sent to CoreDNS to resolve the Service ClusterIP.
- The Pod sends the request to the ClusterIP and allows Kubernetes to forward it to the target Pod.
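Each step of this flow can be verified from inside the client Pod. A hedged sketch: <pod>, <namespace>, and <service> are placeholders, and the lookup commands require DNS tools in the container image.

```bash
# Step 2: which DNS server does the Pod use? It should point at the CoreDNS ClusterIP.
kubectl exec <pod> -- cat /etc/resolv.conf

# Step 3: compare with the ClusterIP of the CoreDNS Service.
kubectl -n kube-system get svc kube-dns

# Step 4: resolve the target Service name (needs nslookup or dig in the image).
kubectl exec <pod> -- nslookup <service>.<namespace>.svc.cluster.local

# Step 5: confirm the resolved address matches the Service's ClusterIP.
kubectl get svc <service> -n <namespace>
```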
However, the above diagram is not 100% accurate and some network details are omitted, such as:
- CoreDNS is deployed with hostNetwork here, so the Pod -> CoreDNS hop actually becomes Pod -> Node access.
- The Pod -> Service ClusterIP involves iptables/ipvs forwarding, so the actual traffic does not follow a single Pod -> Service path. Instead, the node itself performs DNAT to find a suitable Pod IP and then sends it directly to the target Pod.
Even a simple Pod -> Service connection involves many details. Most of the time, these components work well, and everyone’s network is working fine. However, if one small component fails, the entire network will fail.
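The DNAT done by the node can be observed in the connection-tracking table, which is a quick way to confirm which backend Pod a connection was actually sent to. A small sketch; <cluster-ip> is a placeholder and the conntrack CLI must be installed on the node.

```bash
# On the client's node: show tracked connections toward the Service ClusterIP.
# The reply direction reveals the real Pod IP chosen by the DNAT rules.
sudo conntrack -L -d <cluster-ip>
```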
With these technical details in mind, this Pod -> Service problem can be approached in the following ways, with a rough command sketch after the list:
- Is it related to DNS resolution? Try directly using ClusterIP to test the connection.
- Is it related to Service conversion? Try connecting directly using PodIP.
- Is it related to the node? Try connecting to other Pods on the same node.
- Is it related to the sender? Try to test from the node.
- Is there a Network Policy blocking the connection?
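A minimal way to walk through this checklist from the client side, assuming curl is available in the image; all names, IPs, and ports are placeholders.

```bash
# 1. Bypass DNS: talk to the Service ClusterIP directly.
kubectl exec <client-pod> -- curl -s --max-time 3 http://<cluster-ip>:<port>

# 2. Bypass the Service: talk to the target Pod IP directly.
kubectl exec <client-pod> -- curl -s --max-time 3 http://<pod-ip>:<port>

# 3. Rule out cross-node issues: find and test a target Pod on the client's node.
kubectl get pods -o wide

# 4. Rule out the sender: repeat the test from the node itself.
curl -s --max-time 3 http://<pod-ip>:<port>

# 5. Check whether any NetworkPolicy selects the client or the target Pod.
kubectl get networkpolicy -A
```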
If none of the above helps to narrow down the problem, try capturing packets from different points to analyze the issue:
- Server did not receive the packet.
- Server received the packet but did not respond.
- Server received and responded, but the client did not receive.
Additionally, consider whether the packet may have been dropped by the kernel or if there is a problem with the underlying network, such as a faulty network cable.
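Capturing at both ends at the same time is the most direct way to tell which of the three cases applies. A sketch with placeholder interfaces, IPs, and ports.

```bash
# On the client's node: does the request actually leave toward the server?
sudo tcpdump -ni eth0 host <server-ip> and port <port>

# On the server's node (in parallel): does it arrive, and is a reply sent back?
sudo tcpdump -ni eth0 host <client-ip> and port <port>

# If packets vanish between the two captures, check interface drop/error counters.
ip -s link show eth0
```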
Capturing Packets
Capturing packets is a useful tool for troubleshooting network issues, but the problem needs to be reproducible in order to capture anything useful.
When deciding to capture packets, two questions need to be answered: which tool to use, and how to locate the target packets in the sea of traffic.
Common tools such as Wireshark, tcpdump, and tshark can all capture packets, but note that some environments have no GUI for running Wireshark, so familiarity with the CLI tools is a valuable skill.
Once a tool is chosen, the next question is where to run it:
- Pod: Capturing packets inside a Pod depends on the container image and whether it has tcpdump installed.
- Node: Alternatively, capturing packets on the node can see most container traffic, and the node is more convenient for installing various debugging tools. However, if nodes are created dynamically, for example through auto-scaling groups, installing debugging tools on every new node can be cumbersome. You may also be unable to capture a Pod's traffic from the node if kernel-bypass technologies such as DPDK or SR-IOV are in use.
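If the application image does not ship tcpdump, one workaround (assuming the cluster supports ephemeral containers) is to attach a debug container that shares the Pod's network namespace; nicolaka/netshoot is just one commonly used tooling image.

```bash
# Attach a throwaway debugging container alongside the application container.
kubectl debug -it <pod> --image=nicolaka/netshoot --target=<app-container>

# Inside the debug container, eth0 is the Pod's own interface.
tcpdump -ni eth0
```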
When capturing packets, the CNI and the underlying infrastructure must also be taken into account.
For example, if the CNI is Calico configured with IP-in-IP, packets between nodes are encapsulated using the IPIP tunneling protocol. Capturing on the node's physical interface will therefore show not plain IP packets but IPIP-encapsulated ones. Without this network knowledge, you will not be able to find the information you need even after capturing the packets.
If you capture packets on the node and the CNI uses veth pairs to forward packets to containers, then knowing the mapping between each veth and its Pod lets you capture directly on the veth interface. That shows exactly the packets going to and from the Pod and is the most efficient way to debug.
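A common trick for finding that mapping is to read the peer interface index from inside the Pod and match it against the node's interface list. A sketch under the assumption of a veth-based CNI; names are placeholders.

```bash
# Inside the Pod: the peer index of eth0 is the node-side veth's ifindex.
kubectl exec <pod> -- cat /sys/class/net/eth0/iflink

# On the node: find the interface whose index matches the number printed above.
ip -o link | grep "^<iflink>: "

# Capture directly on that veth to see exactly what enters and leaves the Pod.
sudo tcpdump -ni <veth-name>

# With an IPIP-based CNI such as Calico, cross-node traffic on the physical NIC
# is encapsulated: filter for protocol 4 (IPIP) or capture on the tunnel device.
sudo tcpdump -ni eth0 ip proto 4
sudo tcpdump -ni tunl0
```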
In addition to the tools related to Kubernetes, Linux’s own network tools are also essential, such as
- ip/tcpdump
- conntrack
- iptables/ipvs
- ethtool
- routing, NAT, rp_filter
- etc
These settings affect packet forwarding at the node level, and if configured incorrectly they can cause network traffic to be blocked.
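A few examples of what checking these node-level settings can look like; which of them matters depends entirely on the environment, and <pod-ip> and <nic> are placeholders.

```bash
# Which route (and source address) will the node pick for a given destination?
ip route get <pod-ip>

# Reverse-path filtering can silently drop asymmetric traffic.
sysctl net.ipv4.conf.all.rp_filter

# Is anything in the filter table rejecting or dropping packets?
sudo iptables-save -t filter | grep -E 'DROP|REJECT'

# Connection-tracking statistics; a full conntrack table also causes drops.
sudo conntrack -S
sysctl net.netfilter.nf_conntrack_max

# NIC-level drop and error counters.
ethtool -S <nic> | grep -iE 'drop|err'
```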
In summary,
- There are numerous network components in Kubernetes, and any issues with a single component may lead to unexpected network results. Troubleshooting can be very challenging without sufficient background knowledge and skills.
- Drawing a diagram of the architecture can clarify all packet flows.
- Performing thorough analysis and debugging can narrow down the scope of the problem.
- Repeating the above processes can eventually identify the point where the problem occurred.