Docker Networking Model (2) — Bridge Mode
Preface
In the previous article, Docker Networking Model, we shared several basic network models inside Docker, from None, Host, Bridge to Container sharing.
In this article, we will continue exploring, and our ultimate goal is to understand how containers gain access to external networks.
Introduction
In fact, the network models we introduced in the previous article can be converted from one to another as long as you are familiar with the relevant commands and usage on Linux. Today, we will explore how to create a Bridge network model, and we will use multiple containers based on this network model to ensure that they can access each other.
In this article, we will not discuss how containers access the external network, such as accessing the Google DNS server. Instead, we will focus on the underlying architecture and access between containers on the same network segment.
Environment
To better demonstrate the changes throughout the process, we will first create a completely clean container through the None network model. Then, we will use Linux commands to transform it into a Bridge network model.
Finally, we will create two containers based on the above steps, both using the Bridge network model, so that these two containers can communicate with each other, but still do not have the ability to access the external network.
Steps
The following steps will be accompanied by illustrations similar to the previous article, observing the changes that each step brings to the system from different perspectives. At the same time, each step will be accompanied by relevant code, and those who are interested can also test it in their own environment.
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.3 LTS
Release: 18.04
Codename: bionic
$ uname -a
Linux k8s-dev 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ docker --version
Docker version 19.03.13, build 4484c46d9d
Creating two None containers
First, we need to create two clean containers, and our ultimate goal is to transform them into a Bridge network model and ensure that these two containers can access each other.
Here we will use the form of “--network=none” to request that Docker not interfere with this network model and let us handle it ourselves.
When creating the containers, we will specifically give the parameter “--privileged” to request special privileges, which will be explained later. For now, let’s just use it.
We will create two containers named c1 and c2 in the system. Afterwards, we will use Docker commands to confirm that there are no other network interfaces besides lo inside these two containers.
$ docker run --privileged -d --network=none --name c1 hwchiu/netutils
$ docker run --privileged -d --network=none --name c2 hwchiu/netutils
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
868f54ee0b32 hwchiu/netutils "/bin/bash ./entrypo…" 11 minutes ago Up 11 minutes c2
5df4ed8e756a hwchiu/netutils "/bin/bash ./entrypo…" 11 minutes ago Up 11 minutes c1
$ docker exec c1 ifconfig
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
Networking/System View
Left Image:
This introduces the network from a system-level perspective. The gray line in the middle divides it into the upper part of UserSpace and the lower part of Kernel Space.
In this example, different colors in the kernel space represent different network namespaces, and each network namespace is isolated from the others.
Right Image:
This provides a more concise view, observing the network mainly from the user’s perspective and showing how the relationships between the network components change in the example.
After understanding the above concepts, let’s take a look at how to understand this picture:
When we create two containers using None, two new netns (Network Namespaces) are created in the system, shown in yellow and light green in the figure above. The light blue color represents the host machine itself. By default, only the default network interface lo is present inside these netns. We assume eth0 is the network interface owned by the host machine itself.
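As an aside, one way to convince yourself that each None container really does get its own netns is to look up each container’s PID and read its network namespace link; this is just a sketch, and the inode numbers will differ on your machine:
$ C1_PID=$(docker inspect -f '{{.State.Pid}}' c1)
$ C2_PID=$(docker inspect -f '{{.State.Pid}}' c2)
$ sudo readlink /proc/$C1_PID/ns/net /proc/$C2_PID/ns/net /proc/1/ns/net
Three different net:[...] inode numbers mean the two containers and the host each live in their own network namespace.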
Creating a Linux Bridge
Next, we will create a Linux Bridge in the system, which is also the default network model used by Docker.
This step will require the use of the brctl tool, which can be obtained by installing bridge-utils on Ubuntu systems.
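If brctl is not already installed, something like the following should work on Ubuntu (the package name is bridge-utils):
$ sudo apt-get update
$ sudo apt-get install -y bridge-utils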
Sample Code
Here, we will use two brctl commands, namely:
- brctl addbr $name
- brctl show
The first command will create a Linux Bridge named $name in the system. The second command will display how many Linux Bridges are currently in the system, along with their relevant information.
In this example, we will create a Linux Bridge named hwchiu0. Finally, we will use the ifconfig command to bring the Bridge up, putting it into a running state.
$ sudo brctl addbr hwchiu0
$ sudo brctl show
bridge name bridge id STP enabled interfaces
hwchiu0 8000.000000000000 no
$ sudo ifconfig hwchiu0 up
After completing this step, our system will create a new Linux Bridge (hwchiu0) in the host machine’s netns (Network Namespace).
At this point, the system architecture diagram is as follows, which is basically the same as the previous one, with just an additional component.
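As an aside, if you prefer iproute2 over the legacy brctl tool, the same bridge could be created and brought up like this (just a sketch of the equivalent commands, not what we ran above):
$ sudo ip link add name hwchiu0 type bridge
$ sudo ip link set hwchiu0 up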
Creating Veth Pair
Currently, we have:
- Two empty containers, including their own netns
- One Linux Bridge
So the next step is to figure out how to connect them. Here we will use a special network device called veth, which allows us to create a special link in the system. The link has two ports, each with a corresponding network interface name. Packets entering from one end will immediately exit from the other end, which can be imagined as a bidirectional pipeline.
Therefore, in this step, we need to create two bidirectional pipelines, or veth pairs, on the host machine.
Code Demonstration
In this example, we need to use the “ip” command to create the veth pipeline. There are many variations of this command, but we will demonstrate one usage here:
ip link add dev ${name} type veth
The above command asks the kernel to create a veth-based link; one end of the link will be named ${name}, while the other end will be named by the kernel.
After this command is executed, the system will have two additional virtual network interfaces, one named ${name} and the other named by the kernel, usually starting with vethXXXX.
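As a side note, the ip command also lets you name both ends of the pair explicitly in a single invocation; the following is just a sketch of that variant, not what we use below:
$ sudo ip link add dev c1-eth0 type veth peer name veth0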
$ sudo ip link add dev c1-eth0 type veth
$ sudo ip link add dev c2-eth0 type veth
$ sudo ip link | grep veth
23: veth0@c1-eth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
24: c1-eth0@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
25: veth1@c2-eth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
26: c2-eth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
By using the above commands, we created two veth-based links, resulting in four virtual network interfaces in the system, paired as follows:
- veth1 -> c2-eth0
- veth0 -> c1-eth0
Line 23 shows veth0, whose veth-pair peer is c1-eth0, while line 24 shows c1-eth0, whose peer is veth0.
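If the @ notation is hard to read, another way to confirm which interface is paired with which (assuming the ethtool package is installed) is to dump the veth driver statistics, which include a peer_ifindex field pointing at the other end of the pair:
$ sudo ethtool -S veth0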
Networking/System View
After creating the network interface, the current network model is as follows:
Left image: Currently, all operations are performed within the host machine, so the four virtual network interfaces are located in the netns of the host machine. The connections between the network interfaces are represented by two different colored lines.
Right image: After this step, the host machine has four additional virtual network interfaces that are not yet related to the containers or to the Linux Bridge, and can be thought of as four orphans.
Move Veth to Container
After creating the veth pairs, the next step is to move one end of the veth into the container, so that we can use the veth’s feature to transmit packets between different netns.
The veth pairs created are:
- veth1 –> c2-eth0
- veth0 –> c1-eth0
Our goal is to:
- Move c1-eth0 virtual interface to c1 container
- Move c2-eth0 virtual interface to c2 container
However, we don’t actually move them into the container, but rather move them into the netns (network namespace) to which the container belongs. So we need to be able to access the netns used by these containers.
Here we use the “ip netns” command to perform the operation. By default, this command reads the data under /var/run/netns to display the relevant netns.
However, by default, Docker avoids /var/run/netns, possibly to prevent people from directly using the system command “ip netns” to manipulate the containers and causing them to crash. Therefore, the netns of the containers created by Docker are all placed under /var/run/docker/netns.
Here we just need to use a soft link to connect these two locations, and then we can use “ip netns” to observe the netns of c1 and c2 containers.
$ sudo ln -s /var/run/docker/netns /var/run/netns
$ sudo ip netns show
792fedcf97d8
1bb2e0141544
The two names here are actually the first 12 characters of the Docker containers’ NetworkSettings.SandboxID. We can use the following commands to observe them.
$ docker inspect c1 | jq '.[0].NetworkSettings.SandboxID'
"1bb2e0141544758fe79387ebf4b7297556fb65efacc7d9ed7e068099744babee"
$ docker inspect c2 | jq '.[0].NetworkSettings.SandboxID'
"792fedcf97d8ae10ec0a29f5aa41813ad00825ff8127fd4d9c25b66a5714d7ca"
Therefore, in the current example, 792fedcf97d8 represents the netns of the c2 container, and 1bb2e0141544 represents the netns of the c1 container.
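As an aside, Docker also records the full path of the netns file in NetworkSettings.SandboxKey, so the same mapping can be read without jq; a quick sketch:
$ docker inspect --format '{{.NetworkSettings.SandboxKey}}' c1
$ docker inspect --format '{{.NetworkSettings.SandboxKey}}' c2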
Next, we finally get to the main point: moving one end of each veth pair that we created earlier into the corresponding netns. We will use the “ip link set” command to achieve this, which can move virtual network interfaces into different netns and also rename them.
$ sudo ip link set c1-eth0 netns 1bb2e0141544 name eth0
$ sudo ip link set c2-eth0 netns 792fedcf97d8 name eth0
After executing the above command, we can use the docker command to check if there are any changes.
$ sudo docker exec c2 ifconfig -a
eth0 Link encap:Ethernet HWaddr ea:51:1c:2c:a4:15
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
$ sudo docker exec c1 ifconfig -a
eth0 Link encap:Ethernet HWaddr be:a7:29:1b:e0:13
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
$ ip link | grep veth
23: veth0@if24: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
25: veth1@if26: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
Finally, we can use the ip link command again to observe that only two virtual network interfaces remain on the host machine, because the other two have been moved into the containers.
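Because we linked /var/run/docker/netns to /var/run/netns earlier, we can also peek into a container’s netns directly from the host; a sketch using the netns ID from this example run (yours will differ):
$ sudo ip netns exec 1bb2e0141544 ip link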
Networking/System View
After the corresponding virtual network interfaces have been moved to their respective containers, there is a slight change in the model.
Left Image: One end of the veth pair has been moved to the network namespace of the respective container and renamed to eth0.
Right Image: The change here is that the veth interface has been moved to the container and renamed to eth0.
Binding veth to Bridge
Next, we want to integrate veth with Linux Bridge to take advantage of the bridging functionality to forward packets.
The concept is simple: we bind all the veth interfaces on the host machine to the Linux Bridge hwchiu0 that we created earlier.
Code Example
We will use the brctl addif command to achieve this goal. The command is used as follows:
brctl addif ${bridge_name} ${nic_name}
This command adds the network interface named ${nic_name} to the Linux Bridge ${bridge_name}.
After the interface is added, we also bring up the veth virtual network interface using ifconfig (or alternatively, ip link).
$ sudo brctl addif hwchiu0 veth0
$ sudo brctl addif hwchiu0 veth1
$ sudo ifconfig veth0 up
$ sudo ifconfig veth1 up
$ sudo brctl show
bridge name bridge id STP enabled interfaces
hwchiu0 8000.266248dc8ca1 no veth0
veth1
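For reference, the same binding can be done with iproute2 alone; a sketch of the equivalent commands (the bridge and veth names are the ones used above):
$ sudo ip link set veth0 master hwchiu0
$ sudo ip link set veth1 master hwchiu0
$ sudo ip link set veth0 up
$ sudo ip link set veth1 up
$ bridge link show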
Networking/System View
This step happens entirely inside the host: the host-side ends of the veth pairs are bound to the Linux Bridge we created.
Left/Right image: The difference is that both ends of the veth are no longer dangling, but belong under the Linux Bridge.
Setting Container IP
Now we have almost connected the whole network! The final step is to set the IP addresses for our containers’ eth0 interfaces. Currently, neither of the containers has an IP address assigned to their eth0 interfaces, so we will take care of that in this step.
There are many considerations when setting IP addresses, but to avoid any further issues, we will set the IP addresses for both containers in the same subnet, just as we would do when using Docker containers. Specifically, we will set the IP address of c1’s eth0 interface to 10.55.66.2, and the IP address of c2’s eth0 interface to 10.55.66.3. Both interfaces will be in the 10.55.66.0/24 subnet.
Do you remember that we set the --privileged flag when creating our containers?
This is because we will need elevated permissions to modify network interface settings inside the container using various network tools. Without the necessary permissions, you will get an error like the following:
SIOCSIFADDR: Operation not permitted
SIOCSIFFLAGS: Operation not permitted
SIOCSIFNETMASK: Operation not permitted
SIOCSIFFLAGS: Operation not permitted
$ sudo docker exec c1 ifconfig eth0 10.55.66.2 netmask 255.255.255.0 up
$ sudo docker exec c2 ifconfig eth0 10.55.66.3 netmask 255.255.255.0 up
$ sudo docker exec c1 ifconfig
eth0 Link encap:Ethernet HWaddr be:a7:29:1b:e0:13
inet addr:10.55.66.2 Bcast:10.55.66.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:11 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:906 (906.0 B) TX bytes:0 (0.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
$ sudo docker exec c2 ifconfig
eth0 Link encap:Ethernet HWaddr ea:51:1c:2c:a4:15
inet addr:10.55.66.3 Bcast:10.55.66.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:10 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:796 (796.0 B) TX bytes:0 (0.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
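For reference, the same addresses could also be assigned with the ip command inside the containers (assuming the image ships iproute2); a sketch using the addresses chosen above:
$ sudo docker exec c1 ip addr add 10.55.66.2/24 dev eth0
$ sudo docker exec c1 ip link set eth0 up
$ sudo docker exec c2 ip addr add 10.55.66.3/24 dev eth0
$ sudo docker exec c2 ip link set eth0 up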
Ping Test
In this stage, we will use the PING command to test our connection. The packet flow will be:
- c1 container’s ping command
- c1 container’s eth0
- veth0 in host
- Linux Bridge on host
- veth1 on Bridge on host
- c2 container’s eth0
We will directly enter the container to execute the ping 10.55.66.3 command.
$ docker exec -it c1 ping 10.55.66.3 -c5
At this point, you may notice that the network is not working, and the ping command does not respond. To fix this, we need to run the following mysterious command:
$ sudo iptables --policy FORWARD ACCEPT
$ docker exec -it c1 ping 10.55.66.3 -c5
You will find that the network is now working and everything is working as expected. As for what the iptables command did and why it affected the packets, we will discuss the concept of iptables in detail in the next chapter.
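If you would rather not change the default FORWARD policy for the entire host, a narrower alternative is to accept only traffic forwarded across our bridge; this is just a sketch, assuming the bridge name hwchiu0 from this article and that bridged traffic is being passed to iptables via br_netfilter:
$ sudo iptables -I FORWARD -i hwchiu0 -o hwchiu0 -j ACCEPT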
Networking/System View
Here is an illustration of the packet flow, which summarizes all the components mentioned in this chapter, including veth, Linux Bridge, container IP, and other information.
Summary
Up to this point, we have learned how to take a container created from scratch with no network and turn it into one that uses the Bridge network model. However, at this point, our two containers can only access each other and still cannot reach the external network, for example by pinging 8.8.8.8.
Therefore, in the next article, we will explore this second half and fully understand the entire Docker Bridge network model. Through these steps, we will learn what the system actually does when each container starts up.
Aside: There is one part of the above process that is not easy to handle, called IPAM (IP Address Management): how do we assign a unique IP address to each container, and how do we reclaim those IPs when containers go away? This is a very critical issue when using Kubernetes, a distributed container orchestration platform.
Also, the detailed steps in this article are almost the same as what the Bridge CNI plugin does, with the only differences being the IP allocation issue and the routing table management that will be discussed later.