I wanted to understand exactly what happens to a packet after it leaves a pod. Not the high-level "Cilium uses eBPF" explanation you find in every overview talk. The actual path. Which interfaces does the packet touch? What eBPF programs fire? When does encapsulation happen, and when doesn't it? How do routes get populated, and what breaks when they don't?
So I built it. Four different kind clusters, four different routing configurations, packet captures on every one of them. This post covers intranode forwarding through veth pairs, internode native routing with three different route propagation methods (auto injection, static routes, and live BGP with FRR), VXLAN tunnel encapsulation with full packet dissection, and Geneve as an alternative overlay. Everything runs on Docker Desktop with kind on Apple Silicon. Every command in this post produced real output from my lab.
The Linux Primitives That Make This Work
Before tracing any packets, you need to understand two kernel constructs that Cilium builds on top of.
Network namespaces give each pod an isolated copy of the networking stack: its own interfaces, routing table, and firewall rules. The host node also has a network namespace shared by system processes, including the Cilium agent.
Veth pairs are virtual Ethernet devices that come in linked pairs. Whatever enters one end comes out the other. Cilium uses them to connect each pod's namespace to the host namespace. Inside the pod, you see eth0. On the host, you see the paired lxc* interface. Cilium attaches eBPF programs to the host side of every veth pair, so packets are inspected and forwarded the moment they cross the namespace boundary.
Intranode: Two Pods on the Same Node
I started with the simplest case. Two pods, same node, one ping. The cluster runs Cilium with VXLAN tunnel mode (the default), but intranode traffic never touches the tunnel. It stays entirely within the host namespace.
```
NAME    READY   STATUS    AGE   IP             NODE
pod-a   1/1     Running   31s   10.244.1.98    kind-worker
pod-b   1/1     Running   31s   10.244.1.200   kind-worker
```
```
64 bytes from 10.244.1.200: icmp_seq=1 ttl=63 time=0.085 ms
64 bytes from 10.244.1.200: icmp_seq=2 ttl=63 time=0.062 ms
64 bytes from 10.244.1.200: icmp_seq=3 ttl=63 time=0.179 ms

3 packets transmitted, 3 received, 0% packet loss
```
```
13: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP> mtu 65520
    inet 10.244.1.77/32 scope global cilium_host
```
```
cilium_net(12)    tcx/ingress   cil_to_host
cilium_host(13)   tcx/ingress   cil_to_host
cilium_host(13)   tcx/egress    cil_from_host
lxc_health(16)    tcx/ingress   cil_from_container
```
```
10.244.0.0/24 via 10.244.1.77 dev cilium_host   # overlay: routes → cilium_host
10.244.1.0/24 via 10.244.1.77 dev cilium_host
10.244.2.0/24 via 10.244.1.77 dev cilium_host
```
Every lxc* interface has cil_from_container attached at ingress. "Ingress on the host-side veth" means "packets leaving the pod." The moment a packet exits pod-a, it hits the eBPF program before it even enters the host namespace. Cilium makes its forwarding decision right there, at the earliest possible point in the stack.
The path for intranode traffic is: pod-a eth0 → host lxc*(pod-a) → eBPF cil_from_container → host lxc*(pod-b) → pod-b eth0. No tunnel, no encapsulation, no extra headers. Pure in-kernel forwarding through the host namespace.
Internode: Native Routing
When pods are on different nodes, the packet has to cross the physical (or virtual) network between them. Cilium supports two models for this: native routing and encapsulation. Native routing means the packet leaves the node with its pod IP visible on the wire. The underlay has to know how to forward it.
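To put a number on the difference, here's a back-of-the-envelope sketch of what encapsulation would cost per packet. The header sizes come from the VXLAN wire format, not from this lab's captures:

```python
# Bytes that VXLAN encapsulation adds around the original pod packet,
# versus native routing, where the pod's IP packet rides the wire as-is.
IP_OUTER = 20    # outer IPv4 header
UDP_OUTER = 8    # outer UDP header (Linux VXLAN default dst port 8472)
VXLAN_HDR = 8    # VXLAN header carrying the 24-bit VNI
ETH_INNER = 14   # VXLAN encapsulates the full inner Ethernet frame

overhead = IP_OUTER + UDP_OUTER + VXLAN_HDR + ETH_INNER
print(overhead)                  # bytes taken out of the underlay MTU: 50

underlay_mtu = 1500
print(underlay_mtu - overhead)   # pod-facing MTU on a 1500-byte underlay: 1450
```

That 1450 is why overlay pods typically see a reduced MTU, while native-routing pods can use the underlay MTU directly.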
The book describes three ways nodes can learn about each other's PodCIDRs. I labbed all three.
3A: Auto Route Injection
This is the simplest option. You set autoDirectNodeRoutes: true in the Helm values, and Cilium automatically populates every node's routing table with routes to remote PodCIDRs, using node IPs as next hops. The catch is that all nodes must be on the same L2 network.
I deployed Cilium with multi-pool IPAM (10.10.0.0/16, /27 per node) in native routing mode:
```
allocated:
- cidrs:
  - 10.10.0.0/27
  pool: default
```

```
10.10.0.13 dev lxc3c2ed4711743 proto kernel scope link
10.10.0.32/27 via 172.18.0.3 dev eth0 proto kernel   # → kind-worker2
10.10.0.64/27 via 172.18.0.4 dev eth0 proto kernel   # → control-plane
```
```
Device "cilium_vxlan" does not exist.
```

```
17:12:17 10.10.0.13 > 10.10.0.42: ICMP echo request
17:12:17 10.10.0.42 > 10.10.0.13: ICMP echo reply
17:12:18 10.10.0.13 > 10.10.0.42: ICMP echo request
17:12:18 10.10.0.42 > 10.10.0.13: ICMP echo reply
```

No VXLAN. No UDP/8472. No outer IP. Native routing.
Compare these routes to the Phase 2 overlay routing table. In overlay mode, remote PodCIDRs point to cilium_host. In native mode, they point to node IPs via eth0. That routing table difference tells you instantly which mode you're in.
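The difference is mechanical enough to script. Here's a toy classifier of my own (not a Cilium tool) that applies the rule to `ip route` lines:

```python
def routing_mode(route_lines):
    """Guess Cilium's routing mode from pod-CIDR routes (toy heuristic)."""
    for line in route_lines:
        if "dev cilium_host" in line:
            return "overlay"   # remote PodCIDRs point at cilium_host
        if " via " in line and " dev eth0" in line:
            return "native"    # remote PodCIDRs point at node IPs
    return "unknown"

overlay_routes = ["10.244.2.0/24 via 10.244.1.77 dev cilium_host"]
native_routes = ["10.10.0.32/27 via 172.18.0.3 dev eth0 proto kernel"]
print(routing_mode(overlay_routes))   # overlay
print(routing_mode(native_routes))    # native
```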
3B: Static Routing (Breaking Things on Purpose)
I deployed a fresh cluster with autoDirectNodeRoutes: false to prove what happens when no one tells the nodes about remote PodCIDRs:
```
PING 10.10.0.44 (10.10.0.44) 56(84) bytes of data.

--- 10.10.0.44 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1056ms
```

(route added)

```
64 bytes from 10.10.0.44: icmp_seq=1 ttl=62 time=0.229 ms
64 bytes from 10.10.0.44: icmp_seq=2 ttl=62 time=0.371 ms
64 bytes from 10.10.0.44: icmp_seq=3 ttl=62 time=0.237 ms

3 packets transmitted, 3 received, 0% packet loss
```
The point of this exercise is to feel why static routing doesn't scale. Every node needs routes to every other node's PodCIDR. Adding or removing a node means updating every routing table in the cluster by hand.
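The arithmetic behind "doesn't scale" is worth making explicit. A quick sketch (my numbers, not from the lab):

```python
def static_routes(nodes: int) -> tuple[int, int]:
    """Routes each node needs, and total manual routes cluster-wide."""
    per_node = nodes - 1          # one route per remote node's PodCIDR
    return per_node, nodes * per_node

for n in (3, 50, 500):
    per_node, total = static_routes(n)
    print(f"{n} nodes: {per_node} routes per node, {total} routes total")
```

At 3 nodes it's 6 routes and tolerable; at 500 nodes it's roughly a quarter million manually maintained routes, and every node join or drain touches all of them. That's the gap route-propagation mechanisms like auto injection and BGP exist to fill.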