Understanding the Cilium Datapath: How Packets Actually Move Between Pods

I traced every packet path through Cilium's datapath: veth pairs, eBPF hooks, native routing with auto injection, static routes, live BGP with FRR and ECMP, VXLAN encapsulation, and Geneve. Four kind clusters, real tcpdump captures, all from my lab.


I wanted to understand exactly what happens to a packet after it leaves a pod. Not the high-level "Cilium uses eBPF" explanation you find in every overview talk. The actual path. Which interfaces does the packet touch? What eBPF programs fire? When does encapsulation happen, and when doesn't it? How do routes get populated, and what breaks when they don't?

So I built it. Four different kind clusters, four different routing configurations, packet captures on every one of them. This post covers intranode forwarding through veth pairs, internode native routing with three different route propagation methods (auto injection, static routes, and live BGP with FRR), VXLAN tunnel encapsulation with full packet dissection, and Geneve as an alternative overlay. Everything runs on Docker Desktop with kind on Apple Silicon. Every command in this post produced real output from my lab.


The Linux Primitives That Make This Work

Before tracing any packets, you need to understand two kernel constructs that Cilium builds on top of.

Network namespaces give each pod an isolated copy of the networking stack: its own interfaces, routing table, and firewall rules. The host node also has a network namespace shared by system processes, including the Cilium agent.

Veth pairs are virtual Ethernet devices that come in linked pairs. Whatever enters one end comes out the other. Cilium uses them to connect each pod's namespace to the host namespace. Inside the pod, you see eth0. On the host, you see the paired lxc* interface. Cilium attaches eBPF programs to the host side of every veth pair, so packets are inspected and forwarded the moment they cross the namespace boundary.
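The per-pod plumbing is easy to reproduce by hand. A minimal sketch of what Cilium sets up for each pod, using plain iproute2 (requires root; the names `demo-pod`, `lxc-demo`, and the address are made up for illustration, and the eBPF attachment itself is left out):

```shell
# Create a "pod" namespace and a veth pair bridging it to the host namespace
ip netns add demo-pod
ip link add lxc-demo type veth peer name eth0 netns demo-pod

# Host side up: this is the end where Cilium would attach cil_from_container
ip link set lxc-demo up

# Pod side up, with a /32 address like Cilium assigns from the node's pool
ip -n demo-pod link set eth0 up
ip -n demo-pod address add 10.244.1.99/32 dev eth0
```

Anything the pod transmits on its `eth0` appears on `lxc-demo` in the host namespace, which is why attaching an eBPF program there catches every packet at the namespace boundary.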

Intranode: Two Pods on the Same Node

I started with the simplest case. Two pods, same node, one ping. The cluster runs Cilium with VXLAN tunnel mode (the default), but intranode traffic never touches the tunnel. It stays entirely within the host namespace.

Phase 2: Intranode Connectivity
Verify pods land on the same node
$ kubectl get pods -o wide
NAME    READY   STATUS    AGE   IP             NODE
pod-a   1/1     Running   31s   10.244.1.98    kind-worker
pod-b   1/1     Running   31s   10.244.1.200   kind-worker
$ kubectl exec pod-a -- ping -c 3 10.244.1.200
64 bytes from 10.244.1.200: icmp_seq=1 ttl=63 time=0.085 ms
64 bytes from 10.244.1.200: icmp_seq=2 ttl=63 time=0.062 ms
64 bytes from 10.244.1.200: icmp_seq=3 ttl=63 time=0.179 ms
3 packets transmitted, 3 received, 0% packet loss
Cilium interfaces and eBPF hooks on kind-worker
$ docker exec kind-worker ip address show cilium_host
13: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP> mtu 65520
    inet 10.244.1.77/32 scope global cilium_host
$ bpftool net show | grep -E 'cilium_host|cilium_net|lxc_health'
cilium_net(12)  tcx/ingress  cil_to_host
cilium_host(13) tcx/ingress  cil_to_host
cilium_host(13) tcx/egress   cil_from_host
lxc_health(16)  tcx/ingress  cil_from_container
$ docker exec kind-worker ip route | grep 10.244
10.244.0.0/24 via 10.244.1.77 dev cilium_host # overlay: routes → cilium_host
10.244.1.0/24 via 10.244.1.77 dev cilium_host
10.244.2.0/24 via 10.244.1.77 dev cilium_host

Every lxc* interface has cil_from_container attached at ingress. "Ingress on the host-side veth" means "packets leaving the pod." The moment a packet exits pod-a, it hits the eBPF program before it even enters the host namespace. Cilium makes its forwarding decision right there, at the earliest possible point in the stack.

The path for intranode traffic is: pod-a eth0 → host lxc*(pod-a) → eBPF cil_from_container → host lxc*(pod-b) → pod-b eth0. No tunnel, no encapsulation, no extra headers. Pure in-kernel forwarding through the host namespace.

Internode: Native Routing

When pods are on different nodes, the packet has to cross the physical (or virtual) network between them. Cilium supports two models for this: native routing and encapsulation. Native routing means the packet leaves the node with its pod IP visible on the wire. The underlay has to know how to forward it.

There are three ways nodes can learn about each other's PodCIDRs in native routing mode. I labbed all three.

3A: Auto Route Injection

This is the simplest option. You set autoDirectNodeRoutes: true in the Helm values, and Cilium automatically populates every node's routing table with routes to remote PodCIDRs, using node IPs as next hops. The catch is that all nodes must be on the same L2 network.
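For reference, a minimal Helm values sketch for this mode. The value names follow recent Cilium charts and are my reconstruction, not the exact lab config:

```yaml
# Assumed Helm values for native routing with auto route injection
routingMode: native                  # no VXLAN/Geneve encapsulation
autoDirectNodeRoutes: true           # inject routes to remote PodCIDRs via node IPs
ipv4NativeRoutingCIDR: 10.10.0.0/16  # pod range the underlay must carry unmasqueraded
ipam:
  mode: multi-pool
```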

I deployed Cilium with multi-pool IPAM (10.10.0.0/16, /27 per node) in native routing mode:

Phase 3A: Auto Route Injection (Native Routing)
CiliumNode /27 pool allocations
$ kubectl get ciliumnodes kind-worker -o yaml | yq .spec.ipam.pools
allocated:
  - cidrs:
      - 10.10.0.0/27
    pool: default
Routes point to node IPs via eth0 (not cilium_host)
$ docker exec kind-worker ip route | grep 10.10
10.10.0.13 dev lxc3c2ed4711743 proto kernel scope link
10.10.0.32/27 via 172.18.0.3 dev eth0 proto kernel # → kind-worker2
10.10.0.64/27 via 172.18.0.4 dev eth0 proto kernel # → control-plane
No tunnel interfaces in native mode
$ docker exec kind-worker ip link show cilium_vxlan
Device "cilium_vxlan" does not exist.
tcpdump: raw pod IPs on the wire, no encapsulation
$ tcpdump -n -i eth0 icmp   (during cross-node ping)
17:12:17 10.10.0.13 > 10.10.0.42: ICMP echo request
17:12:17 10.10.0.42 > 10.10.0.13: ICMP echo reply
17:12:18 10.10.0.13 > 10.10.0.42: ICMP echo request
17:12:18 10.10.0.42 > 10.10.0.13: ICMP echo reply
No VXLAN. No UDP/8472. No outer IP. Native routing.
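The /27 allocations shown above are plain CIDR math: a 10.10.0.0/16 pool carved into /27 blocks yields 2048 possible node allocations of 32 addresses each, handed out in order. Python's ipaddress module reproduces the exact pools the lab nodes received:

```python
import ipaddress

# The cluster-wide pod pool, split into per-node /27 allocations
pool = ipaddress.ip_network("10.10.0.0/16")
node_pools = list(pool.subnets(new_prefix=27))

print(len(node_pools))                      # 2048 possible node allocations
print([str(n) for n in node_pools[:3]])     # first three, as seen on the nodes
# → ['10.10.0.0/27', '10.10.0.32/27', '10.10.0.64/27']
```

Those first three match the lab exactly: kind-worker got 10.10.0.0/27, kind-worker2 got 10.10.0.32/27, and the control plane got 10.10.0.64/27.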

Compare these routes to the Phase 2 overlay routing table. In overlay mode, remote PodCIDRs point to cilium_host. In native mode, they point to node IPs via eth0. That routing table difference tells you instantly which mode you're in.
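The distinction is mechanical enough to check in code. A toy classifier over `ip route` output lines (my illustration, not anything Cilium ships):

```python
def routing_mode(route_line: str) -> str:
    """Classify a pod-CIDR route as overlay or native from its output device."""
    if "dev cilium_host" in route_line:
        return "overlay"   # remote PodCIDRs funnel through cilium_host
    if " via " in route_line and "dev eth0" in route_line:
        return "native"    # next hop is the remote node's IP on the underlay
    return "unknown"       # e.g. a local per-endpoint lxc* route

# Lines taken from the Phase 2 and Phase 3A routing tables above
print(routing_mode("10.244.2.0/24 via 10.244.1.77 dev cilium_host"))        # overlay
print(routing_mode("10.10.0.32/27 via 172.18.0.3 dev eth0 proto kernel"))   # native
```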

3B: Static Routing (Breaking Things on Purpose)

I deployed a fresh cluster with autoDirectNodeRoutes: false to prove what happens when no one tells the nodes about remote PodCIDRs:

Phase 3B: Static Routing (Prove the Failure)
No remote PodCIDR routes without autoDirectNodeRoutes
$ kubectl exec pod-worker -- ping -c 2 -W 3 10.10.0.44
PING 10.10.0.44 (10.10.0.44) 56(84) bytes of data.

--- 10.10.0.44 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1056ms
Add static routes manually, then retest
$ docker exec kind-worker ip route add 10.10.0.32/27 via 172.18.0.3
(route added)
$ kubectl exec pod-worker -- ping -c 3 10.10.0.44
64 bytes from 10.10.0.44: icmp_seq=1 ttl=62 time=0.229 ms
64 bytes from 10.10.0.44: icmp_seq=2 ttl=62 time=0.371 ms
64 bytes from 10.10.0.44: icmp_seq=3 ttl=62 time=0.237 ms
3 packets transmitted, 3 received, 0% packet loss

The point of this exercise is to feel why static routing doesn't scale. Every node needs routes to every other node's PodCIDR. Adding or removing a node means updating every routing table in the cluster by hand.
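Assuming each node holds one route per remote node's PodCIDR, the bookkeeping grows quadratically with cluster size:

```python
def static_routes_needed(nodes: int) -> int:
    # Each of the N nodes needs a route to every one of the other N-1 PodCIDRs
    return nodes * (nodes - 1)

print(static_routes_needed(3))    # 6 routes for this three-node lab
print(static_routes_needed(50))   # 2450 routes to maintain by hand
```

Adding one node to a 50-node cluster means touching all 50 existing routing tables. That is exactly the problem BGP exists to solve, which is where the next phase picks up.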
