news 2026/6/15 17:50:38

[Tech] Understanding Calico Networking in Kubernetes in One Article (Part 2)


张小明

Front-end Development Engineer


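📌 This series explores the architecture and network implementation of Calico, a CNI component for Kubernetes in cloud computing. This article covers how pod communication works in Calico's IPIP network mode.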

1. Calico Network Modes

Mode     Packet encapsulation          Overlay
BGP      None (native routing)         No
IPIP     IP-in-IP (IPv4 in IPv4)       Yes
VXLAN    VXLAN (carried over UDP)      Yes

2. About IPIP

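IPIP is a layer-3 tunneling protocol natively supported by the Linux kernel; its full name is IPv4 in IPv4. The core idea is to wrap the original IPv4 packet in one more IPv4 header, so that the packet can be carried transparently across different networks.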
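The wrapping described above can be sketched in a few lines of Python. This is only an illustration of the packet layout, not how the kernel or Calico implements it: the helper names (`ipv4_header`, `encapsulate`) are made up for this example, the headers carry no options, and the addresses are the pod and node IPs that appear in the captures later in this article.

```python
import socket
import struct

IPPROTO_ICMP = 1
IPPROTO_IPIP = 4  # IP protocol number used for IPv4-in-IPv4


def ipv4_checksum(header: bytes) -> int:
    """16-bit one's-complement checksum over a 20-byte IPv4 header."""
    total = sum(struct.unpack("!10H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, proto: int, payload_len: int) -> bytes:
    """Build a minimal 20-byte IPv4 header (hypothetical helper, no options)."""
    fields = [0x45, 0, 20 + payload_len, 0, 0, 64, proto, 0,
              socket.inet_aton(src), socket.inet_aton(dst)]
    # First pack with a zero checksum, then fill in the real value.
    fields[7] = ipv4_checksum(struct.pack("!BBHHHBBH4s4s", *fields))
    return struct.pack("!BBHHHBBH4s4s", *fields)


def encapsulate(inner: bytes, outer_src: str, outer_dst: str) -> bytes:
    """Prepend an outer IPv4 header whose protocol field is 4 (IPIP)."""
    return ipv4_header(outer_src, outer_dst, IPPROTO_IPIP, len(inner)) + inner


# Inner packet: pod to pod, with a 64-byte placeholder payload.
inner = ipv4_header("192.168.79.72", "192.168.232.194", IPPROTO_ICMP, 64) + bytes(64)
# Outer packet: node to node.
outer = encapsulate(inner, "172.22.1.229", "172.22.3.64")
print(len(inner), len(outer))
```

The outer packet is exactly 20 bytes longer than the inner one, and the inner packet is carried unchanged as the payload.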

3. Calico with IPIP

Check which network mode Calico is currently running in:

# kubectl -n kube-system get ippool default-ipv4-ippool -o yaml
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  annotations:
    projectcalico.org/metadata: '{"creationTimestamp":"2025-12-09T07:33:15Z"}'
  creationTimestamp: "2025-12-09T07:33:15Z"
  generation: 1
  name: default-ipv4-ippool
  resourceVersion: "941"
  uid: bfa1b297-4402-4fa7-bfdf-1e1dc629e2cd
spec:
  allowedUses:
  - Workload
  - Tunnel
  blockSize: 26
  cidr: 192.168.0.0/16
  ipipMode: Always
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Never

The `ipipMode: Always` field shows that this Calico installation uses IPIP mode by default.
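As a rough illustration of how the fields in this IPPool can be read, the sketch below classifies the encapsulation mode and computes which `blockSize`-aligned address block a pod IP falls into. The function names are hypothetical, and the classification simply treats any value other than `Never` as "enabled"; it does not model `CrossSubnet` behavior.

```python
import ipaddress


def encapsulation_mode(spec: dict) -> str:
    """Classify an IPPool spec by its ipipMode / vxlanMode fields."""
    if spec.get("ipipMode", "Never") != "Never":
        return "IPIP"
    if spec.get("vxlanMode", "Never") != "Never":
        return "VXLAN"
    return "BGP (no encapsulation)"


def block_for(pod_ip: str, block_size: int) -> ipaddress.IPv4Network:
    """Return the blockSize-aligned address block that contains pod_ip."""
    return ipaddress.ip_network(f"{pod_ip}/{block_size}", strict=False)


# Values copied from the IPPool shown above.
spec = {"blockSize": 26, "cidr": "192.168.0.0/16",
        "ipipMode": "Always", "vxlanMode": "Never"}

print(encapsulation_mode(spec))
print(block_for("192.168.79.67", spec["blockSize"]))
print(block_for("192.168.232.194", spec["blockSize"]))
```

With `blockSize: 26` each block holds 64 addresses, which is why pods on one node tend to share a /26 and why cross-node routes are per-block rather than per-pod.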

3.1 Pod-to-Pod Communication on the Same Node

192.168.79.67 and 192.168.79.68 communicate on the same node.

# kubectl get pods -o wide -A
NAMESPACE   NAME                               READY   STATUS    RESTARTS   AGE     IP              NODE                 NOMINATED NODE   READINESS GATES
default     nginx-deployment-bf744486c-8ppjm   1/1     Running   0          5d17h   192.168.79.68   host-10-16-217-141   <none>           <none>
default     nginx-deployment-bf744486c-cvhhz   1/1     Running   0          5d17h   192.168.79.67   host-10-16-217-141   <none>           <none>
a. Inspect the network interfaces

For the pod 192.168.79.67, the corresponding host-side veth interface is 8: cali88e3e62ccbf@if3

# ip netns exec cni-d84e16a0-941d-7bca-80af-3e8914638e88 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UP group default qlen 1000
    link/ether 1a:39:0f:40:34:09 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.79.67/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::1839:fff:fe40:3409/64 scope link
       valid_lft forever preferred_lft forever
# ip netns exec cni-d84e16a0-941d-7bca-80af-3e8914638e88 ip r
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link

For the pod 192.168.79.68, the corresponding host-side veth interface is 9: califc335b22756@if3

# ip netns exec cni-6f3c9c59-8654-b2c4-756e-eca53e0db3b4 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UP group default qlen 1000
    link/ether ea:b2:2d:44:f1:73 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.79.68/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::e8b2:2dff:fe44:f173/64 scope link
       valid_lft forever preferred_lft forever
# ip netns exec cni-6f3c9c59-8654-b2c4-756e-eca53e0db3b4 ip r
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
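b. Network analysis

The routes of 192.168.79.67 show that all outbound traffic from the pod is sent to the default gateway 169.254.1.1. This address is a dummy: no interface actually owns it. It still works because the pod's eth0 is one end of a veth pair whose other end is the caliXXX interface on the host. The host answers the pod's ARP request for the gateway via proxy ARP on caliXXX (hence the ee:ee:ee:ee:ee:ee MAC seen in the captures), so everything sent out of eth0 arrives at the host-side caliXXX even though the gateway IP is fake.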

[ Pod netns ] eth0 <====veth====> caliXXXX [ Node netns ]

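So traffic from pod 192.168.79.67 leaves through eth0 and enters the host, and the host's routing table then forwards it to the other pod on the same node.

The host has a per-pod route for each local pod: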

# ip r
192.168.79.67 dev cali88e3e62ccbf scope link
192.168.79.68 dev califc335b22756 scope link

So the actual communication path is:

POD1 ====veth====> caliXXX1 [ Node ] ====node===> caliXXX2 [ Node ] ====veth====> POD2
c. Packet capture verification

Capturing on the host host-10-16-217-141 shows that the caliXXX interfaces receive the pod's packets:

# tcpdump -i any -nnee icmp
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
10:58:25.977391 cali88e3e62ccbf In  ifindex 8 1a:39:0f:40:34:09 ethertype IPv4 (0x0800), length 104: 192.168.79.67 > 192.168.79.68: ICMP echo request, id 4, seq 1, length 64
10:58:25.977492 califc335b22756 Out ifindex 9 ee:ee:ee:ee:ee:ee ethertype IPv4 (0x0800), length 104: 192.168.79.67 > 192.168.79.68: ICMP echo request, id 4, seq 1, length 64
10:58:25.977515 califc335b22756 In  ifindex 9 ea:b2:2d:44:f1:73 ethertype IPv4 (0x0800), length 104: 192.168.79.68 > 192.168.79.67: ICMP echo reply, id 4, seq 1, length 64
10:58:25.977534 cali88e3e62ccbf Out ifindex 8 ee:ee:ee:ee:ee:ee ethertype IPv4 (0x0800), length 104: 192.168.79.68 > 192.168.79.67: ICMP echo reply, id 4, seq 1, length 64

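In summary, the capture confirms that the traffic follows the path derived in the analysis above.

3.2 Pod-to-Pod Communication Across Nodes

192.168.79.72 and 192.168.232.194 communicate across two different nodes.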

# kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE     IP                NODE                 NOMINATED NODE   READINESS GATES
nginx-deployment-bf744486c-8ppjm    1/1     Running   0          5d17h   192.168.79.68     host-10-16-217-141   <none>           <none>
nginx-deployment-bf744486c-cvhhz    1/1     Running   0          5d17h   192.168.79.67     host-10-16-217-141   <none>           <none>
nginx-deployment2-fb46746f5-5w77x   1/1     Running   0          12s     192.168.79.72     host-10-16-217-141   <none>           <none>
nginx-deployment2-fb46746f5-cddm6   1/1     Running   0          9s      192.168.232.194   host-10-16-217-208   <none>           <none>
nginx-deployment2-fb46746f5-j2p6k   1/1     Running   0          28s     192.168.232.193   host-10-16-217-208   <none>           <none>
a. Inspect the network interfaces

For the pod 192.168.79.72, the corresponding host-side veth interface is 13: cali58644df6687@if3

# ip netns exec cni-338ada74-f1bd-25c7-a223-c0d2c0cbf7e1 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: eth0@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UP group default qlen 1000
    link/ether 5a:49:52:d3:d0:b7 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.79.72/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5849:52ff:fed3:d0b7/64 scope link
       valid_lft forever preferred_lft forever
# ip netns exec cni-338ada74-f1bd-25c7-a223-c0d2c0cbf7e1 ip r
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link

For the pod 192.168.232.194, the corresponding host-side veth interface is 7: caliab6a8d8a743@if3

# ip netns exec cni-41842320-511e-c641-304f-1f58c4fdb0bf ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UP group default qlen 1000
    link/ether 4a:30:47:7d:15:c7 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.232.194/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::4830:47ff:fe7d:15c7/64 scope link
       valid_lft forever preferred_lft forever
# ip netns exec cni-41842320-511e-c641-304f-1f58c4fdb0bf ip r
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
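b. Network analysis

As before, traffic from pod 192.168.79.72 leaves through eth0 and enters the host host-10-16-217-141, and the routing table on host-10-16-217-141 then forwards it toward the pod on the other node.

On host-10-16-217-141, the destination 192.168.232.194 matches the following route. Note that `proto bird` means the route was installed by BIRD (learned over BGP) rather than configured statically: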

# ip r
192.168.232.192/26 via 172.22.3.64 dev tunl0 proto bird onlink

So the actual network path is:

POD1 ====veth====> caliXXX1 [ Node1 ] ====node1===> tunl0 [ Node1 ] ====ipip====> eth0 [ Node2 ]

After the packet enters the destination host host-10-16-217-208:

eth0 [ Node2 ] ===node2===> caliXXX2 [ Node2 ] ====veth====> POD2
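The forwarding decision on the source node boils down to a longest-prefix-match lookup. The sketch below reproduces it with only the three routes quoted in this article; a real host has more routes (default route, node subnet, and so on), and the `lookup` helper is a made-up name for illustration, not a kernel or Calico API.

```python
import ipaddress

# Routes copied from the `ip r` output on host-10-16-217-141:
# (destination, output device, next hop or None for directly attached).
ROUTES = [
    ("192.168.79.67/32", "cali88e3e62ccbf", None),
    ("192.168.79.68/32", "califc335b22756", None),
    ("192.168.232.192/26", "tunl0", "172.22.3.64"),
]


def lookup(dst: str):
    """Return (device, next_hop) of the most specific matching route, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for cidr, dev, via in ROUTES:
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, dev, via)
    return None if best is None else (best[1], best[2])


print(lookup("192.168.79.68"))    # local pod: delivered straight to its cali interface
print(lookup("192.168.232.194"))  # remote pod: sent into tunl0 toward the peer node
```

A result whose device is `tunl0` is the case where the packet gets an outer IPv4 header addressed to the next hop, which is the other node's IP.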
c. Packet capture verification

Capturing on the destination host host-10-16-217-208 shows that eth0 receives IPIP packets from the source host host-10-16-217-141:

# tcpdump -i any -nnee proto 4
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
11:43:22.174218 eth0  In  ifindex 2 fa:16:3e:46:0b:f2 ethertype IPv4 (0x0800), length 124: 172.22.1.229 > 172.22.3.64: 192.168.79.72 > 192.168.232.194: ICMP echo request, id 5, seq 259, length 64
11:43:22.174323 eth0  Out ifindex 2 fa:16:3e:cd:14:d4 ethertype IPv4 (0x0800), length 124: 172.22.3.64 > 172.22.1.229: 192.168.232.194 > 192.168.79.72: ICMP echo reply, id 5, seq 259, length 64

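After tunl0 strips the outer header, the inner packet is delivered to the caliXXX2 interface through the host's local route for that pod: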

192.168.232.194 dev caliab6a8d8a743 src 172.22.3.64 uid 0
11:39:07.198229 tunl0 In  ifindex 3 ethertype IPv4 (0x0800), length 104: 192.168.79.72 > 192.168.232.194: ICMP echo request, id 5, seq 10, length 64
11:39:07.198313 caliab6a8d8a743 Out ifindex 7 ee:ee:ee:ee:ee:ee ethertype IPv4 (0x0800), length 104: 192.168.79.72 > 192.168.232.194: ICMP echo request, id 5, seq 10, length 64
11:39:07.198345 caliab6a8d8a743 In  ifindex 7 4a:30:47:7d:15:c7 ethertype IPv4 (0x0800), length 104: 192.168.232.194 > 192.168.79.72: ICMP echo reply, id 5, seq 10, length 64
11:39:07.198364 tunl0 Out ifindex 3 ethertype IPv4 (0x0800), length 104: 192.168.232.194 > 192.168.79.72: ICMP echo reply, id 5, seq 10, length 64
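The 20-byte difference between the two captures (length 124 on eth0, length 104 on tunl0) is the outer IPv4 header being removed. As a rough sketch of that decapsulation step, assuming headers without options: `parse_ipv4`, `decapsulate`, and `_header` are hypothetical helpers written for this example, and the sample packet is built by hand with its checksum left at zero for brevity.

```python
import socket
import struct

IPPROTO_ICMP = 1
IPPROTO_IPIP = 4  # matches the `proto 4` tcpdump filter used above


def parse_ipv4(packet: bytes):
    """Return (src, dst, protocol, payload) of an IPv4 packet."""
    header_len = (packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    src = socket.inet_ntoa(packet[12:16])
    dst = socket.inet_ntoa(packet[16:20])
    return src, dst, packet[9], packet[header_len:]


def decapsulate(packet: bytes) -> bytes:
    """Strip the outer header of an IPIP packet and return the inner packet."""
    _, _, proto, payload = parse_ipv4(packet)
    if proto != IPPROTO_IPIP:
        raise ValueError("not an IPIP packet")
    return payload


def _header(src: str, dst: str, proto: int, payload_len: int) -> bytes:
    """Test helper: 20-byte IPv4 header with the checksum field left at 0."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + payload_len, 0, 0, 64,
                       proto, 0, socket.inet_aton(src), socket.inet_aton(dst))


inner = _header("192.168.79.72", "192.168.232.194", IPPROTO_ICMP, 64) + bytes(64)
outer = _header("172.22.1.229", "172.22.3.64", IPPROTO_IPIP, len(inner)) + inner

print(parse_ipv4(outer)[:3])               # node-to-node addresses, protocol 4
print(parse_ipv4(decapsulate(outer))[:3])  # pod-to-pod addresses, protocol 1
```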

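In summary, the capture confirms that the traffic follows the path derived in the analysis above.

3.3 Pod-to-Service Communication

Services naturally bring kube-proxy to mind. In this environment kube-proxy runs in iptables mode.

In iptables mode, kube-proxy does three things: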

  1. Intercept traffic destined for the Service IP
  2. Select a backend pod
  3. Perform DNAT
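These three steps can be pictured with a toy model. The sketch below is not kube-proxy code: the `SERVICES` table and the `dnat` function are hypothetical stand-ins for the iptables rules, using the kube-dns Service IP and the backend addresses that appear in the rule dumps in this article.

```python
import random

# Hypothetical stand-in for the rules kube-proxy programs:
# (cluster IP, port, protocol) -> list of backend (pod IP, port).
SERVICES = {
    ("172.16.0.10", 53, "udp"): [("192.168.34.193", 53), ("192.168.79.66", 53)],
}


def dnat(dst_ip: str, dst_port: int, proto: str, choose=random.choice):
    """Return the (ip, port) a packet is actually sent to."""
    # 1. Intercept: does the destination match a Service?
    backends = SERVICES.get((dst_ip, dst_port, proto))
    if backends is None:
        return dst_ip, dst_port  # not Service traffic, left untouched
    # 2. Select one backend pod, 3. rewrite the destination (DNAT).
    return choose(backends)


print(dnat("172.16.0.10", 53, "udp"))   # one of the two kube-dns pods
print(dnat("192.168.79.68", 80, "tcp"))  # plain pod IP, unchanged
```

After the rewrite the packet is an ordinary pod-to-pod packet, so it follows the same-node or cross-node paths described in 3.1 and 3.2.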

So how does pod 192.168.79.72 communicate with a Service such as 172.16.0.10?

# kubectl get svc -o wide -A
NAMESPACE     NAME         TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
default       kubernetes   ClusterIP   172.16.0.1    <none>        443/TCP                  20d   <none>
kube-system   kube-dns     ClusterIP   172.16.0.10   <none>        53/UDP,53/TCP,9153/TCP   20d   k8s-app=kube-dns
a. How traffic to the Service IP is intercepted

Traffic is intercepted by passing through the following iptables chains in order:

PREROUTING --> KUBE-SERVICES --> KUBE-SVC-xxxx --> KUBE-SEP-xxxx

Inspect the PREROUTING chain on the host:

# iptables -t nat -nvL PREROUTING
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target           prot opt in  out  source     destination
55182 2870K cali-PREROUTING  all  --  *   *    0.0.0.0/0  0.0.0.0/0    /* cali:6gwbT8clXdHdC1b1 */
55284 2875K KUBE-SERVICES    all  --  *   *    0.0.0.0/0  0.0.0.0/0    /* kubernetes service portals */

Once pod traffic enters the host, PREROUTING hands it to the KUBE-SERVICES chain:

Chain KUBE-SERVICES (2 references)
 pkts bytes target                     prot opt in  out  source     destination
    0     0 KUBE-SVC-TCOU7JCQXEZGVUNU  udp  --  *   *    0.0.0.0/0  172.16.0.10  /* kube-system/kube-dns:dns cluster IP */
    0     0 KUBE-SVC-ERIFXISQEP7F7OF4  tcp  --  *   *    0.0.0.0/0  172.16.0.10  /* kube-system/kube-dns:dns-tcp cluster IP */
    0     0 KUBE-SVC-JD5MR3NA4I4DYORP  tcp  --  *   *    0.0.0.0/0  172.16.0.10  /* kube-system/kube-dns:metrics cluster IP */
    0     0 KUBE-SVC-NPX46M4PTMTKRN6Y  tcp  --  *   *    0.0.0.0/0  172.16.0.1   /* default/kubernetes:https cluster IP */
   65  4547 KUBE-NODEPORTS             all  --  *   *    0.0.0.0/0  0.0.0.0/0    /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL

In the KUBE-SERVICES chain, UDP packets destined for 172.16.0.10 are handed to the KUBE-SVC-TCOU7JCQXEZGVUNU chain:

Chain KUBE-SVC-TCOU7JCQXEZGVUNU (1 references)
 pkts bytes target                     prot opt in  out  source           destination
    0     0 KUBE-MARK-MASQ             udp  --  *   *    !192.168.0.0/16  172.16.0.10  /* kube-system/kube-dns:dns cluster IP */
    0     0 KUBE-SEP-V3WL5PSHR6KK4LJN  all  --  *   *    0.0.0.0/0        0.0.0.0/0    /* kube-system/kube-dns:dns -> 192.168.34.193:53 */ statistic mode random probability 0.50000000000
    0     0 KUBE-SEP-NJ5U6PSIJNX4FJ6P  all  --  *   *    0.0.0.0/0        0.0.0.0/0    /* kube-system/kube-dns:dns -> 192.168.79.66:53 */
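The `statistic mode random probability 0.5` match on the first endpoint rule, with no such match on the last one, is how sequentially evaluated rules end up picking backends evenly. The sketch below works through the arithmetic under the assumption that rule i of n carries probability 1/(n-i) and the final rule always matches; the function names are made up for this example.

```python
from fractions import Fraction


def rule_probabilities(n: int):
    """Per-rule match probability for n endpoints: 1/n, 1/(n-1), ..., 1."""
    return [Fraction(1, n - i) for i in range(n)]


def effective_probabilities(rule_probs):
    """Chance each endpoint is actually chosen when rules are tried in order."""
    remaining = Fraction(1)  # probability that no earlier rule matched
    result = []
    for p in rule_probs:
        result.append(remaining * p)
        remaining *= 1 - p
    return result


for n in (2, 3):
    rules = rule_probabilities(n)
    print(n, [str(p) for p in rules],
          [str(p) for p in effective_probabilities(rules)])
```

For the two kube-dns endpoints above this gives rule probabilities 1/2 and 1, and each endpoint is chosen half of the time.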

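Finally, the KUBE-SEP-V3WL5PSHR6KK4LJN chain does two things. First, packets whose source is 192.168.34.193 itself (the backend pod reaching itself through the Service) are marked by KUBE-MARK-MASQ so that POSTROUTING can SNAT/MASQUERADE them. Second, the packet's destination IP and port are rewritten to 192.168.34.193:53: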

Chain KUBE-SEP-V3WL5PSHR6KK4LJN (1 references)
 pkts bytes target          prot opt in  out  source          destination
    0     0 KUBE-MARK-MASQ  all  --  *   *    192.168.34.193  0.0.0.0/0    /* kube-system/kube-dns:dns */
    0     0 DNAT            udp  --  *   *    0.0.0.0/0       0.0.0.0/0    /* kube-system/kube-dns:dns */ udp to:192.168.34.193:53
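b. Packet capture verification

Because pod-to-Service traffic is ultimately DNATed to a pod IP, the capture looks the same as the pod-to-pod captures above, so it is not repeated here.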

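4. Conclusion

Calico's networking relies mainly on the Linux kernel's overlay encapsulation capabilities, whether IPIP or VXLAN, and uses iptables to enforce fine-grained isolation policy. In both OpenStack and Kubernetes, networking is built on capabilities the Linux kernel already provides, which makes this a general-purpose, kernel-space approach.

Reference: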

https://cloud.tencent.com/developer/article/2394273
