1) Misconfiguration causes the kubeadm k8s install to fail
Error: ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables
[root@node01 data]# kubeadm join <master-ip>:6443 --token abcdef.0123456789abcdef \
> --discovery-token-ca-cert-hash sha256:e7a08a24b68c738cccfcc3ae56b7a433704f3753991c782208b46f20c57cf0c9
[preflight] Running pre-flight checks
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.12. Latest validated version: 19.03
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
Fix:
echo "1" >/proc/sys/net/bridge/bridge-nf-call-iptables
echo "1" >/proc/sys/net/bridge/bridge-nf-call-ip6tables
2) Error when adding a node to a kubeadm-installed k8s 1.20.1 cluster
Reference: https://bbs.csdn.net/topics/397412529?page=1
The join command fails with:
accepts at most 1 arg(s), received 2
To see the stack trace of this error execute with --v=5 or higher
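A common cause reported in the linked thread (an assumption here, not something visible in the log above) is that the pasted join command lost its line-continuation backslash, so the discovery hash was parsed as a second positional argument. Re-running the command on one line, or keeping the trailing backslash, avoids this:
kubeadm join <master-ip>:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:e7a08a24b68c738cccfcc3ae56b7a433704f3753991c782208b46f20c57cf0c9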
After retrying a moment later, the node joined successfully:
W0801 19:27:06.500319 12557 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING FileExisting-tc]: tc not found in system path
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.18" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
3) Problems encountered while deploying Kubernetes (initialization, etc.) and their solutions
Reference: https://blog.csdn.net/clareeee/article/details/121100431
4) Running kubectl get node on a worker node reports an error:
[root@k8s-node02-17:~]# kubectl get node
The connection to the server localhost:8080 was refused - did you specify the right host or port?
The fix is to follow the steps that kubeadm prints after a successful install (set up a kubeconfig for kubectl).
# The same error appears on any host whose kubectl has no valid kubeconfig:
[root@master data]# kubectl get nodes
W1116 20:29:22.881159 20594 loader.go:223] Config not found: /etc/kubernetes/admin.conf
The connection to the server localhost:8080 was refused - did you specify the right host or port?
# Run on the master node:
scp /etc/kubernetes/admin.conf root@node01:/etc/kubernetes/admin.conf
scp /etc/kubernetes/admin.conf root@node02:/etc/kubernetes/admin.conf
# Run on each node:
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
source ~/.bash_profile
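Alternatively, use the standard per-user setup that kubeadm init itself prints, which copies the admin kubeconfig to the default location kubectl looks for:
# Run on each host that needs kubectl:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config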
# Test again; the error is gone
[root@k8s-master01-15 ~]# kubectl get node
NAME              STATUS     ROLES    AGE   VERSION
k8s-master01-15   NotReady   master   20m   v1.18.6
k8s-node01-16     NotReady   <none>   19m   v1.18.6
k8s-node02-17     NotReady   <none>   19m   v1.18.6
5) kubectl get cs problem
Error: Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused. This happens because kube-controller-manager.yaml and kube-scheduler.yaml under /etc/kubernetes/manifests/ set the default port to 0. The fix is to comment out the corresponding --port flag, as follows:
[root@k8s-master01-15 manifests]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS      MESSAGE                                                                                       ERROR
scheduler            Unhealthy   Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
controller-manager   Healthy     ok
etcd-0               Healthy     {"health":"true","reason":""}
[root@k8s-master01-15 /etc/kubernetes/manifests]# vim kube-scheduler.yaml +19
.....
spec:
containers:
- command:
- kube-scheduler
- --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
- --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
- --bind-address=127.0.0.1
- --kubeconfig=/etc/kubernetes/scheduler.conf
- --leader-elect=true
#- --port=0    # comment out this line
[root@k8s-master01-15 /etc/kubernetes/manifests]# vim kube-controller-manager.yaml +26
.....
spec:
containers:
- command:
- kube-controller-manager
- --allocate-node-cidrs=true
- --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
- --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
- --bind-address=127.0.0.1
- --client-ca-file=/etc/kubernetes/pki/ca.crt
- --cluster-cidr=10.244.0.0/16
- --cluster-name=kubernetes
- --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
- --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
- --controllers=*,bootstrapsigner,tokencleaner
- --kubeconfig=/etc/kubernetes/controller-manager.conf
- --leader-elect=true
#- --port=0    # comment out this line
# Restart the kubelet service
systemctl restart kubelet.service
# Check the cs status again
[root@k8s-master01-15 manifests]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE                         ERROR
scheduler            Healthy   ok
etcd-0               Healthy   {"health":"true","reason":""}
controller-manager   Healthy   ok
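To script the same change instead of editing by hand, a rough sketch, assuming the default kubeadm manifest paths (back up the files first):
cd /etc/kubernetes/manifests
cp kube-scheduler.yaml kube-scheduler.yaml.bak
cp kube-controller-manager.yaml kube-controller-manager.yaml.bak
# Comment out the --port=0 flag in both static-pod manifests
sed -i 's/^\( *\)- --port=0/\1#- --port=0/' kube-scheduler.yaml kube-controller-manager.yaml
# The kubelet watches this directory and re-creates the static pods on change
systemctl restart kubelet.service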
6) Problem with the k8s coredns component (deploying the flannel network plugin)
Error: network: open /run/flannel/subnet.env: no such file or directory
[root@k8s-master01-15 core]# kubectl get pods -n kube-system -o wide
NAME                       READY   STATUS              RESTARTS   AGE    IP       NODE     NOMINATED NODE   READINESS GATES
coredns-78fcd69978-6t76m   0/1     ContainerCreating   0          126m   <none>   node01   <none>           <none>
coredns-78fcd69978-nthg8   0/1     ContainerCreating   0          126m   <none>   node01   <none>           <none>
.....
[root@k8s-master01-15 core]# kubectl describe pods coredns-78fcd69978-6t76m -n kube-system
.......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 19m (x4652 over 104m) kubelet Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 4m26s (x5461 over 104m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8e2992e19a969235ff30271e317565b48ffe57d8261dc86f92249005a5eaaec5" network for pod "coredns-78fcd69978-6t76m": networkPlugin cni failed to set up pod "coredns-78fcd69978-6t76m_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Fix:
Check whether /run/flannel/subnet.env exists. On the master it exists and has content, so create it on each node where it is missing; a sketch of the file follows:
[root@k8s-node01-16:~]# mkdir -p /run/flannel/
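A sketch of the file to create on the node. The values below are examples only, assuming the 10.244.0.0/16 flannel network configured earlier; copy the real values from the master's /run/flannel/subnet.env (FLANNEL_SUBNET differs per node):
cat > /run/flannel/subnet.env <<'EOF'
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.1.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF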
7) kube-proxy pods are abnormal, status: CrashLoopBackOff
kube-proxy works at the Service level: it is responsible for implementing Services, handling access from pods to Services inside the cluster and from NodePorts to Services from outside.
[root@master core]# kubectl get pods -n kube-system -o wide
NAME                             READY   STATUS             RESTARTS         AGE    IP              NODE     NOMINATED NODE   READINESS GATES
coredns-78fcd69978-dts5t         0/1     ContainerCreating  0                146m   <none>          node02   <none>           <none>
coredns-78fcd69978-v8g7z         0/1     ContainerCreating  0                146m   <none>          node02   <none>           <none>
etcd-master                      1/1     Running            0                147m   172.23.199.15   master   <none>           <none>
kube-apiserver-master            1/1     Running            0                147m   172.23.199.15   master   <none>           <none>
kube-controller-manager-master   1/1     Running            0                147m   172.23.199.15   master   <none>           <none>
kube-proxy-9nxhp                 0/1     CrashLoopBackOff   33 (37s ago)     144m   172.23.199.16   node01   <none>           <none>
kube-proxy-gqrvl                 0/1     CrashLoopBackOff   33 (86s ago)     145m   172.23.199.17   node02   <none>           <none>
kube-proxy-p825v                 0/1     CrashLoopBackOff   33 (2m54s ago)   146m   172.23.199.15   master   <none>           <none>
kube-scheduler-master            1/1     Running            0                147m   172.23.199.15   master   <none>           <none>
# Use kubectl describe to inspect the pod's status and events
[root@master core]# kubectl describe pod kube-proxy-9nxhp -n kube-system
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 8s (x695 over 150m) kubelet Back-off restarting failed container
Fix:
Before v1.19, enabling ipvs mode in a kubeadm deployment required adding the following to the init configuration file:
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
featureGates:
  SupportIPVSProxyMode: true
mode: ipvs
On clusters of v1.20 and later initialized with kubeadm and this config, the deployment itself completes, but checking the pods shows kube-proxy failing to run. Part of the error:
[root@k8s-master01-15:~]# kubectl get pod -A|grep kube-proxy
kube-system   kube-proxy-7vrbv   0/1   CrashLoopBackOff   9   43m
kube-system   kube-proxy-ghs7h   0/1   CrashLoopBackOff   9   43m
kube-system   kube-proxy-l9twb   0/1   CrashLoopBackOff   1   7s
Check the logs:
[root@k8s-master01-15:~]# kubectl logs kube-proxy-9qbwp -n kube-system
E0216 03:00:11.595910 1 run.go:74] "command failed" err="failed complete: unrecognized feature gate: SupportIPVSProxyMode"
The error shows that kube-proxy no longer recognizes the SupportIPVSProxyMode field (the feature gate has been removed), so check the official documentation for the current way to enable ipvs. Per https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/ipvs/README.md#check-ipvs-proxy-rules, the official guidance is:
Cluster Created by Kubeadm
If you are using kubeadm with a configuration file, you have to add mode: ipvs in a KubeProxyConfiguration (separated by --- that is also passed to kubeadm init).
kubeProxy:
  config:
    mode: ipvs
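With the current kubeadm config API this is expressed as a separate KubeProxyConfiguration document appended (after ---) to the init configuration. A minimal sketch, with an arbitrary file name:
cat >> kubeadm-init.yaml <<'EOF'
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
EOF
# then: kubeadm init --config kubeadm-init.yaml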
Since the cluster has already been initialized, changing the kubeadm init configuration file now is pointless; we need to modify kube-proxy's running configuration directly.
Looking at kube-proxy's pod spec shows that its config file is mounted into the container from a ConfigMap, so we only need to edit that ConfigMap and delete the invalid field.
[root@k8s-master01-15:~]# kubectl get pod kube-proxy-24tkb -n kube-system -o yaml
......  # other content omitted
  volumes:
  - configMap:
      defaultMode: 420
      name: kube-proxy
    name: kube-proxy
......  # other content omitted
[root@k8s-master01-15:~]# kubectl get cm -n kube-system
NAME                                 DATA   AGE
coredns                              1      19h
extension-apiserver-authentication   6      19h
kube-flannel-cfg                     2      109m
kube-proxy                           2      19h
kube-root-ca.crt                     1      19h
kubeadm-config                       1      19h
kubelet-config-1.23                  1      19h
In the editor, find the following field, delete it, then save and quit:
[root@k8s-master01-15:~]# kubectl edit cm kube-proxy -n kube-system
featureGates:
  SupportIPVSProxyMode: true
Then delete all the kube-proxy pods so they restart with the new configuration (one way is sketched below), and check that they come back up:
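One way to restart them in bulk, assuming the standard k8s-app=kube-proxy label that kubeadm puts on the DaemonSet pods:
# Delete the pods by label; the DaemonSet re-creates them with the new ConfigMap
kubectl delete pod -n kube-system -l k8s-app=kube-proxy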
[root@k8s-master01-15:~]# kubectl get pod -n kube-system
NAME                               READY   STATUS    RESTARTS       AGE
coredns-64897985d-c9t7s            1/1     Running   0              19h
coredns-64897985d-knvxg            1/1     Running   0              19h
etcd-master01                      1/1     Running   0              19h
kube-apiserver-master01            1/1     Running   0              19h
kube-controller-manager-master01   1/1     Running   0              19h
kube-flannel-ds-6lbmw              1/1     Running   15 (56m ago)   110m
kube-flannel-ds-97mkh              1/1     Running   15 (56m ago)   110m
kube-flannel-ds-fthvm              1/1     Running   15 (56m ago)   110m
kube-proxy-4jj7b                   1/1     Running   0              55m
kube-proxy-ksltf                   1/1     Running   0              55m
kube-proxy-w8dcr                   1/1     Running   0              55m
kube-scheduler-master01            1/1     Running   0              19h
Install ipvsadm on the server and verify that ipvs mode took effect:
[root@k8s-master01-15:~]# yum install ipvsadm -y
[root@k8s-master01-15:~]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.96.0.1:443 rr
-> 172.23.142.233:6443 Masq 1 3 0
TCP 10.96.0.10:53 rr
-> 10.244.0.2:53 Masq 1 0 0
-> 10.244.0.3:53 Masq 1 0 0
TCP 10.96.0.10:9153 rr
-> 10.244.0.2:9153 Masq 1 0 0
-> 10.244.0.3:9153 Masq 1 0 0
UDP 10.96.0.10:53 rr
-> 10.244.0.2:53 Masq 1 0 0
-> 10.244.0.3:53 Masq 1 0 0
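Another quick check is to ask kube-proxy directly which mode it is running, assuming its default metrics port 10249:
# Should print: ipvs
curl localhost:10249/proxyMode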
8) Fixing pod IPs that cannot be pinged
After the cluster install completes, start a pod:
# Start a pod named nginx-offi running the official nginx image
[root@k8s-master01-15:~]# kubectl run nginx-offi --image=nginx
pod/nginx-offi created
# Check the pod: the status is "Running", the IP is "10.244.1.7", and it is scheduled on node "k8s-node01-16"
[root@k8s-master01-15:~]# kubectl get pod -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP           NODE            NOMINATED NODE   READINESS GATES
nginx-offi   1/1     Running   0          55s   10.244.1.7   k8s-node01-16   <none>           <none>
However, accessing the pod from the master k8s-master01-15 or from the other worker k8s-node02-17 fails, and pinging 10.244.1.7 fails too, even though flannel was already installed.
It turned out to be an iptables rules problem: the rules were flushed during server initialization, but some later step (perhaps installing flannel, perhaps something else) added rules back that block cross-node pod traffic.
# Check the iptables rules
(root@k8s-master01-15:~)# iptables -L -n
Chain INPUT (policy ACCEPT)
target prot opt source destination
KUBE-FIREWALL all -- 0.0.0.0/0 0.0.0.0/0
Chain FORWARD (policy ACCEPT)
target prot opt source destination
KUBE-FORWARD all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding rules */
DOCKER-USER all -- 0.0.0.0/0 0.0.0.0/0
DOCKER-ISOLATION-STAGE-1 all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
DOCKER all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-FIREWALL all -- 0.0.0.0/0 0.0.0.0/0
Chain DOCKER (1 references)
target prot opt source destination
Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target prot opt source destination
DOCKER-ISOLATION-STAGE-2 all -- 0.0.0.0/0 0.0.0.0/0
RETURN all -- 0.0.0.0/0 0.0.0.0/0
Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target prot opt source destination
DROP all -- 0.0.0.0/0 0.0.0.0/0
RETURN all -- 0.0.0.0/0 0.0.0.0/0
Chain DOCKER-USER (1 references)
target prot opt source destination
RETURN all -- 0.0.0.0/0 0.0.0.0/0
Chain KUBE-FIREWALL (2 references)
target prot opt source destination
DROP all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000
DROP all -- !127.0.0.0/8 127.0.0.0/8 /* block incoming localnet connections */ ! ctstate RELATED,ESTABLISHED,DNAT
Chain KUBE-KUBELET-CANARY (0 references)
target prot opt source destination
Chain KUBE-FORWARD (1 references)
target prot opt source destination
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED
# Warning: iptables-legacy tables present, use iptables-legacy to see them
The iptables rules need to be flushed again:
iptables -F && iptables -X && iptables -F -t nat && iptables -X -t nat
Check iptables again:
(root@k8s-master01-15:~)# iptables -L -n
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
KUBE-FORWARD all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding rules */
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain KUBE-FORWARD (1 references)
target prot opt source destination
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED
# Warning: iptables-legacy tables present, use iptables-legacy to see them
Ping or curl the pod again; it now succeeds:
(root@k8s-master01-15:~)# curl 10.244.1.7
Welcome to nginx!
If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.
For online documentation and support please refer to nginx.org.
Commercial support is available at nginx.com.
Thank you for using nginx.
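Note: the flushed rules can come back after a Docker or kubelet restart. If cross-node pod traffic breaks again, one common workaround (an assumption here, since the exact culprit was never pinned down; some Docker versions set the FORWARD chain policy to DROP) is:
# Allow forwarded traffic between pod interfaces
iptables -P FORWARD ACCEPT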