CoreDNS in CrashLoopBackOff (Kubernetes 1.11)

I'm trying to install Kubernetes on an Ubuntu 16.04 VM, following the instructions at https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/ and using Weave as my pod network add-on.

I'm seeing an issue similar to the question "coredns pods have CrashLoopBackOff or Error state", but I didn't see a solution there, and the versions I'm using are different:

kubeadm         1.11.4-00

kubectl         1.11.4-00

kubelet         1.11.4-00

kubernetes-cni  0.6.0-00

Docker version 1.13.1-cs8, build 91ca5f2

weave script 2.5.0

weave 2.5.0

I'm running behind a corporate firewall, so I set my proxy variables, then ran kubeadm init as follows:

# echo $http_proxy

http://135.28.13.11:8080

# echo $https_proxy

http://135.28.13.11:8080

# echo $no_proxy

127.0.0.1,135.21.27.139,135.0.0.0/8,10.96.0.0/12,10.32.0.0/12

# kubeadm init --pod-network-cidr=10.32.0.0/12 

# kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

# kubectl taint nodes --all node-role.kubernetes.io/master-
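The `tr -d` step in the `kubectl apply` command is there to strip the line breaks that `base64` may insert when it wraps long output; its argument should be `'\n'` (a bare `'n'` would delete the letter n from the encoded string instead). A minimal sketch of that step in isolation, where the helper name `encode_one_line` is made up for illustration:

```shell
# Encode stdin as base64 and strip the line breaks that some base64
# implementations insert when wrapping long output.
encode_one_line() {
  base64 | tr -d '\n'
}

# Hypothetical usage (needs a working kubectl, so it is left commented out):
# kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | encode_one_line)"
```

The result is a single-line string that can be placed in a URL query value without embedded newlines.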

Both coredns pods stay in CrashLoopBackOff

# kubectl get pods  --all-namespaces -o wide

NAMESPACE     NAME                                     READY     STATUS             RESTARTS   AGE       IP              NODE             NOMINATED NODE

default       hostnames-674b556c4-2b5h2                1/1       Running            0          5h        10.32.0.6       mtpnjvzonap001   <none>

default       hostnames-674b556c4-4bzdj                1/1       Running            0          5h        10.32.0.5       mtpnjvzonap001   <none>

default       hostnames-674b556c4-64gx5                1/1       Running            0          5h        10.32.0.4       mtpnjvzonap001   <none>

kube-system   coredns-78fcdf6894-s7rvx                 0/1       CrashLoopBackOff   18         1h        10.32.0.7       mtpnjvzonap001   <none>

kube-system   coredns-78fcdf6894-vxwgv                 0/1       CrashLoopBackOff   80         6h        10.32.0.2       mtpnjvzonap001   <none>

kube-system   etcd-mtpnjvzonap001                      1/1       Running            0          6h        135.21.27.139   mtpnjvzonap001   <none>

kube-system   kube-apiserver-mtpnjvzonap001            1/1       Running            0          1h        135.21.27.139   mtpnjvzonap001   <none>

kube-system   kube-controller-manager-mtpnjvzonap001   1/1       Running            0          6h        135.21.27.139   mtpnjvzonap001   <none>

kube-system   kube-proxy-2c4tx                         1/1       Running            0          6h        135.21.27.139   mtpnjvzonap001   <none>

kube-system   kube-scheduler-mtpnjvzonap001            1/1       Running            0          1h        135.21.27.139   mtpnjvzonap001   <none>

kube-system   weave-net-bpx22                          2/2       Running            0          6h        135.21.27.139   mtpnjvzonap001   <none>
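To pull only the failing pods out of a listing like the one above, a small awk filter may help. This is a sketch that assumes the column layout shown here (STATUS in column 4, RESTARTS in column 5); the function name `crashlooping_pods` is hypothetical.

```shell
# Print "namespace/name restarts" for every pod whose STATUS column is
# CrashLoopBackOff. Reads `kubectl get pods --all-namespaces -o wide`
# style text on stdin; assumes STATUS is column 4 and RESTARTS is column 5.
crashlooping_pods() {
  awk '$4 == "CrashLoopBackOff" { print $1 "/" $2, $5 }'
}

# Hypothetical usage (needs a working kubectl, so it is left commented out):
# kubectl get pods --all-namespaces -o wide | crashlooping_pods
```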

coredns pods have this entry in their log

E1114 20:59:13.848196 1 reflector.go:205]
github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed
to list *v1.Service: Get
https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0:
dial tcp 10.96.0.1:443: i/o timeout

This suggests to me that coredns cannot access apiserver pod using its cluster IP:

# kubectl describe svc/kubernetes

Name:              kubernetes

Namespace:         default

Labels:            component=apiserver

                   provider=kubernetes

Annotations:       <none>

Selector:          <none>

Type:              ClusterIP

IP:                10.96.0.1

Port:              https  443/TCP

TargetPort:        6443/TCP

Endpoints:         135.21.27.139:6443

Session Affinity:  None

Events:            <none>
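The cluster IP above, 10.96.0.1, is expected to sit inside the service range 10.96.0.0/12 listed in no_proxy and outside the pod range 10.32.0.0/12. Below is a minimal sketch for checking that kind of membership with plain shell arithmetic. The function names `ip_to_int` and `ip_in_cidr` are made up for illustration, and whether a given tool honors CIDR entries in no_proxy is a separate question that this does not answer.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  # Split the address on dots using a temporarily changed IFS.
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return success (0) if address $1 lies inside CIDR block $2,
# for example: ip_in_cidr 10.96.0.1 10.96.0.0/12
ip_in_cidr() {
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")   # part before the slash
  bits=${2#*/}                 # prefix length after the slash
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  [ $(( addr & mask )) -eq $(( net & mask )) ]
}
```

For example, `ip_in_cidr 10.96.0.1 10.96.0.0/12` succeeds, while `ip_in_cidr 10.96.0.1 10.32.0.0/12` fails, so the two /12 ranges used here do not overlap at that address.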

I also went through the troubleshooting steps at https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/

I created a busybox pod for testing

I created the hostnames deployment successfully

I exposed the hostnames deployment successfully

From the busybox pod, I accessed the hostnames service by its cluster IP successfully

From the node, I accessed the hostnames service by its cluster IP successfully

So in short: I created the hostnames service, which got a cluster IP in the 10.96.0.0/12 space (as expected), and it works. But for some reason, pods cannot access the apiserver's cluster IP of 10.96.0.1, though from the node I can access 10.96.0.1:

# wget --no-check-certificate https://10.96.0.1/hello

--2018-11-14 21:44:25--  https://10.96.0.1/hello

Connecting to 10.96.0.1:443... connected.

WARNING: cannot verify 10.96.0.1's certificate, issued by ‘CN=kubernetes’:

  Unable to locally verify the issuer's authority.

HTTP request sent, awaiting response... 403 Forbidden

2018-11-14 21:44:25 ERROR 403: Forbidden.

Some other things I checked, based on advice from others who reported a similar problem:

# sysctl net.ipv4.conf.all.forwarding

net.ipv4.conf.all.forwarding = 1

# sysctl net.bridge.bridge-nf-call-iptables

net.bridge.bridge-nf-call-iptables = 1

# iptables-save | egrep ':INPUT|:OUTPUT|:POSTROUTING|:FORWARD'

:INPUT ACCEPT [0:0]

:OUTPUT ACCEPT [11:692]

:POSTROUTING ACCEPT [11:692]

:INPUT ACCEPT [1697:364811]

:FORWARD ACCEPT [0:0]

:OUTPUT ACCEPT [1652:363693]
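The egrep above prints chain headers from several tables together. As a sketch, the same check can be turned around to report only built-in chains whose default policy is something other than ACCEPT; the function name `non_accept_policies` is hypothetical, and it assumes the usual `iptables-save` header format of `:CHAIN POLICY [packets:bytes]`.

```shell
# Print built-in chains whose default policy is not ACCEPT.
# Reads `iptables-save` output on stdin; chain header lines look like
# ":FORWARD ACCEPT [0:0]". User-defined chains show "-" as the policy
# and are skipped.
non_accept_policies() {
  awk '/^:/ && $2 != "ACCEPT" && $2 != "-" { print substr($1, 2), $2 }'
}

# Hypothetical usage (needs root, so it is left commented out):
# iptables-save | non_accept_policies
```

Empty output means every built-in chain defaults to ACCEPT, which matches the listing above.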

# ls -l /usr/sbin/conntrack

-rwxr-xr-x 1 root root 65632 Jan 24  2016 /usr/sbin/conntrack

# systemctl status firewalld

● firewalld.service

   Loaded: not-found (Reason: No such file or directory)

   Active: inactive (dead)

I checked the kube-proxy log and did not see any errors.
I also tried deleting the coredns pods and the apiserver pod; they are recreated (as expected), but the problem remains.

Here's a copy of the log from the weave container

# kubectl logs -n kube-system weave-net-bpx22 weave

DEBU: 2018/11/14 15:56:10.909921 [kube-peers] Checking peer "aa:53:be:75:71:f7" against list &{}

Peer not in list; removing persisted data

INFO: 2018/11/14 15:56:11.041807 Command line options: map[name:aa:53:be:75:71:f7 nickname:mtpnjvzonap001 ipalloc-init:consensus=1 ipalloc-range:10.32.0.0/12 db-prefix:/weavedb/weave-net docker-api: expect-npc:true host-root:/host http-addr:127.0.0.1:6784 metrics-addr:0.0.0.0:6782 conn-limit:100 datapath:datapath no-dns:true port:6783]

INFO: 2018/11/14 15:56:11.042230 weave  2.5.0

INFO: 2018/11/14 15:56:11.198348 Bridge type is bridged_fastdp

INFO: 2018/11/14 15:56:11.198372 Communication between peers is unencrypted.

INFO: 2018/11/14 15:56:11.203206 Our name is aa:53:be:75:71:f7(mtpnjvzonap001)

INFO: 2018/11/14 15:56:11.203249 Launch detected - using supplied peer list: [135.21.27.139]

INFO: 2018/11/14 15:56:11.216398 Checking for pre-existing addresses on weave bridge

INFO: 2018/11/14 15:56:11.229313 [allocator aa:53:be:75:71:f7] No valid persisted data

INFO: 2018/11/14 15:56:11.233391 [allocator aa:53:be:75:71:f7] Initialising via deferred consensus

INFO: 2018/11/14 15:56:11.233443 Sniffing traffic on datapath (via ODP)

INFO: 2018/11/14 15:56:11.234120 ->[135.21.27.139:6783] attempting connection

INFO: 2018/11/14 15:56:11.234302 ->[135.21.27.139:49182] connection accepted

INFO: 2018/11/14 15:56:11.234818 ->[135.21.27.139:6783|aa:53:be:75:71:f7(mtpnjvzonap001)]: connection shutting down due to error: cannot connect to ourself

INFO: 2018/11/14 15:56:11.234843 ->[135.21.27.139:49182|aa:53:be:75:71:f7(mtpnjvzonap001)]: connection shutting down due to error: cannot connect to ourself

INFO: 2018/11/14 15:56:11.236010 Listening for HTTP control messages on 127.0.0.1:6784

INFO: 2018/11/14 15:56:11.236424 Listening for metrics requests on 0.0.0.0:6782

INFO: 2018/11/14 15:56:11.990529 [kube-peers] Added myself to peer list &{[{aa:53:be:75:71:f7 mtpnjvzonap001}]}

DEBU: 2018/11/14 15:56:11.995901 [kube-peers] Nodes that have disappeared: map

10.32.0.1

135.21.27.139

DEBU: 2018/11/14 15:56:12.075738 registering for updates for node delete events

INFO: 2018/11/14 15:56:41.279799 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 74.125.196.121:443: i/o timeout

INFO: 2018/11/14 20:52:47.025412 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 74.125.196.121:443: i/o timeout

INFO: 2018/11/15 01:46:32.842792 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 74.125.196.121:443: i/o timeout

INFO: 2018/11/15 09:06:03.624359 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 172.217.9.147:443: i/o timeout

INFO: 2018/11/15 14:34:02.070893 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 172.217.9.147:443: i/o timeout
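The timeouts in the weave log above all target external addresses for the version check, whereas the coredns timeout targets the in-cluster address 10.96.0.1. A sketch for listing the distinct timed-out targets in any of these logs, so the two kinds are easy to tell apart; the function name `timeout_targets` is made up for illustration.

```shell
# List each distinct "dial tcp <ip>:<port>" target that ended in an
# i/o timeout, prefixed by the number of occurrences. Reads log text on stdin.
timeout_targets() {
  grep -o 'dial tcp [0-9.]*:[0-9]*: i/o timeout' |
    awk '{ sub(/:$/, "", $3); count[$3]++ }
         END { for (t in count) print count[t], t }' |
    sort -k2
}

# Hypothetical usage (needs a working kubectl, so it is left commented out):
# kubectl logs -n kube-system weave-net-bpx22 weave | timeout_targets
```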

Here are the events for the 2 coredns pods

# kubectl get events -n kube-system --field-selector involvedObject.name=coredns-78fcdf6894-6f9q6

LAST SEEN   FIRST SEEN   COUNT     NAME                                        KIND      SUBOBJECT                  TYPE      REASON      SOURCE                    MESSAGE

41m         20h          245       coredns-78fcdf6894-6f9q6.1568eab25f0acb02   Pod       spec.containers{coredns}   Normal    Killing     kubelet, mtpnjvzonap001   Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.

26m         20h          248       coredns-78fcdf6894-6f9q6.1568ea920f72ddd4   Pod       spec.containers{coredns}   Normal    Pulled      kubelet, mtpnjvzonap001   Container image "k8s.gcr.io/coredns:1.1.3" already present on machine

5m          20h          1256      coredns-78fcdf6894-6f9q6.1568eaa1fd9216d2   Pod       spec.containers{coredns}   Warning   Unhealthy   kubelet, mtpnjvzonap001   Liveness probe failed: HTTP probe failed with statuscode: 503

1m          19h          2963      coredns-78fcdf6894-6f9q6.1568eb75f2b1af3e   Pod       spec.containers{coredns}   Warning   BackOff     kubelet, mtpnjvzonap001   Back-off restarting failed container

# kubectl get events -n kube-system --field-selector involvedObject.name=coredns-78fcdf6894-skjwz

LAST SEEN   FIRST SEEN   COUNT     NAME                                        KIND      SUBOBJECT                  TYPE      REASON      SOURCE                    MESSAGE

6m          20h          1259      coredns-78fcdf6894-skjwz.1568eaa181fbeffe   Pod       spec.containers{coredns}   Warning   Unhealthy   kubelet, mtpnjvzonap001   Liveness probe failed: HTTP probe failed with statuscode: 503

1m          19h          2969      coredns-78fcdf6894-skjwz.1568eb7578188f24   Pod       spec.containers{coredns}   Warning   BackOff     kubelet, mtpnjvzonap001   Back-off restarting failed container

#
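To see at a glance which event reasons dominate, the COUNT column can be summed per REASON. This is a sketch that assumes the column layout of the event listings above (COUNT in field 3, KIND in field 5, REASON in field 8 once the rows are split on whitespace); the function name `event_counts_by_reason` is hypothetical.

```shell
# Sum the COUNT column per REASON for Pod events.
# Reads `kubectl get events` style text on stdin. The header row is
# skipped because its fifth whitespace-separated field is not "Pod".
event_counts_by_reason() {
  awk '$5 == "Pod" { total[$8] += $3 }
       END { for (r in total) print r, total[r] }' |
    sort
}

# Hypothetical usage (needs a working kubectl, so it is left commented out):
# kubectl get events -n kube-system | event_counts_by_reason
```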

Any help or further troubleshooting steps are welcome

edited Nov 21 at 15:21

asked Nov 14 at 22:27

jm9816

Hi jm9816, welcome to SO! I suspect that "pods cannot access the apiserver's cluster IP of 10.96.0.1, though from the node I can access 10.96.0.1:" is due to CNI failure. You might want to check your kubectl logs weave-net-bpx22 and see if weave failed to initialize. And in all circumstances, you'll want to check the situation on a Node, and not the master. Good luck!
– Matthew L Daniel
Nov 15 at 6:24

@jm9816, the issue is probably connected with CNI, as @Matthew L Daniel mentioned. Do you see any suspicious events in the weave pod? kubectl logs weave-net-bpx22 -n kube-system -c weave; kubectl logs weave-net-bpx22 -n kube-system -c weave-npc
– mk_sta
Nov 15 at 11:35

@MatthewLDaniel - thanks for the suggestions. I should mention I have a single node cluster. But that shouldn't trigger a problem with coredns, right? About weave: I updated my entry to include the weave container log, if that is of interest. I actually opened an issue with weave, but did not get a resolution. So I decided to try it with flannel. I ran kubeadm reset, rebooted my vm for good measure, then re-ran kubeadm init, ... kubectl apply .. this time, using flannel. Result was the same, coredns in CrashLoopBackOff with same error message.
– jm9816
Nov 15 at 17:07

Hi, you can try iptables -P FORWARD ACCEPT to bypass the WeaveNet NPC temporarily.
– Kitt Hsu
Nov 16 at 6:58

@KittHsu thanks, already have that in place # iptables-save | fgrep ':FORWARD' :FORWARD ACCEPT [0:0]
– jm9816
Nov 16 at 19:35

|
show 6 more comments

up vote
2
down vote

favorite

I'm trying to install kubernetes on an Ubuntu 16.04 VM, followed instructions at https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/, using weave as my pod network add-on.

I'm seeing similar issue as coredns pods have CrashLoopBackOff or Error state, but I didn't see a solution there, and the versions I'm using are different:

kubeadm         1.11.4-00

kubectl         1.11.4-00

kubelet         1.11.4-00

kubernetes-cni  0.6.0-00

Docker version 1.13.1-cs8, build 91ca5f2

weave script 2.5.0

weave 2.5.0

I'm running behind a corporate firewall, so I set my proxy variables, then ran kubeadm init as follows:

# echo $http_proxy

http://135.28.13.11:8080

# echo $https_proxy

http://135.28.13.11:8080

# echo $no_proxy

127.0.0.1,135.21.27.139,135.0.0.0/8,10.96.0.0/12,10.32.0.0/12

# kubeadm init --pod-network-cidr=10.32.0.0/12 

# kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d 'n')" 

# kubectl taint nodes --all node-role.kubernetes.io/master-

Both coredns pods stay in CrashLoopBackOff

# kubectl get pods  --all-namespaces -o wide

NAMESPACE     NAME                                     READY     STATUS             RESTARTS   AGE       IP              NODE             NOMINATED NODE

default       hostnames-674b556c4-2b5h2                1/1       Running            0          5h        10.32.0.6       mtpnjvzonap001   <none>

default       hostnames-674b556c4-4bzdj                1/1       Running            0          5h        10.32.0.5       mtpnjvzonap001   <none>

default       hostnames-674b556c4-64gx5                1/1       Running            0          5h        10.32.0.4       mtpnjvzonap001   <none>

kube-system   coredns-78fcdf6894-s7rvx                 0/1       CrashLoopBackOff   18         1h        10.32.0.7       mtpnjvzonap001   <none>

kube-system   coredns-78fcdf6894-vxwgv                 0/1       CrashLoopBackOff   80         6h        10.32.0.2       mtpnjvzonap001   <none>

kube-system   etcd-mtpnjvzonap001                      1/1       Running            0          6h        135.21.27.139   mtpnjvzonap001   <none>

kube-system   kube-apiserver-mtpnjvzonap001            1/1       Running            0          1h        135.21.27.139   mtpnjvzonap001   <none>

kube-system   kube-controller-manager-mtpnjvzonap001   1/1       Running            0          6h        135.21.27.139   mtpnjvzonap001   <none>

kube-system   kube-proxy-2c4tx                         1/1       Running            0          6h        135.21.27.139   mtpnjvzonap001   <none>

kube-system   kube-scheduler-mtpnjvzonap001            1/1       Running            0          1h        135.21.27.139   mtpnjvzonap001   <none>

kube-system   weave-net-bpx22                          2/2       Running            0          6h        135.21.27.139   mtpnjvzonap001   <none>

coredns pods have this entry in their log

E1114 20:59:13.848196 1 reflector.go:205]
github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed
to list *v1.Service: Get
https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0:
dial tcp 10.96.0.1:443: i/o timeout

This suggests to me that coredns cannot access apiserver pod using its cluster IP:

# kubectl describe svc/kubernetes

Name:              kubernetes

Namespace:         default

Labels:            component=apiserver

                   provider=kubernetes

Annotations:       <none>

Selector:          <none>

Type:              ClusterIP

IP:                10.96.0.1

Port:              https  443/TCP

TargetPort:        6443/TCP

Endpoints:         135.21.27.139:6443

Session Affinity:  None

Events:            <none>

I also went through the troubleshooting steps at https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/

I created a busybox pod for testing

I created the hostnames deployment successfully

I exposed the hostnames deployment successfully

From the busybox pod, I accessed the hostnames service by its cluster IP successfully

from the node, I accessed the hostnames service by its cluster IP successfully

# wget --no-check-certificate https://10.96.0.1/hello

--2018-11-14 21:44:25--  https://10.96.0.1/hello

Connecting to 10.96.0.1:443... connected.

WARNING: cannot verify 10.96.0.1's certificate, issued by ‘CN=kubernetes’:

  Unable to locally verify the issuer's authority.

HTTP request sent, awaiting response... 403 Forbidden

2018-11-14 21:44:25 ERROR 403: Forbidden.

Some other things I checked, based on advice from others who reported a similar problem:

# sysctl net.ipv4.conf.all.forwarding

net.ipv4.conf.all.forwarding = 1

# sysctl net.bridge.bridge-nf-call-iptables

net.bridge.bridge-nf-call-iptables = 1

# iptables-save | egrep ':INPUT|:OUTPUT|:POSTROUTING|:FORWARD'

:INPUT ACCEPT [0:0]

:OUTPUT ACCEPT [11:692]

:POSTROUTING ACCEPT [11:692]

:INPUT ACCEPT [1697:364811]

:FORWARD ACCEPT [0:0]

:OUTPUT ACCEPT [1652:363693]

# ls -l /usr/sbin/conntrack

-rwxr-xr-x 1 root root 65632 Jan 24  2016 /usr/sbin/conntrack

# systemctl status firewalld

● firewalld.service

   Loaded: not-found (Reason: No such file or directory)

   Active: inactive (dead)

I checked the log for kube-proxy, did not see any errors.
I also tried deleting coredns pods, apiserver pod; they are recreated (as expected), but the problem remains.

Here's a copy of the log from the weave container

# kubectl logs -n kube-system weave-net-bpx22 weave

DEBU: 2018/11/14 15:56:10.909921 [kube-peers] Checking peer "aa:53:be:75:71:f7" against list &{}

Peer not in list; removing persisted data

INFO: 2018/11/14 15:56:11.041807 Command line options: map[name:aa:53:be:75:71:f7 nickname:mtpnjvzonap001 ipalloc-init:consensus=1 ipalloc-range:10.32.0.0/12 db-prefix:/weavedb/weave-net docker-api: expect-npc:true host-root:/host http-addr:127.0.0.1:6784 metrics-addr:0.0.0.0:6782 conn-limit:100 datapath:datapath no-dns:true port:6783]

INFO: 2018/11/14 15:56:11.042230 weave  2.5.0

INFO: 2018/11/14 15:56:11.198348 Bridge type is bridged_fastdp

INFO: 2018/11/14 15:56:11.198372 Communication between peers is unencrypted.

INFO: 2018/11/14 15:56:11.203206 Our name is aa:53:be:75:71:f7(mtpnjvzonap001)

INFO: 2018/11/14 15:56:11.203249 Launch detected - using supplied peer list: [135.21.27.139]

INFO: 2018/11/14 15:56:11.216398 Checking for pre-existing addresses on weave bridge

INFO: 2018/11/14 15:56:11.229313 [allocator aa:53:be:75:71:f7] No valid persisted data

INFO: 2018/11/14 15:56:11.233391 [allocator aa:53:be:75:71:f7] Initialising via deferred consensus

INFO: 2018/11/14 15:56:11.233443 Sniffing traffic on datapath (via ODP)

INFO: 2018/11/14 15:56:11.234120 ->[135.21.27.139:6783] attempting connection

INFO: 2018/11/14 15:56:11.234302 ->[135.21.27.139:49182] connection accepted

INFO: 2018/11/14 15:56:11.234818 ->[135.21.27.139:6783|aa:53:be:75:71:f7(mtpnjvzonap001)]: connection shutting down due to error: cannot connect to ourself

INFO: 2018/11/14 15:56:11.234843 ->[135.21.27.139:49182|aa:53:be:75:71:f7(mtpnjvzonap001)]: connection shutting down due to error: cannot connect to ourself

INFO: 2018/11/14 15:56:11.236010 Listening for HTTP control messages on 127.0.0.1:6784

INFO: 2018/11/14 15:56:11.236424 Listening for metrics requests on 0.0.0.0:6782

INFO: 2018/11/14 15:56:11.990529 [kube-peers] Added myself to peer list &{[{aa:53:be:75:71:f7 mtpnjvzonap001}]}

DEBU: 2018/11/14 15:56:11.995901 [kube-peers] Nodes that have disappeared: map

10.32.0.1

135.21.27.139

DEBU: 2018/11/14 15:56:12.075738 registering for updates for node delete events

INFO: 2018/11/14 15:56:41.279799 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 74.125.196.121:443: i/o timeout

INFO: 2018/11/14 20:52:47.025412 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 74.125.196.121:443: i/o timeout

INFO: 2018/11/15 01:46:32.842792 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 74.125.196.121:443: i/o timeout

INFO: 2018/11/15 09:06:03.624359 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 172.217.9.147:443: i/o timeout

INFO: 2018/11/15 14:34:02.070893 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 172.217.9.147:443: i/o timeout

Here are the events for the 2 coredns pods

# kubectl get events -n kube-system --field-selector involvedObject.name=coredns-78fcdf6894-6f9q6

LAST SEEN   FIRST SEEN   COUNT     NAME                                        KIND      SUBOBJECT                  TYPE      REASON      SOURCE                    MESSAGE

41m         20h          245       coredns-78fcdf6894-6f9q6.1568eab25f0acb02   Pod       spec.containers{coredns}   Normal    Killing     kubelet, mtpnjvzonap001   Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.

26m         20h          248       coredns-78fcdf6894-6f9q6.1568ea920f72ddd4   Pod       spec.containers{coredns}   Normal    Pulled      kubelet, mtpnjvzonap001   Container image "k8s.gcr.io/coredns:1.1.3" already present on machine

5m          20h          1256      coredns-78fcdf6894-6f9q6.1568eaa1fd9216d2   Pod       spec.containers{coredns}   Warning   Unhealthy   kubelet, mtpnjvzonap001   Liveness probe failed: HTTP probe failed with statuscode: 503

1m          19h          2963      coredns-78fcdf6894-6f9q6.1568eb75f2b1af3e   Pod       spec.containers{coredns}   Warning   BackOff     kubelet, mtpnjvzonap001   Back-off restarting failed container

# kubectl get events -n kube-system --field-selector involvedObject.name=coredns-78fcdf6894-skjwz

LAST SEEN   FIRST SEEN   COUNT     NAME                                        KIND      SUBOBJECT                  TYPE      REASON      SOURCE                    MESSAGE

6m          20h          1259      coredns-78fcdf6894-skjwz.1568eaa181fbeffe   Pod       spec.containers{coredns}   Warning   Unhealthy   kubelet, mtpnjvzonap001   Liveness probe failed: HTTP probe failed with statuscode: 503

1m          19h          2969      coredns-78fcdf6894-skjwz.1568eb7578188f24   Pod       spec.containers{coredns}   Warning   BackOff     kubelet, mtpnjvzonap001   Back-off restarting failed container

#

Any help or further troubleshooting steps are welcome

edited Nov 21 at 15:21

asked Nov 14 at 22:27

jm9816

112

Hi jm9816, welcome to SO! I suspect that "pods cannot access the apiserver's cluster IP of 10.96.0.1, though from the node I can access 10.96.0.1:" is due to CNI failure. You might want to check your kubectl logs weave-net-bpx22 and see if weave failed to initialize. And in all circumstances, you'll want to check the situation on a Node, and not the master. Good luck!
– Matthew L Daniel
Nov 15 at 6:24

@jm9816, issue probably connected with CNI as @Matthew L Daniel mentioned. Are you discovering any suspicious events in weave pod? kubectl logs weave-net-bpx22 -n kube-system -c weave; kubectl logs weave-net-bpx22 -n kube-system -c weave-npc
– mk_sta
Nov 15 at 11:35

@MatthewLDaniel - thanks for the suggestions. I should mention I have a single node cluster. But that shouldn't trigger a problem with coredns, right? About weave: I updated my entry to include the weave container log, if that is of interest. I actually opened an issue with weave, but did not get a resolution. So I decided to try it with flannel. I ran kubeadm reset, rebooted my vm for good measure, then re-ran kubeadm init, ... kubectl apply .. this time, using flannel. Result was the same, coredns in CrashLoopBackOff with same error message.
– jm9816
Nov 15 at 17:07

1

Hi, you can try iptables -P FORWARD ACCEPT to bypass the WeaveNet NPC temporarily.
– Kitt Hsu
Nov 16 at 6:58

@KittHsu thanks, already have that in place # iptables-save | fgrep ':FORWARD' :FORWARD ACCEPT [0:0]
– jm9816
Nov 16 at 19:35

|
show 6 more comments

up vote
2
down vote

favorite

I'm trying to install kubernetes on an Ubuntu 16.04 VM, followed instructions at https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/, using weave as my pod network add-on.

I'm seeing similar issue as coredns pods have CrashLoopBackOff or Error state, but I didn't see a solution there, and the versions I'm using are different:

kubeadm         1.11.4-00

kubectl         1.11.4-00

kubelet         1.11.4-00

kubernetes-cni  0.6.0-00

Docker version 1.13.1-cs8, build 91ca5f2

weave script 2.5.0

weave 2.5.0

I'm running behind a corporate firewall, so I set my proxy variables, then ran kubeadm init as follows:

# echo $http_proxy

http://135.28.13.11:8080

# echo $https_proxy

http://135.28.13.11:8080

# echo $no_proxy

127.0.0.1,135.21.27.139,135.0.0.0/8,10.96.0.0/12,10.32.0.0/12

# kubeadm init --pod-network-cidr=10.32.0.0/12 

# kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d 'n')" 

# kubectl taint nodes --all node-role.kubernetes.io/master-

Both coredns pods stay in CrashLoopBackOff

# kubectl get pods  --all-namespaces -o wide

NAMESPACE     NAME                                     READY     STATUS             RESTARTS   AGE       IP              NODE             NOMINATED NODE

default       hostnames-674b556c4-2b5h2                1/1       Running            0          5h        10.32.0.6       mtpnjvzonap001   <none>

default       hostnames-674b556c4-4bzdj                1/1       Running            0          5h        10.32.0.5       mtpnjvzonap001   <none>

default       hostnames-674b556c4-64gx5                1/1       Running            0          5h        10.32.0.4       mtpnjvzonap001   <none>

kube-system   coredns-78fcdf6894-s7rvx                 0/1       CrashLoopBackOff   18         1h        10.32.0.7       mtpnjvzonap001   <none>

kube-system   coredns-78fcdf6894-vxwgv                 0/1       CrashLoopBackOff   80         6h        10.32.0.2       mtpnjvzonap001   <none>

kube-system   etcd-mtpnjvzonap001                      1/1       Running            0          6h        135.21.27.139   mtpnjvzonap001   <none>

kube-system   kube-apiserver-mtpnjvzonap001            1/1       Running            0          1h        135.21.27.139   mtpnjvzonap001   <none>

kube-system   kube-controller-manager-mtpnjvzonap001   1/1       Running            0          6h        135.21.27.139   mtpnjvzonap001   <none>

kube-system   kube-proxy-2c4tx                         1/1       Running            0          6h        135.21.27.139   mtpnjvzonap001   <none>

kube-system   kube-scheduler-mtpnjvzonap001            1/1       Running            0          1h        135.21.27.139   mtpnjvzonap001   <none>

kube-system   weave-net-bpx22                          2/2       Running            0          6h        135.21.27.139   mtpnjvzonap001   <none>

coredns pods have this entry in their log

E1114 20:59:13.848196 1 reflector.go:205]
github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed
to list *v1.Service: Get
https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0:
dial tcp 10.96.0.1:443: i/o timeout

This suggests to me that coredns cannot access apiserver pod using its cluster IP:

# kubectl describe svc/kubernetes

Name:              kubernetes

Namespace:         default

Labels:            component=apiserver

                   provider=kubernetes

Annotations:       <none>

Selector:          <none>

Type:              ClusterIP

IP:                10.96.0.1

Port:              https  443/TCP

TargetPort:        6443/TCP

Endpoints:         135.21.27.139:6443

Session Affinity:  None

Events:            <none>

I also went through the troubleshooting steps at https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/

I created a busybox pod for testing

I created the hostnames deployment successfully

I exposed the hostnames deployment successfully

From the busybox pod, I accessed the hostnames service by its cluster IP successfully

from the node, I accessed the hostnames service by its cluster IP successfully

# wget --no-check-certificate https://10.96.0.1/hello

--2018-11-14 21:44:25--  https://10.96.0.1/hello

Connecting to 10.96.0.1:443... connected.

WARNING: cannot verify 10.96.0.1's certificate, issued by ‘CN=kubernetes’:

  Unable to locally verify the issuer's authority.

HTTP request sent, awaiting response... 403 Forbidden

2018-11-14 21:44:25 ERROR 403: Forbidden.

Some other things I checked, based on advice from others who reported a similar problem:

# sysctl net.ipv4.conf.all.forwarding

net.ipv4.conf.all.forwarding = 1

# sysctl net.bridge.bridge-nf-call-iptables

net.bridge.bridge-nf-call-iptables = 1

# iptables-save | egrep ':INPUT|:OUTPUT|:POSTROUTING|:FORWARD'

:INPUT ACCEPT [0:0]

:OUTPUT ACCEPT [11:692]

:POSTROUTING ACCEPT [11:692]

:INPUT ACCEPT [1697:364811]

:FORWARD ACCEPT [0:0]

:OUTPUT ACCEPT [1652:363693]
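The egrep above drops the table headers, so the repeated :INPUT and :OUTPUT lines cannot be attributed to a table. A small filter that keeps the table name and prints only the built-in chains whose default policy is not ACCEPT might look like this (the function name `non_accept_policies` is made up for illustration):

```shell
# Reads `iptables-save` output on stdin and prints "table chain policy"
# for every built-in chain whose default policy is not ACCEPT.
# User-defined chains show "-" as their policy and are skipped.
non_accept_policies() {
  awk '
    /^\*/ { table = substr($0, 2) }    # a line such as *filter starts a table
    /^:/  {
      chain = substr($1, 2); policy = $2
      if (policy != "ACCEPT" && policy != "-") print table, chain, policy
    }
  '
}
```

Run as `iptables-save | non_accept_policies`; no output means every built-in chain defaults to ACCEPT.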

# ls -l /usr/sbin/conntrack

-rwxr-xr-x 1 root root 65632 Jan 24  2016 /usr/sbin/conntrack

# systemctl status firewalld

● firewalld.service

   Loaded: not-found (Reason: No such file or directory)

   Active: inactive (dead)

I checked the kube-proxy log and did not see any errors.
I also tried deleting the coredns pods and the apiserver pod; they were recreated (as expected), but the problem remains.

Here's a copy of the log from the weave container:

# kubectl logs -n kube-system weave-net-bpx22 weave

DEBU: 2018/11/14 15:56:10.909921 [kube-peers] Checking peer "aa:53:be:75:71:f7" against list &{}

Peer not in list; removing persisted data

INFO: 2018/11/14 15:56:11.041807 Command line options: map[name:aa:53:be:75:71:f7 nickname:mtpnjvzonap001 ipalloc-init:consensus=1 ipalloc-range:10.32.0.0/12 db-prefix:/weavedb/weave-net docker-api: expect-npc:true host-root:/host http-addr:127.0.0.1:6784 metrics-addr:0.0.0.0:6782 conn-limit:100 datapath:datapath no-dns:true port:6783]

INFO: 2018/11/14 15:56:11.042230 weave  2.5.0

INFO: 2018/11/14 15:56:11.198348 Bridge type is bridged_fastdp

INFO: 2018/11/14 15:56:11.198372 Communication between peers is unencrypted.

INFO: 2018/11/14 15:56:11.203206 Our name is aa:53:be:75:71:f7(mtpnjvzonap001)

INFO: 2018/11/14 15:56:11.203249 Launch detected - using supplied peer list: [135.21.27.139]

INFO: 2018/11/14 15:56:11.216398 Checking for pre-existing addresses on weave bridge

INFO: 2018/11/14 15:56:11.229313 [allocator aa:53:be:75:71:f7] No valid persisted data

INFO: 2018/11/14 15:56:11.233391 [allocator aa:53:be:75:71:f7] Initialising via deferred consensus

INFO: 2018/11/14 15:56:11.233443 Sniffing traffic on datapath (via ODP)

INFO: 2018/11/14 15:56:11.234120 ->[135.21.27.139:6783] attempting connection

INFO: 2018/11/14 15:56:11.234302 ->[135.21.27.139:49182] connection accepted

INFO: 2018/11/14 15:56:11.234818 ->[135.21.27.139:6783|aa:53:be:75:71:f7(mtpnjvzonap001)]: connection shutting down due to error: cannot connect to ourself

INFO: 2018/11/14 15:56:11.234843 ->[135.21.27.139:49182|aa:53:be:75:71:f7(mtpnjvzonap001)]: connection shutting down due to error: cannot connect to ourself

INFO: 2018/11/14 15:56:11.236010 Listening for HTTP control messages on 127.0.0.1:6784

INFO: 2018/11/14 15:56:11.236424 Listening for metrics requests on 0.0.0.0:6782

INFO: 2018/11/14 15:56:11.990529 [kube-peers] Added myself to peer list &{[{aa:53:be:75:71:f7 mtpnjvzonap001}]}

DEBU: 2018/11/14 15:56:11.995901 [kube-peers] Nodes that have disappeared: map

10.32.0.1

135.21.27.139

DEBU: 2018/11/14 15:56:12.075738 registering for updates for node delete events

INFO: 2018/11/14 15:56:41.279799 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 74.125.196.121:443: i/o timeout

INFO: 2018/11/14 20:52:47.025412 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 74.125.196.121:443: i/o timeout

INFO: 2018/11/15 01:46:32.842792 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 74.125.196.121:443: i/o timeout

INFO: 2018/11/15 09:06:03.624359 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 172.217.9.147:443: i/o timeout

INFO: 2018/11/15 14:34:02.070893 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 172.217.9.147:443: i/o timeout
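The repeated "Error checking version" entries are outbound calls to checkpoint-api.weave.works timing out. Assuming they are unrelated to pod networking (an assumption, not something the log proves), they can be filtered out so the remaining entries are easier to scan. The function name below is made up for illustration.

```shell
# Drop the periodic version-check failures and blank lines from a weave
# log on stdin, leaving the other entries untouched.
weave_log_without_version_checks() {
  grep -v 'Error checking version:' | grep -v '^[[:space:]]*$'
}
```

For example: `kubectl logs -n kube-system weave-net-bpx22 weave | weave_log_without_version_checks`.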

Here are the events for the 2 coredns pods:

# kubectl get events -n kube-system --field-selector involvedObject.name=coredns-78fcdf6894-6f9q6

LAST SEEN   FIRST SEEN   COUNT     NAME                                        KIND      SUBOBJECT                  TYPE      REASON      SOURCE                    MESSAGE

41m         20h          245       coredns-78fcdf6894-6f9q6.1568eab25f0acb02   Pod       spec.containers{coredns}   Normal    Killing     kubelet, mtpnjvzonap001   Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.

26m         20h          248       coredns-78fcdf6894-6f9q6.1568ea920f72ddd4   Pod       spec.containers{coredns}   Normal    Pulled      kubelet, mtpnjvzonap001   Container image "k8s.gcr.io/coredns:1.1.3" already present on machine

5m          20h          1256      coredns-78fcdf6894-6f9q6.1568eaa1fd9216d2   Pod       spec.containers{coredns}   Warning   Unhealthy   kubelet, mtpnjvzonap001   Liveness probe failed: HTTP probe failed with statuscode: 503

1m          19h          2963      coredns-78fcdf6894-6f9q6.1568eb75f2b1af3e   Pod       spec.containers{coredns}   Warning   BackOff     kubelet, mtpnjvzonap001   Back-off restarting failed container

# kubectl get events -n kube-system --field-selector involvedObject.name=coredns-78fcdf6894-skjwz

LAST SEEN   FIRST SEEN   COUNT     NAME                                        KIND      SUBOBJECT                  TYPE      REASON      SOURCE                    MESSAGE

6m          20h          1259      coredns-78fcdf6894-skjwz.1568eaa181fbeffe   Pod       spec.containers{coredns}   Warning   Unhealthy   kubelet, mtpnjvzonap001   Liveness probe failed: HTTP probe failed with statuscode: 503

1m          19h          2969      coredns-78fcdf6894-skjwz.1568eb7578188f24   Pod       spec.containers{coredns}   Warning   BackOff     kubelet, mtpnjvzonap001   Back-off restarting failed container
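The event tables are easier to compare when the COUNT column is summed per REASON. A rough sketch, assuming the column layout shown above (COUNT is the 3rd whitespace-separated field and REASON the 8th); `events_by_reason` is a hypothetical helper name:

```shell
# Sum the COUNT column per REASON from `kubectl get events` output on stdin
# and print "<total> <reason>", largest total first.
# Rows whose 3rd field is not a number (such as the header) are ignored.
events_by_reason() {
  awk '$3 ~ /^[0-9]+$/ && NF >= 8 { total[$8] += $3 }
       END { for (r in total) print total[r], r }' \
    | sort -rn
}
```

For example: `kubectl get events -n kube-system --field-selector involvedObject.name=coredns-78fcdf6894-skjwz | events_by_reason`.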

#

Any help or further troubleshooting steps are welcome.

edited Nov 21 at 15:21

asked Nov 14 at 22:27

jm9816

112

kubernetes kubeadm

Hi jm9816, welcome to SO! I suspect that "pods cannot access the apiserver's cluster IP of 10.96.0.1, though from the node I can access 10.96.0.1:" is due to CNI failure. You might want to check your kubectl logs weave-net-bpx22 and see if weave failed to initialize. And in all circumstances, you'll want to check the situation on a Node, and not the master. Good luck!
– Matthew L Daniel
Nov 15 at 6:24

@jm9816, the issue is probably connected with CNI, as @Matthew L Daniel mentioned. Do you see any suspicious events in the weave pod? kubectl logs weave-net-bpx22 -n kube-system -c weave; kubectl logs weave-net-bpx22 -n kube-system -c weave-npc
– mk_sta
Nov 15 at 11:35

@MatthewLDaniel - thanks for the suggestions. I should mention I have a single node cluster. But that shouldn't trigger a problem with coredns, right? About weave: I updated my entry to include the weave container log, if that is of interest. I actually opened an issue with weave, but did not get a resolution. So I decided to try it with flannel. I ran kubeadm reset, rebooted my vm for good measure, then re-ran kubeadm init, ... kubectl apply .. this time, using flannel. Result was the same, coredns in CrashLoopBackOff with same error message.
– jm9816
Nov 15 at 17:07

Hi, you can try iptables -P FORWARD ACCEPT to bypass the WeaveNet NPC temporarily.
– Kitt Hsu
Nov 16 at 6:58

@KittHsu thanks, I already have that in place: iptables-save | fgrep ':FORWARD' prints :FORWARD ACCEPT [0:0]
– jm9816
Nov 16 at 19:35

