10

I keep getting this error when I try to set up liveness & readiness probes for my awx_web container:

Liveness probe failed: Get http://POD_IP:8052/: dial tcp POD_IP:8052: connect: connection refused

Liveness & readiness section of my deployment for the awx_web container:

          ports:
          - name: http
            containerPort: 8052 # the port of the container awx_web
            protocol: TCP
          livenessProbe:
            httpGet:
              path: /
              port: 8052
            initialDelaySeconds: 5
            periodSeconds: 5
          readinessProbe:
            httpGet:
              path: /
              port: 8052
            initialDelaySeconds: 5
            periodSeconds: 5

If I test whether port 8052 is open from another pod in the same namespace as the pod that contains the awx_web container, or from another container deployed in the same pod as awx_web, I get this (the port is open):

/ # nc -vz POD_IP 8052
POD_IP (POD_IP:8052) open

I get the same result (port 8052 is open) if I use netcat (nc) from the worker node where the pod containing the awx_web container is deployed.

For info, I use a NodePort service that redirects traffic to that container (awx_web):

type: NodePort
ports:
- name: http
  port: 80
  targetPort: 8052
  nodePort: 30100
  • If you do curl http://POD_IP:8052/ from another pod, does it work? Commented Sep 15, 2020 at 15:25
  • From another pod, from a container in the same pod, or from the worker node: yes, it works Commented Sep 15, 2020 at 17:34
  • Check the kubelet and CNI plugin pod logs Commented Sep 15, 2020 at 17:36
  • For the kubelet log, it gave the same error Commented Sep 15, 2020 at 18:47

5 Answers

16

In my case this issue occurred because I had configured the backend application host as localhost. The issue was resolved when I changed the host value to 0.0.0.0 in my app properties.

Use the newly built Docker image after making this change.
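
A rough sketch of what that change looks like, assuming a Spring Boot-style backend purely as an example (other frameworks expose the same binding setting under a different name):

server:
  address: "0.0.0.0"  # listen on all interfaces so the kubelet can reach the pod IP
  port: 8052          # whatever port your probes target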

  • That was exactly my problem also. Changing to 0.0.0.0 fixed the problem
    – mdrobny
    Commented Jul 6, 2023 at 14:23
13

I recreated your issue, and it looks like your problem is caused by too small a value of initialDelaySeconds for the liveness probe.

It takes more than 5s for the awx container to open port 8052, so you need to wait a bit longer for it to start. I found that setting it to 15s was enough for me, but you may need some further tweaking.
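
For reference, here is the same liveness probe with only the delay changed; 15s was enough in my environment, but the right number depends on how long awx_web really takes to open the port, and the readinessProbe gets the same treatment:

livenessProbe:
  httpGet:
    path: /
    port: 8052
  initialDelaySeconds: 15  # long enough on my cluster; measure your own startup time
  periodSeconds: 5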

  • I already increased initialDelaySeconds to 30s and then to 60s, but it's still the same issue Commented Sep 16, 2020 at 11:57
  • With the liveness probe set, could you exec into the pod as soon as it starts, run watch -n1 "ss -lnt", and check when port 8052 opens?
    – Matt
    Commented Sep 16, 2020 at 13:03
  • The container has State: Running and Ready: False; when I issue your command, port 8052 is missing from the list Commented Sep 16, 2020 at 13:31
  • Is it always missing, or does it maybe appear after some time? Also please check the logs with kubectl logs -n <namespace> <pod_name>; maybe there are some errors. @Adamsin
    – Matt
    Commented Sep 16, 2020 at 13:42
  • I deployed awx 9.3.0 and it looks like it takes the awx-web container a whole 5 minutes before it opens port 8052 and starts serving traffic. This is why the liveness probe is failing. Check it yourself: remove the probes, exec into the container, run watch ss -lnt, and measure the time from pod start until port 8052 is open.
    – Matt
    Commented Sep 16, 2020 at 14:55
0

Most likely your application couldn't start up, or it crashes shortly after starting. That may be due to insufficient memory and CPU resources, or to one of the AWX dependencies, such as PostgreSQL or RabbitMQ, not being set up correctly.

Did you check whether your application works correctly without the probes? I recommend doing that first. Also examine the pod's stats a bit to make sure it isn't restarting.
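
If resources are the suspect, a sketch like this on the awx_web container is a starting point (the numbers are placeholders, not recommendations; size them from the pod's real usage):

resources:
  requests:
    cpu: "500m"    # placeholder values; check actual usage with kubectl top pod
    memory: "1Gi"
  limits:
    cpu: "1"
    memory: "2Gi"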

  • I did run awx without probes and it functions perfectly Commented Sep 16, 2020 at 6:50
0

It was a resource issue for me.

Too many ReplicaSets had been created. I primarily work in my dev environment and did not need that many; removing the superfluous ReplicaSets resolved my issue.

In other words, the pod was not able to find enough resources to start and stay alive.

0

Add failureThreshold with a value higher than the default of 3, or increase initialDelaySeconds in the livenessProbe and readinessProbe.
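
For example (illustrative numbers only; the point is that initialDelaySeconds plus failureThreshold times periodSeconds has to cover the container's real startup time):

livenessProbe:
  httpGet:
    path: /
    port: 8052
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 10  # 30s + 10 x 10s gives roughly 130s of grace before the first restart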
