I have a pod that has been crashing lately: kubectl get pods
shows 16 restarts, but when I look at monitoring, all the metrics that have "restart" in their name are empty.
Do I need to explicitly enable something for this to be monitored?
To troubleshoot a crashing Pod, first look at its description:
$ kubectl describe pod -n ci clair-kube-7c8d8cf949-nlhv8
Containers:
clair:
[...]
State: Running
Started: Wed, 19 Aug 2020 22:06:54 +0200
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Wed, 19 Aug 2020 13:07:51 +0200
Finished: Wed, 19 Aug 2020 22:06:53 +0200
Ready: True
Restart Count: 42
Here, it is quite obvious I should raise my container memory limit.
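The exact fix depends on how the workload is deployed, but in a Deployment spec it would look roughly like this (the container name is taken from the describe output above; the memory values are illustrative, not a recommendation):

```yaml
# Illustrative fragment of a Deployment spec -- the 1Gi/2Gi values are
# assumptions; size the limit from your own memory metrics.
spec:
  template:
    spec:
      containers:
      - name: clair
        resources:
          requests:
            memory: "1Gi"
          limits:
            memory: "2Gi"   # raised so the container stops getting OOMKilled
```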
Sometimes, you may not see the reason, only an exit code. Eventually, you'll learn to recognize the common ones; at first, you'll have to look at the previous container's logs:
$ kubectl logs -n ci cassandra-kube-2 -c exporter -p --tail=XX
[...]
Exception in thread "pool-1-thread-33" Exception in thread "pool-1-thread-34" java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
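A side note on exit codes: values above 128 mean the container was killed by a signal (the code minus 128), so 137 is signal 9, SIGKILL, which is what the kernel's OOM killer sends. A quick way to decode one:

```shell
# Decode a container exit code above 128 into the signal name.
# 137 - 128 = 9, and signal 9 is SIGKILL (what the OOM killer uses).
code=137
kill -l $((code - 128))   # prints: KILL
```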
kubectl describe <pod> or kubectl logs <pod> so we can look at what's happening inside the pod to troubleshoot?