I left this post open, so let's close it.
In my specific case, the bottleneck actually turned out to be upstream of the K8s Pods, at the database. The Postgres instance simply wasn't big enough; it was pegged at 100% CPU and causing downstream timeouts.
I suspect, but am not certain, that the CPU leveling off on the Pods was simply because the Pods were waiting on responses from upstream, and couldn't go above 1 CPU of usage because there wasn't anything else for them to do.
Additionally, the Django instances use Django Channels and the ASGI asynchronous model, which runs the event loop on a single thread and doesn't have the same "child worker" model as uWSGI; another reason -- or maybe the actual reason -- that CPU usage on a Pod maxes out at 1 CPU.
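You can see the single-thread behavior with a plain-asyncio sketch (no Django involved): no matter how many coroutines are "concurrent", they all execute on the one event-loop thread, so CPU-bound work tops out at one core.

```python
import asyncio
import threading

async def worker(name, results):
    # Each coroutine records which OS thread it actually ran on.
    await asyncio.sleep(0)
    results[name] = threading.get_ident()

async def main():
    results = {}
    await asyncio.gather(*(worker(i, results) for i in range(4)))
    return results

results = asyncio.run(main())
# Every coroutine shares the same event-loop thread, so the process
# can never use more than ~1 CPU for coroutine work.
print(len(set(results.values())))  # prints 1
```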
So I'm pretty sure the correct way to scale this up is to
- Vertically scale Postgres
- Increase the baseline number of Pods
- Lower the autoscaler (HPA) threshold to scale up and add new Pods
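For the last two points, a hedged sketch of what that HPA change looks like -- the names, replica counts, and 50% threshold here are illustrative, not the actual values from my cluster:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: django-asgi        # hypothetical Deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: django-asgi
  minReplicas: 4           # raised baseline number of Pods
  maxReplicas: 12
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # lowered threshold so new Pods spin up earlier
```

Lowering `averageUtilization` matters here because a single-threaded ASGI Pod saturates at 1 CPU; waiting for high average utilization means waiting until Pods are already stalled.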
EDIT: Additional information
The issue also has to do with the way the app itself is designed. We're using Django Channels asynchronous Python and running that in a Daphne ASGI container; however, not all of the app is async, and apparently, that's Bad -- sync code running directly on the event loop blocks it and can deadlock. I did a lot of research into this async-vs-sync application problem and the resulting deadlocks, and while I'm having the dev team redesign the app, I also redesigned the deployment:
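The sync-in-async stall is easy to demonstrate with a stdlib-only sketch (pure asyncio here; in Django Channels the equivalent fix is wrapping sync code with `asgiref`'s `sync_to_async` or `channels.db.database_sync_to_async`):

```python
import asyncio
import time

async def blocking_view():
    # Sync work called directly inside a coroutine: this freezes the
    # single event-loop thread for the full duration.
    time.sleep(0.2)

async def offloaded_view():
    # The same sync work pushed to a worker thread, so the event loop
    # stays free (conceptually what sync_to_async does).
    await asyncio.to_thread(time.sleep, 0.2)

async def elapsed(view):
    # Run two copies of the view "concurrently" and time the total.
    start = time.monotonic()
    await asyncio.gather(view(), view())
    return time.monotonic() - start

serial = asyncio.run(elapsed(blocking_view))     # ~0.4s: the calls serialize
parallel = asyncio.run(elapsed(offloaded_view))  # ~0.2s: the calls overlap
```

Two blocking views take twice as long because the loop can only run one at a time; under load, that's exactly the timeout pattern I was seeing.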
- Add a uWSGI server to the deployment
- Deploy the codebase into two Deployments/Pods
- One runs the ASGI (Daphne) server
- One runs the uWSGI server
- Route all ASGI endpoints (there's only one) to the ASGI pod in the Ingress Path rules
- Route all uWSGI endpoints to the sync Pod in the Ingress Path rules
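The Ingress split above looks roughly like this -- hostnames, service names, and the `/ws` path are placeholders, not the app's real endpoints:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: django-split
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /ws              # the single ASGI (websocket) endpoint
            pathType: Prefix
            backend:
              service:
                name: django-asgi  # Daphne Pod
                port:
                  number: 8000
          - path: /                # everything else goes to the sync Pod
            pathType: Prefix
            backend:
              service:
                name: django-wsgi  # uWSGI Pod
                port:
                  number: 8000
```

Path ordering matters with `Prefix` matching: the more specific `/ws` rule has to come before the catch-all `/`.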
This works; however, I haven't finished full load testing yet.