I had an interesting call with a client today. We have an application that schedules other apps, and normally we have no trouble at all on NUMA servers with anywhere from two to four nodes.
On the call, we started up two very CPU-hungry applications, and both were allocated to node 0, so across the whole machine we saw only 50% usage. Once we moved the second app instance to the other node, all of the cores were in use (half on one app, half on the other). It seemed impossible to allocate a single app to all of the cores.
Now, the only visible difference between this machine and the ones I'm used to is that Windows Task Manager lists the NUMA nodes in a drop-down rather than as one long list of individual cores, so Microsoft clearly knows about this restriction, but it's a hard problem to research online.
It's pretty clear we're going to have to add NUMA node affinity to our scheduler, but for now I'm trying to understand the problem. What allows one style of NUMA machine to let applications use both nodes transparently, and what's causing this behaviour here?
I can see this architecture working great for many small applications, but we typically run monolithic ones with many threads.
The server I'm fighting with is an HP ProLiant DL388 Gen9 with two Intel Xeon E5-2690 v3 CPUs.
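In case it matters for an answer: when I say we'll probably have to add NUMA node affinity ourselves, the rough approach I'm assuming (just a minimal sketch using the Win32 processor-group APIs, not our actual scheduler code) is to look up a node's GROUP_AFFINITY with GetNumaNodeProcessorMaskEx and apply it per thread with SetThreadGroupAffinity, something like:

```c
// Hypothetical sketch: pin the calling thread to a chosen NUMA node.
// Assumes Windows Server 2008 R2 or later (the *Ex NUMA / group APIs).
#include <windows.h>
#include <stdio.h>

static BOOL PinCurrentThreadToNode(USHORT node)
{
    GROUP_AFFINITY affinity = {0};

    // Ask Windows which processor group and mask make up this NUMA node.
    if (!GetNumaNodeProcessorMaskEx(node, &affinity)) {
        fprintf(stderr, "GetNumaNodeProcessorMaskEx failed: %lu\n", GetLastError());
        return FALSE;
    }

    // Restrict the calling thread to that group/mask. On the Windows
    // versions I've used, threads otherwise stay in the processor group
    // the process was started in.
    if (!SetThreadGroupAffinity(GetCurrentThread(), &affinity, NULL)) {
        fprintf(stderr, "SetThreadGroupAffinity failed: %lu\n", GetLastError());
        return FALSE;
    }
    return TRUE;
}

int main(void)
{
    ULONG highestNode = 0;
    if (!GetNumaHighestNodeNumber(&highestNode)) {
        fprintf(stderr, "GetNumaHighestNodeNumber failed: %lu\n", GetLastError());
        return 1;
    }
    printf("NUMA nodes: 0..%lu\n", highestNode);

    // Example: put this thread on node 1 (the second socket), if it exists.
    if (highestNode >= 1 && PinCurrentThreadToNode(1))
        printf("Thread pinned to node 1\n");

    return 0;
}
```

That's workable for our own threads, but it doesn't explain why the scheduler should need to do this on this box when it doesn't on our other NUMA machines.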
Thoughts on what's causing this?