Thread affinities are tied to 64-bit numbers. So processor virtual cores are lumped into groups of at most 64. Threads can only be assigned to a single group at a time, and by default all threads in a process are locked to one group.
Well, each thread being only able to be scheduled on some of 64 cores hasn't been a huge issue so far. Usually you want your threads to stay in the same NUMA region anyways because cross socket communication is expensive.
Annoying if a processor group spans two NUMA regions leaving just a few processors to other side...
Well, I wouldn't advocate using Windows in such a setting...
I/O layer overhead in Windows is considerable. As any Windows kernel driver developer knows, passing IRPs (I/O request packets) through a long stack does not come for free. Not just drivers for filesystems, networking stacks, etc. and devices, but there are usually also filter drivers. IRPs go through the stack and then bubble back up.
Starting threads and especially processes is also rather sluggish. As is opening files.
There's no completely unified I/O API in Windows. You can't consider SOCKETs (partially userland objects!) as I/O subsystem HANDLEs in all scenarios and anonymous pipes for process stdin/stdout redirection are always synchronous (no "overlapped I/O" or IOCP possible).
For compute Windows is fine, all this overhead doesn't matter much. But I don't understand why some insist using Windows as a server.
But when someone pays me for making Windows dance, so be it. :-) You can usually work around most issues with some creativity and a lot of working hours.
The arguments I’ve heard are IO completion ports are less “brain dead” then epoll or select/poll and visual studio is a great IDE. Otherwise I’m not sure either.
IOCP is great, just annoying you can't use it with process stdin/stdout/stderr HANDLEs, at least if you do things "by the book". Thick driver sandwiches in Windows... not so great.
Visual Studio a great IDE... well, the debugger isn't amazing unless you're dealing with a self-contained bug (often find myself using windbg instead). Basic operations (typing, moving caret, changing tab, find, etc.) are slow at least on my Xeon gold workstation.
Can you elaborate? I haven't noticed any particular "wonkiness" happening?