Does anyone have experience with Windows Processor Groups? I'm pondering getting a 64-core hyperthreaded (128 threads) machine to speed up a compute-bound OpenMP program, which will happily use all the threads it can get its little hands on. Are there ways to get more than 64 threads assigned to a process in Win 7 or 10? Are there any impediments to just running two instances of the code, one in each processor group? Memory contention issues?
I bought the 128-thread machine with Windows Workstation. As expected, it breaks the threads up into two 64-thread processor groups. If I try to run a second instance of a 50-thread code, Windows tries to jam it into the node it stuck the first instance in. Does anyone have experience getting Intel Fortran or OpenMP to span the two nodes? I'm reluctant to use affinity and run two instances; for a variety of reasons, I'd rather span the nodes, i.e., use 120 or so threads assigned to one instance of the program.
Open to ideas or experience with this problem.
Hi Bruce, I have access to no more than 16 threads (8 physical cores), but I've found that Windows by default (e.g., for the Intel Linpack benchmark that comes with the compiler) maps all threads to the lowest-numbered logical processors, so that, e.g., 8 threads run on only the first 4 physical cores. If you set KMP_AFFINITY to scatter (e.g., in a batch file: set KMP_AFFINITY=nowarnings,scatter,1,0,granularity=fine), the threads are distributed across the physical cores first. I'm no expert in these things; maybe Jim Dempsey can give better advice and explanation.
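To illustrate the difference, here is a simplified model in C (not the Intel OpenMP runtime's actual placement algorithm) for a hypothetical 8-core, 16-thread machine. It assumes logical processors 2c and 2c+1 are the two hyperthreads of core c; real enumeration order varies by machine, and the function names are made up for this sketch.

```c
/* Hypothetical topology: 8 physical cores, 2 hyperthreads each.
   Assumes logical processors 2*c and 2*c+1 belong to core c. */
#define N_CORES 8
#define SMT_PER_CORE 2

/* Default/compact-like placement: thread t goes to logical processor t,
   so consecutive threads share a core before moving to the next one. */
int place_compact(int t) { return t % (N_CORES * SMT_PER_CORE); }

/* Scatter-like placement: one thread per physical core first; the second
   hyperthread of a core is used only once every core has a thread. */
int place_scatter(int t) {
    int core = t % N_CORES;
    int slot = (t / N_CORES) % SMT_PER_CORE;
    return core * SMT_PER_CORE + slot;
}

/* Physical core owning a logical processor under the assumed numbering. */
int core_of(int logical) { return logical / SMT_PER_CORE; }

/* Count the distinct physical cores used by the first n threads. */
int distinct_cores(int n, int (*place)(int)) {
    int used[N_CORES] = {0};
    int count = 0;
    for (int t = 0; t < n; t++) {
        int c = core_of(place(t));
        if (!used[c]) {
            used[c] = 1;
            count++;
        }
    }
    return count;
}
```

With 8 threads, the compact-like placement touches only 4 physical cores, while the scatter-like placement uses all 8, which matches the behavior described above.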
I read somewhere that Windows 10 behaves differently across releases regarding processor groups. The most recent versions (e.g., 2004) might do a better job of scheduling. (GNU/Linux with recent kernels seems to work better with higher core counts.)
Anyway, the Linpack benchmark may be a good place to start experimenting with KMP_AFFINITY and any other switches you need (C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.2.254\windows\mkl\benchmarks\linpack).
Windows affinity uses a 64-bit bitmask. On systems with more than 64 hardware threads, it has a concept called processor groups, in which case each group has its own 64-bit bitmask (which may be partially occupied).
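A small C sketch of the bookkeeping this implies. group_affinity_t is a hypothetical portable stand-in for the Windows GROUP_AFFINITY structure (a mask plus a group number), and the group sizes are passed in rather than queried. Assuming a 128-thread machine presents two full groups of 64 is a simplification: Windows tries to keep each NUMA node within a single group, so real group sizes can differ.

```c
#include <stdint.h>

/* Portable stand-in for GROUP_AFFINITY (hypothetical, for illustration). */
typedef struct {
    uint64_t mask;   /* one bit per logical processor within the group */
    uint16_t group;  /* processor group number */
} group_affinity_t;

/* Mask with the low n bits set, e.g. all processors of a partly occupied
   group. 1ULL << 64 is undefined in C, so a full group is special-cased. */
uint64_t low_bits(unsigned n) {
    return n >= 64 ? UINT64_MAX : ((UINT64_C(1) << n) - 1);
}

/* Translate a system-wide logical processor index into a (group, bit)
   affinity, given the number of active processors in each group (each
   at most 64). Returns 0 on success, -1 if the index is out of range. */
int to_group_affinity(unsigned index, const unsigned *group_sizes,
                      unsigned n_groups, group_affinity_t *out) {
    for (unsigned g = 0; g < n_groups; g++) {
        if (index < group_sizes[g]) {
            out->group = (uint16_t)g;
            out->mask = UINT64_C(1) << index;
            return 0;
        }
        index -= group_sizes[g];
    }
    return -1;
}
```

For example, with two groups of 64, system-wide processor 100 becomes bit 36 of group 1; no single 64-bit mask can name all 128 processors.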
Linux provides a bitmask whose size depends on the system build and can be arbitrarily large.
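On Linux with glibc, for example, the CPU_ALLOC family of macros sizes the CPU set at run time instead of using the fixed 1024-CPU cpu_set_t. A minimal sketch; the helper name count_dynamic_set is hypothetical:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>

/* Build a dynamically sized CPU set covering CPUs [0, n_cpus) and return
   how many CPUs it contains, or -1 if allocation fails. */
int count_dynamic_set(int n_cpus) {
    cpu_set_t *set = CPU_ALLOC(n_cpus);
    if (set == NULL)
        return -1;
    size_t size = CPU_ALLOC_SIZE(n_cpus);
    CPU_ZERO_S(size, set);
    for (int cpu = 0; cpu < n_cpus; cpu++)
        CPU_SET_S(cpu, size, set);
    int count = CPU_COUNT_S(size, set);
    /* The same set could be passed to sched_setaffinity(0, size, set)
       to bind the calling thread; one mask spans every CPU, no groups. */
    CPU_FREE(set);
    return count;
}
```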
On Windows, as far as I know, the OpenMP implementations do not make use of processor groups. Note that I have not verified whether this is still so with the current release.
If you search msdn.microsoft.com for processor groups, you should be able to find the API and (hopefully) example code. With that information (on Windows) you could:
1) Launch the application oversubscribed (more threads than a single processor group holds) with the desired affinity.
2) Retrieve the 64-bit bitmasks for the process and for the thread (also retrieve the processor group number and the number of groups).
3) Count the bits set in the process affinity bitmask (this will be for the assigned processor group).
4) If your OpenMP logical processor number is .ge. the number of set bits, use the Processor Group API to select a different processor group than the current one (this may be up or down and is not necessarily contiguous).
5) Reset your thread affinity to the mask obtained earlier.
Note: you may have to repeat steps 3 to 5 should you require additional processor groups to map your application.
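The bookkeeping in steps 3 to 5 can be sketched portably in C. This is only one reading of the steps: the per-group masks would come from the Processor Group API on Windows, the helper names are hypothetical, and trying the groups in wrap-around order after the home group is an assumption. Applying the result (e.g., copying the group and mask into a zero-initialized GROUP_AFFINITY and calling SetThreadGroupAffinity(GetCurrentThread(), &ga, NULL) from inside each OpenMP thread) is left as a comment so the sketch compiles anywhere.

```c
#include <stdint.h>

/* Step 3: count the bits set in a 64-bit affinity mask. */
unsigned count_bits(uint64_t mask) {
    unsigned n = 0;
    while (mask) {
        mask &= mask - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Bit position of the n-th (0-based) set bit in mask, or -1 if absent.
   Handles partially occupied masks with holes. */
int nth_set_bit(uint64_t mask, unsigned n) {
    for (int bit = 0; bit < 64; bit++) {
        if (mask & (UINT64_C(1) << bit)) {
            if (n == 0)
                return bit;
            n--;
        }
    }
    return -1;
}

/* Steps 4-5: map an OpenMP thread number to a processor group and a
   single-bit mask within it. group_masks[g] is the affinity mask of group g.
   Threads beyond the home group's capacity spill into the other groups,
   repeating the check per group. Returns -1 if there are more threads than
   available logical processors.
   On Windows, the chosen group and mask would then be applied to the
   calling thread via SetThreadGroupAffinity. */
int map_thread(unsigned thread, const uint64_t *group_masks,
               unsigned n_groups, unsigned home_group,
               unsigned *group_out, uint64_t *mask_out) {
    for (unsigned i = 0; i < n_groups; i++) {
        unsigned g = (home_group + i) % n_groups;
        unsigned available = count_bits(group_masks[g]);
        if (thread < available) {
            *group_out = g;
            *mask_out = UINT64_C(1) << nth_set_bit(group_masks[g], thread);
            return 0;
        }
        thread -= available;
    }
    return -1;
}
```

With two full groups and the process launched in group 0, OpenMP thread 70 lands on bit 6 of group 1; launched in group 1, the same thread lands on bit 6 of group 0.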