I was wondering what the difference is between a system being NUMA and a system having NUMA enabled. Moreover, how can I tell programmatically (e.g., from a compiled program) whether the system is NUMA? Does being a NUMA system depend solely on the processor inside the system? That is, if I have processor X, can I tell from that alone that systems with such processors are NUMA systems?
Except for the Xeon Phi 200 series, a system with a single CPU will typically have only one memory system. The Xeon Phi 200 series can optionally be partitioned as 1, 2 or 4 compute (NUMA) nodes, with MCDRAM configurable either as cache or as additional memory nodes.
If your system has multiple Xeon CPUs of the Nehalem generation or later, it is likely capable of being configured as multiple NUMA nodes. This is a BIOS setting (you also need a NUMA-enabled O/S).
To check NUMA capability on Windows, inspect Task Manager: if NUMA is enabled, the available NUMA nodes are listed on the Performance tab.
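For a programmatic counterpart to the Task Manager check, here is a minimal sketch, assuming a Windows build with <windows.h>; the function name numa_node_upper_bound is hypothetical. It calls GetNumaHighestNodeNumber and adds 1. Windows does not guarantee contiguous node numbers, so treat the result as an upper bound rather than an exact count. On other platforms this sketch simply returns -1.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: upper bound on the number of NUMA nodes the OS
 * exposes, or -1 if it cannot be determined on this platform. */
int numa_node_upper_bound(void)
{
#ifdef _WIN32
    ULONG highest = 0;
    if (!GetNumaHighestNodeNumber(&highest))
        return -1;               /* the API call failed */
    return (int)highest + 1;     /* node numbers start at 0 */
#else
    return -1;                   /* not Windows: not handled by this sketch */
#endif
}
```

A result of 1 means Windows exposes a single node, for example on a one-socket machine or when the BIOS interleaves memory across sockets.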
Thank you for the answer. I still have several questions; please answer each if possible. I am using Windows 10, and in Task Manager -> Performance tab I see the number of sockets equal to 1. Are those the NUMA nodes/CPUs? I also see "Virtualization: Enabled"; does this mean NUMA is enabled on my system? How do I know if my OS is NUMA-enabled? My system has an Intel Core i7 processor. Can you please tell me whether only Xeon-based systems are NUMA systems? My understanding is that NUMA can be enabled if the system has several CPUs, regardless of which processors are inside the system. Is that correct?
If you have 1 socket, you have 1 CPU. This one CPU has several logical processors: the hardware threads that the CPU is configured to run (HyperThreading is enabled or disabled as a BIOS setting). As stated earlier, except for KNL (the Xeon Phi 200 series), the number of NUMA nodes is typically tied to the number of CPUs (sockets), as each socket may have separate memory slots adjacent to the CPU. Note that some motherboards with two or more sockets have a single memory subsystem, and thus 1 node.
NUMA nodes have nothing to do with virtualization. On a 2-socket NUMA system, each socket has its own memory subsystem. Each CPU (socket) can access its own memory subsystem as well as the other memory subsystem(s), but access to the local memory subsystem is faster than access to a remote one.
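Because local memory is faster, NUMA-aware code usually tries to keep data on the node of the thread that uses it. A common technique is first-touch initialisation, sketched below under two assumptions: the OS places each page on the node of the thread that first writes it (the default policy on Linux), and the same static OpenMP schedule and thread count are used both for initialisation and for the later loop. The function names are hypothetical; without OpenMP enabled the pragmas are ignored and the code runs serially.

```c
#include <stdlib.h>

/* Allocate n doubles and initialise them in parallel, so that (for large
 * allocations whose pages are not yet mapped) each page is first written,
 * and therefore placed, by the thread that will later use it. */
double *alloc_first_touch(long n)
{
    double *a = malloc((size_t)n * sizeof *a);
    long i;
    if (a == NULL)
        return NULL;
#pragma omp parallel for schedule(static)
    for (i = 0; i < n; i++)
        a[i] = 1.0;                  /* first touch decides page placement */
    return a;
}

/* Same static schedule: each thread reads the block it initialised. */
double sum_same_schedule(const double *a, long n)
{
    double s = 0.0;
    long i;
#pragma omp parallel for schedule(static) reduction(+:s)
    for (i = 0; i < n; i++)
        s += a[i];
    return s;
}
```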
Core i7 only supports 1 socket, thus only one memory node (which can have 1, 2, ... memory channels).
Xeon processors of this era have model numbers such as E5-1620, E5-2620, E5-4620 and E7-8870, where the digit after the dash (1, 2, 4, 8) is the maximum number of sockets that the CPU can be used in.
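To illustrate that naming rule, here is a small hypothetical helper that reads the socket digit out of such a model string. It is only meaningful for the E3/E5/E7-style numbering described here, not for later Xeon families that use a different scheme.

```c
#include <string.h>

/* Hypothetical helper: maximum socket count encoded in a model number
 * such as "E5-2620" (first digit after the dash). Returns -1 if the
 * string does not follow that pattern. */
int xeon_max_sockets(const char *model)
{
    const char *dash = strchr(model, '-');
    if (dash == NULL)
        return -1;
    switch (dash[1]) {
    case '1': return 1;
    case '2': return 2;
    case '4': return 4;
    case '8': return 8;
    default:  return -1;   /* digit (or end of string) is not a socket count */
    }
}
```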
Now, to have multiple NUMA nodes:
1) the motherboard must have multiple sockets
2) the motherboard must have multiple memory subsystems (populated)
3) multiple CPUs must be installed (not all sockets need to be filled)
4) only memory subsystems attached to an installed CPU are usable
5) the BIOS must configure the memory subsystems for NUMA (otherwise they are configured for interleaved use)
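On Linux, one way to see whether these conditions actually produced multiple nodes is the node list the kernel exposes in /sys/devices/system/node/online (a range list such as 0 or 0-1; numactl --hardware shows the same information). A minimal sketch, assuming that sysfs file is present; both function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Count entries in a range list such as "0", "0-1" or "0-1,3".
 * Returns -1 on a malformed string. */
int count_list_entries(const char *s)
{
    int count = 0;
    while (*s != '\0' && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10);
        long hi = lo;
        if (end == s)
            return -1;                  /* expected a number */
        if (*end == '-') {              /* range "lo-hi" */
            s = end + 1;
            hi = strtol(s, &end, 10);
            if (end == s || hi < lo)
                return -1;
        }
        count += (int)(hi - lo + 1);
        s = end;
        if (*s == ',')
            s++;                        /* next entry */
    }
    return count;
}

/* Number of online NUMA nodes, or -1 if the file is unavailable
 * (e.g. a kernel without NUMA support, or not Linux). */
int linux_online_numa_nodes(void)
{
    char buf[256];
    FILE *f = fopen("/sys/devices/system/node/online", "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return count_list_entries(buf);
}
```

A value of 1 on a two-socket machine suggests that one of the conditions above is not met, most often the BIOS interleaving setting.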
What I would like to know as well is:
1. The SYSTEM_INFO structure (filled in by GetSystemInfo) is used in C to get, at run time, the number of logical processors of the underlying machine (field dwNumberOfProcessors). Is there any way to get the number of sockets or the number of CPUs/NUMA nodes? How about the number of cores per NUMA node?
2. @Sergey Kostrov, is your answer above restricted to Intel Technologies or is it a general one?
3. @Jim Dempsey, regarding Intel Xeon Phi, what other processors besides it are part of MIC Architecture?
4. @Jim Dempsey, @Sergey Kostrov, I know MIC Architecture allows programming using OpenMP. Do all Intel Xeon processors support OpenMP or only Intel Xeon Phi?
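For question 1, a minimal sketch assuming the Windows API: GetLogicalProcessorInformation returns one record per core, package (socket), NUMA node and cache, so counting records by their Relationship field gives logical processors, cores, sockets and nodes. It only reports the calling thread's processor group (at most 64 logical processors); GetLogicalProcessorInformationEx is the variant for larger machines. Cores per NUMA node could be derived by intersecting each RelationNumaNode ProcessorMask with the core masks, which this sketch omits. The struct and function names are hypothetical, and on non-Windows systems it only fills in the logical processor count via sysconf.

```c
#include <stdlib.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* -1 in a field means "not determined by this sketch". */
struct cpu_topology {
    int logical_processors;
    int cores;
    int packages;      /* sockets */
    int numa_nodes;
};

struct cpu_topology query_topology(void)
{
    struct cpu_topology t = { -1, -1, -1, -1 };
#ifdef _WIN32
    DWORD len = 0, i, n;
    SYSTEM_LOGICAL_PROCESSOR_INFORMATION *info, *p;

    GetLogicalProcessorInformation(NULL, &len);   /* ask for required size */
    if (GetLastError() != ERROR_INSUFFICIENT_BUFFER)
        return t;
    info = malloc(len);
    if (info == NULL)
        return t;
    if (!GetLogicalProcessorInformation(info, &len)) {
        free(info);
        return t;
    }
    t.logical_processors = t.cores = t.packages = t.numa_nodes = 0;
    n = len / sizeof *info;
    for (i = 0, p = info; i < n; i++, p++) {
        switch (p->Relationship) {
        case RelationProcessorCore: {
            ULONG_PTR m = p->ProcessorMask;
            t.cores++;
            while (m) {                 /* one bit per logical processor */
                t.logical_processors += (int)(m & 1);
                m >>= 1;
            }
            break;
        }
        case RelationProcessorPackage: t.packages++;   break;
        case RelationNumaNode:         t.numa_nodes++; break;
        default:                                       break; /* caches etc. */
        }
    }
    free(info);
#else
    long n = sysconf(_SC_NPROCESSORS_ONLN);   /* online logical processors */
    if (n > 0)
        t.logical_processors = (int)n;
#endif
    return t;
}
```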
1. You can use the CPUID instruction:
The above is just about everything there is to know about Intel CPUs... but may be too daunting for many readers.
(This was found by Googling: cpuid sockets cores)
The original post may have been referring to the NUMA-enabling BIOS option frequently provided on early 2-socket NUMA platforms. These were shipped with the BIOS set to "NUMA disabled", meaning that consecutive cache lines were interleaved between each CPU's local memory and the remote memory. To take advantage of local memory when applying affinity settings, it was necessary to enable the NUMA setting in the BIOS.
Technically, NUMA may refer to a variety of features which don't appear to be under discussion here.
I just did some more tests on the OpenMP support in the most recent Microsoft Visual Studio. It is still limited to OpenMP 2.0 combined with, for C source code, the C89 standard, even though a number of C99 features have been introduced for use outside OpenMP constructs. For example, for(int i=0; ....) is accepted when not preceded by #pragma omp for, but not
#pragma omp for
for(int i=0; ....
which is rejected with the same error message as for( ; .....)
Reluctantly, I have changed much of my OpenMP code to use #if _OPENMP >= 23017, so as to avoid recent features with the Microsoft compiler and to conform to the Microsoft subset of OpenMP elsewhere. As code compiled with Microsoft OpenMP is supported by Intel libiomp5 (and possibly by some versions of the LLVM OpenMP runtime), it is possible to set affinities by replacing the Microsoft OpenMP library at link time.
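A minimal sketch of that approach, assuming C code that must build both with Microsoft's OpenMP 2.0/C89 subset and with a newer OpenMP: the loop index is a signed int declared before the pragma, and an OpenMP 4.0 construct is used only when _OPENMP is at least 201307 (the value for OpenMP 4.0; Microsoft's /openmp reports 200203). The threshold and the function name dot are illustrative choices, not the guard used in the post above.

```c
/* Dot product that stays within OpenMP 2.0 / C89 unless the compiler
 * reports OpenMP 4.0 or later. Without OpenMP the pragmas are ignored. */
double dot(const double *x, const double *y, int n)
{
    double s = 0.0;
    int i;                  /* C89: no declaration in the for-init */
#if defined(_OPENMP) && _OPENMP >= 201307
#pragma omp parallel for simd reduction(+:s)
#else
#pragma omp parallel for reduction(+:s)
#endif
    for (i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}
```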
Note that http://www.openmp.org/resources/openmp-compilers/ gives supported platforms for many present or past OpenMP implementations.
Thank you all for the answers.
I would like to ask two more questions:
A. I have the following processor:
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
I looked at the link you provided (https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf), and it seems this processor can only be v2 or v3. How can I find out which one it is? What do v2 and v3 mean: are they sub-architectures?
B. I am using the following OS:
Linux 4.4.0-66-generic #87-Ubuntu SMP Fri Mar 3 2017 x86_64 x86_64 x86_64 GNU/Linux
What is the ABI (Application Binary Interface) for the processor and the OS above?
I would still bet on Sandy Bridge, but you could test that by seeing whether Ivy Bridge-specific code (e.g., the RDRAND or F16C instructions) fails, or by checking for Sandy Bridge specifics such as very slow unaligned access. There are more differences in v3, such as support for FMA.
A missing version suffix (what would otherwise be v0 or v1) would be expected to signify Sandy Bridge. I guess they didn't get approval to plan ahead when the Sandy Bridge BIOS report was set up.
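To check which generation an E5 part is without relying on the brand string, one option is the family/model signature from CPUID leaf 1. A minimal sketch, assuming GCC or Clang on x86 for the query itself; the model numbers used (0x2D Sandy Bridge-E/EP, 0x3E Ivy Bridge-E/EP, 0x3F Haswell-E/EP, 0x4F Broadwell-EP) are the family 6 values Intel documents for those parts, and the function names are hypothetical. On Linux the same numbers appear as "cpu family" and "model" in /proc/cpuinfo (model 45, i.e. 0x2D, for Sandy Bridge-EP).

```c
#include <stdint.h>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#endif

/* Display family/model from a CPUID leaf 1 EAX signature. */
void decode_signature(uint32_t eax, unsigned *family, unsigned *model)
{
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model  = (eax >> 4) & 0xF;
    *family = base_family;
    *model  = base_model;
    if (base_family == 0xF)
        *family += (eax >> 20) & 0xFF;          /* extended family */
    if (base_family == 0x6 || base_family == 0xF)
        *model += ((eax >> 16) & 0xF) << 4;     /* extended model */
}

/* Map a few family 6 models to the corresponding E5 generation. */
const char *e5_generation(unsigned family, unsigned model)
{
    if (family != 6)
        return "unknown";
    switch (model) {
    case 0x2D: return "Sandy Bridge-E/EP (no suffix)";
    case 0x3E: return "Ivy Bridge-E/EP (v2)";
    case 0x3F: return "Haswell-E/EP (v3)";
    case 0x4F: return "Broadwell-EP (v4)";
    default:   return "unknown";
    }
}

/* EAX of CPUID leaf 1 on x86 with GCC/Clang; 0 elsewhere. */
uint32_t cpuid_signature(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned a, b, c, d;
    if (__get_cpuid(1, &a, &b, &c, &d))
        return a;
#endif
    return 0;
}
```

For example, calling decode_signature(cpuid_signature(), &f, &m) and then e5_generation(f, m) on the E5-2670 above would be expected to report model 0x2D if it is indeed Sandy Bridge.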