Intel® Fortran Compiler

KMP_AFFINITY: affinity only supported for Intel(R) processors.

mambru37
Beginner

Is there a specific reason KMP_AFFINITY is not supported on non-Intel systems, or is it like the -xP option, which can be made available just by bypassing the GenuineIntel check?

 

TimP
Honored Contributor III
In our experience, KMP_AFFINITY worked as well on non-Intel platforms. Obviously, there are pitfalls; as an example, if you set affinity by the numbers, it will work differently on platforms which have different BIOS numbering schemes. It's certainly desirable to have a recent enough OS that KMP_AFFINITY isn't needed for the most common situations.
If you are using a compiler so old that it doesn't support -xO, you may be going back far enough that KMP_AFFINITY isn't supported.
mambru37
Beginner
Glad to hear it works. What are the requirements, then? I have Linux 2.6.24 and ifort 10.1. My machine is properly configured (or so I think).

But all I get from running my program compiled with -O2 -xO -openmp with KMP_AFFINITY="verbose,granularity=fine,scatter" is:

OMP warning: KMP_AFFINITY: affinity only supported for Intel processors.
OMP warning: KMP_AFFINITY: affinity not supported, using 'none'
KMP_AFFINITY: Affinity not capable, using local cpuid instr info
KMP_AFFINITY: 16 available OS procs - Uniform topology of
KMP_AFFINITY: 16 packages x 1 cores/pkg x 1 threads/core (16 total cores)

And from numastat I can see numa_miss increasing on node 1 while numa_foreign increases by the same amount on node 0 (meaning node 1 is allocating memory at node 0).

I attach the output of numactl --hardware:

$ numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 1
node 0 size: 4095 MB
node 0 free: 2782 MB
node 1 cpus: 2 3
node 1 size: 4096 MB
node 1 free: 3211 MB
node 2 cpus: 4 5
node 2 size: 4096 MB
node 2 free: 3382 MB
node 3 cpus: 6 7
node 3 size: 4096 MB
node 3 free: 3324 MB
node 4 cpus: 8 9
node 4 size: 4096 MB
node 4 free: 3419 MB
node 5 cpus: 10 11
node 5 size: 4096 MB
node 5 free: 3458 MB
node 6 cpus: 12 13
node 6 size: 4096 MB
node 6 free: 3396 MB
node 7 cpus: 14 15
node 7 size: 4096 MB
node 7 free: 2923 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  20  20  20  20  20  20  20
  1:  20  10  20  20  20  20  20  20
  2:  20  20  10  20  20  20  20  20
  3:  20  20  20  10  20  20  20  20
  4:  20  20  20  20  10  20  20  20
  5:  20  20  20  20  20  10  20  20
  6:  20  20  20  20  20  20  10  20
  7:  20  20  20  20  20  20  20  10
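
For anyone who wants to set affinity "by the numbers" on a box like this, it helps to turn the listing above into a CPU-to-node table first. The following is only a rough Python sketch (Python chosen for illustration; the function names parse_node_cpus and cpu_to_node are made up here, and the sample text is an excerpt of the output above):

```python
import re

def parse_node_cpus(numactl_output):
    """Return {node: [cpu, ...]} built from the 'node N cpus: ...' lines."""
    node_cpus = {}
    for line in numactl_output.splitlines():
        match = re.match(r"\s*node\s+(\d+)\s+cpus:\s*(.*)$", line)
        if match:
            node = int(match.group(1))
            node_cpus[node] = [int(cpu) for cpu in match.group(2).split()]
    return node_cpus

def cpu_to_node(node_cpus):
    """Invert the mapping: {cpu: node}."""
    return {cpu: node for node, cpus in node_cpus.items() for cpu in cpus}

# Excerpt of the `numactl --hardware` output posted above.
sample = """\
available: 8 nodes (0-7)
node 0 cpus: 0 1
node 0 size: 4095 MB
node 1 cpus: 2 3
node 7 cpus: 14 15
"""

nodes = parse_node_cpus(sample)
print(nodes)                   # {0: [0, 1], 1: [2, 3], 7: [14, 15]}
print(cpu_to_node(nodes)[14])  # 7
```

On a real run the text would come from the command itself rather than a pasted string. The OS numbering it reports is the one that matters for explicit binding, since BIOS numbering schemes differ between platforms.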

TimP
Honored Contributor III
I've never seen a machine like that, and KMP_AFFINITY seems not to understand it. To me, the warning simply confirms that the topology is not diagnosed correctly, and bug reports would likely not be accepted. You might try using KMP_AFFINITY "by the numbers." For completeness, try the same thing with taskset, for example.
My colleagues and I haven't seen any advantage to using "fine." Scatter, if your machine were recognized as 8 dual-core CPUs (is that for real?), would assign each of 8 threads to a different CPU.
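
To make the scatter idea concrete, here is a rough Python sketch of how the placement would come out if the machine were seen as 8 packages of 2 cores, numbered as in the numactl listing (CPUs 2n and 2n+1 on node n). The helper names are hypothetical and this is not the runtime's actual algorithm, only an illustration of "spread across packages first" versus "fill a package first". The last helper builds an explicit proclist string of the kind KMP_AFFINITY accepts; whether the ifort 10.1 runtime honors it on this machine is an open question.

```python
def scatter_order(packages):
    """Round-robin across packages: first core of each, then the second, ..."""
    depth = max((len(cores) for cores in packages), default=0)
    order = []
    for i in range(depth):
        for cores in packages:
            if i < len(cores):
                order.append(cores[i])
    return order

def compact_order(packages):
    """Fill every core of one package before moving to the next."""
    return [cpu for cores in packages for cpu in cores]

def explicit_affinity(cpus):
    """Build an explicit 'by the numbers' KMP_AFFINITY value."""
    return "proclist=[" + ",".join(str(cpu) for cpu in cpus) + "],explicit"

# Assumed topology: 8 packages x 2 cores, CPUs 2n and 2n+1 on package n.
packages = [[2 * n, 2 * n + 1] for n in range(8)]

scatter8 = scatter_order(packages)[:8]  # placement for 8 threads
compact8 = compact_order(packages)[:8]

print(scatter8)                     # [0, 2, 4, 6, 8, 10, 12, 14]
print(compact8)                     # [0, 1, 2, 3, 4, 5, 6, 7]
print(explicit_affinity(scatter8))  # proclist=[0,2,4,6,8,10,12,14],explicit
```

With 8 threads, the scatter list puts one thread on each package, while the compact list uses only the first four.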
mambru37
Beginner
It's an 8-way Opteron machine.

Taskset is working fine: I can force any process to run on a given processor and force the migration of already running processes, but it doesn't allow thread-level granularity. I want each thread to be bound to a CPU (core). The aim is to keep threads from allocating memory non-locally, which is what is happening now.
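
As an illustration of what thread-level binding looks like outside the OpenMP runtime: on Linux, the sched_setaffinity call with pid 0 applies to the calling thread, so each thread can bind itself. Below is a minimal Python sketch of that idea (Python only for illustration; pin_current_thread and run_pinned are made-up names, and an OpenMP program would have to make the equivalent call from inside its parallel region, which is not shown here).

```python
import os
import threading

def pin_current_thread(cpu):
    """Bind the calling thread to one CPU.

    Returns the resulting affinity mask, or None if the platform
    does not offer sched_setaffinity or the call is refused.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    try:
        os.sched_setaffinity(0, {cpu})  # pid 0 means the calling thread on Linux
        return os.sched_getaffinity(0)
    except OSError:
        return None

def run_pinned(cpus):
    """Start one thread per entry in cpus; each thread pins itself."""
    results = {}

    def worker(index, cpu):
        results[index] = pin_current_thread(cpu)

    threads = [threading.Thread(target=worker, args=(i, cpu))
               for i, cpu in enumerate(cpus)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results

if hasattr(os, "sched_getaffinity"):
    allowed = sorted(os.sched_getaffinity(0))
else:
    allowed = []

# Pin two threads to the first two CPUs this process may use.
print(run_pinned(allowed[:2]))
```

Each worker changes only its own mask; the main thread keeps the mask it started with.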

From the output of the program, I guess that when it says:

OMP warning: KMP_AFFINITY: affinity not supported, using 'none'
That means that no matter what value I give to KMP_AFFINITY, it will be set to none and ignored.

Maybe it's a misconfiguration of the machine after all, but it annoys me that a feature is only available for Intel processors, and that the compiler libraries are actually checking the brand of my CPU and forbidding me from using a feature.
TimP
Honored Contributor III
Even if KMP_AFFINITY mis-diagnoses your machine as 16 single core CPUs, you should be able to control locality either by KMP_AFFINITY or taskset. In either case, locality of memory depends, as you indicate, on threads remaining scheduled consistent with the way the memory was first allocated. KMP_AFFINITY couldn't do this any better than taskset, except for the possible convenience factor.