Hi All,
If the number of threads to use is 32, or any number less than or equal to 64, how do the balanced and scatter thread affinities differ from each other?
As per my analysis, these two will use the same number of cores (1 thread per core). Or will balanced use 2 threads per core to keep sequential threads together, leading to only 16 cores?
Thanks.
If I recall correctly, Balanced and Scatter should be the same if you are only using one thread per core.
If you are using more than one thread per core, Balanced and Scatter will have different layouts. For example, on a 64-core part using 128 threads, Balanced will put threads 0&1 on core 0, 2&3 on core 1, etc., while Scatter will put threads 0&64 on core 0, 1&65 on core 1, etc.
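The two layouts described above can be sketched with a small Python model. This is only an illustration of the description in this post, not the runtime's actual placement algorithm: the function names are made up for the example, and the model assumes the thread count is either no larger than the core count or an exact multiple of it.

```python
def balanced_core(thread_id: int, num_threads: int, num_cores: int) -> int:
    """Simplified 'balanced' model: consecutive thread ids share a core."""
    threads_per_core = max(1, num_threads // num_cores)
    return thread_id // threads_per_core


def scatter_core(thread_id: int, num_threads: int, num_cores: int) -> int:
    """Simplified 'scatter' model: thread ids are dealt round-robin over cores."""
    return thread_id % num_cores


def layout(placement, num_threads: int, num_cores: int) -> dict:
    """Return {core: [thread ids]} for the given placement function."""
    cores = {}
    for thread_id in range(num_threads):
        core = placement(thread_id, num_threads, num_cores)
        cores.setdefault(core, []).append(thread_id)
    return cores


if __name__ == "__main__":
    for name, placement in (("balanced", balanced_core), ("scatter", scatter_core)):
        cores = layout(placement, 128, 64)
        print(name, "core 0:", cores[0], "core 1:", cores[1])
```

In this model, 32 threads on 64 cores give the same one-thread-per-core layout for both schemes, which matches the first point above.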
Neither of these schemes is easy to understand if the number of threads is not evenly divisible by the number of cores. In such cases I find it much easier to use KMP_HW_SUBSET to force the allocation to be inside the requested subset of cores/threads. On our 68-core Xeon Phi 7250 if I wanted to use 64 cores with different numbers of threads, I would use:
- 1 thread/core: KMP_HW_SUBSET=64c,1t OMP_NUM_THREADS=64 KMP_AFFINITY=compact
- 2 threads/core: KMP_HW_SUBSET=64c,2t OMP_NUM_THREADS=128 KMP_AFFINITY=compact
- 4 threads/core: KMP_HW_SUBSET=64c,3t OMP_NUM_THREADS=256 KMP_AFFINITY=compact
These three schemes emulate what "Balanced" would do if it were run on a 64-core system.
You should always add the "verbose" clause to KMP_AFFINITY to verify that the system did what you wanted....
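As a rough sketch of how such settings could be generated rather than typed by hand, the hypothetical helper below builds the three variables from a core count and a threads-per-core count, and prepends the "verbose" modifier to KMP_AFFINITY. The helper name and the commented-out program path are assumptions made for illustration only.

```python
import os
import subprocess


def affinity_env(cores: int, threads_per_core: int,
                 affinity: str = "compact", verbose: bool = True) -> dict:
    """Build the environment settings for a cores x threads-per-core subset.

    OMP_NUM_THREADS is derived from the same two numbers, so it always
    matches the number of logical CPUs requested in KMP_HW_SUBSET.
    """
    return {
        "KMP_HW_SUBSET": f"{cores}c,{threads_per_core}t",
        "OMP_NUM_THREADS": str(cores * threads_per_core),
        "KMP_AFFINITY": f"verbose,{affinity}" if verbose else affinity,
    }


if __name__ == "__main__":
    settings = affinity_env(64, 2)
    print(settings)
    # Hypothetical usage: launch an OpenMP program with these settings.
    # subprocess.run(["./my_openmp_app"], env={**os.environ, **settings})
```

With "verbose" set, the runtime reports the binding it chose, so the output can be compared against what was requested.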
As ever, John is on the money. It is much easier to use KMP_HW_SUBSET to limit the available resources and then play with "compact" or "scatter" affinity than to try to achieve good balance with KMP_AFFINITY=balanced.
One thing which John is doing, which I would not (and which has introduced a bug in his text above :-)), is that he is using OMP_NUM_THREADS as well as KMP_HW_SUBSET. I find it better not to use OMP_NUM_THREADS, since that gives you the opportunity to have a mismatch between the number of HW threads allocated and the number of software threads created. If you leave out OMP_NUM_THREADS and just use KMP_HW_SUBSET, the library's default behaviour of running one thread on each available logical CPU will kick in, and you can't make a mistake like the one in John's third line:
- 4 threads/core: KMP_HW_SUBSET=64c,3t OMP_NUM_THREADS=256 KMP_AFFINITY=compact
where the 3t was intended to be 4t, and he's running 256 threads on 192 logical CPUs...
So I'd just use
- 1 thread/core: KMP_HW_SUBSET=64c,1t KMP_AFFINITY=compact
- 2 threads/core: KMP_HW_SUBSET=64c,2t KMP_AFFINITY=compact
- 4 threads/core: KMP_HW_SUBSET=64c,4t KMP_AFFINITY=compact
and then also try KMP_AFFINITY=scatter.
This then makes it easier to experiment with scaling, simply by changing the number of cores you ask for (as described in "How to Plot OpenMP Scaling Results").
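The mismatch described above is simple arithmetic, and a minimal Python sketch can check it. The function names here are hypothetical, and the parser only understands the plain "Nc,Mt" form used in this thread, not the full KMP_HW_SUBSET syntax.

```python
import re


def logical_cpus(hw_subset: str) -> int:
    """Logical CPUs selected by a KMP_HW_SUBSET value of the form 'Nc,Mt'."""
    match = re.fullmatch(r"(\d+)c,(\d+)t", hw_subset.strip())
    if match is None:
        raise ValueError(f"unsupported KMP_HW_SUBSET form: {hw_subset!r}")
    cores, threads_per_core = map(int, match.groups())
    return cores * threads_per_core


def excess_threads(hw_subset: str, omp_num_threads: int) -> int:
    """Software threads requested beyond the logical CPUs in the subset."""
    return omp_num_threads - logical_cpus(hw_subset)


if __name__ == "__main__":
    # The line with the typo: 256 software threads on 64 * 3 = 192 logical CPUs.
    print(logical_cpus("64c,3t"), excess_threads("64c,3t", 256))
    # The intended setting: 256 software threads on 64 * 4 = 256 logical CPUs.
    print(logical_cpus("64c,4t"), excess_threads("64c,4t", 256))
```

A non-zero result means OMP_NUM_THREADS and KMP_HW_SUBSET disagree, which is the situation that leaving OMP_NUM_THREADS unset avoids.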
Hurray for sharp eyes! I knew I was going to make a mistake with those numbers, and I was right!
I think one of the OMP placement directives typically gives the effect of "balanced", but the standard allows named distributions to have implementation-defined behavior, so I don't use them. Numbers work -- at least when you get them right. ;-)
Hi John,
I am getting a bit confused by the environment variables. As I understand it, to run 128 threads as Scatter, Compact and Balanced, I need the following environment variables:
1) Scatter: export OMP_NUM_THREADS=128; export KMP_AFFINITY=scatter,granularity=fine
2) Compact: export OMP_NUM_THREADS=128; export KMP_AFFINITY=compact,granularity=fine
3) Balanced: export OMP_NUM_THREADS=128; export KMP_AFFINITY=balanced,granularity=fine
After reading the last paragraph of the documentation at https://software.intel.com/en-us/node/522518, I am not sure whether (3) is correct. As I understand it, this documentation refers to Intel Xeon Phi KNC, not Intel Xeon Phi KNL:
"To set the balanced affinity type for only the Intel® MIC Architecture environment, assign a specific prefix using the MIC_ENV_PREFIX=prefix and then set prefix_KMP_AFFINITY with balanced."
Thanks.
I am explicitly not going to tell you how to use KMP_AFFINITY=balanced, because there is no reason to use it; as you are discovering it is hard to use and confusing.
All of the interesting options are covered by the use of KNP_HW_SUBSET and KMP_AFFINITY={scatter,compact} in a way which is comprehensible and easier to get right.
Aha! I am not the only one who can make mistakes!
s/KNP_HW_SUBSET/KMP_HW_SUBSET/
Hi James, John,
It is finally becoming clear to me. I got confused because I wanted to use KMP_AFFINITY=balanced, thinking that a Balanced layout could not be achieved without it.
1) Scatter 128 threads: 2 threads/core: KMP_HW_SUBSET=64c,2t KMP_AFFINITY=scatter
2) Balanced 128 threads: 2 threads/core: KMP_HW_SUBSET=64c,2t KMP_AFFINITY=compact
Why is there even a KMP_AFFINITY=balanced option? It can be really confusing for a new user. I expected it to do what (2) does above, but that doesn't seem to be the case.
Thank you for clarifying this.
