Implementation of OpenMP Scatter, Compact and Balanced Mode

Hi All,

If I am correct, OpenMP thread distribution modes such as scatter, compact and balanced are implemented specifically for Xeon Phi and aren't supported in general by the OpenMP library.

Is there any documentation I can refer to in order to understand how this is implemented? In other words, which part of the code, and in which library or software, is called when we set environment variables like KMP_AFFINITY=<compact/scatter/balanced>?

Thanks. 

Chetan Arvind Patil
22 Replies

jimdempseyatthecove (Blackbelt) wrote:

John>>It is very easy to get a large performance degradation if you attempt to use all the logical processors in the system for a single parallel user job.

This depends on the application. For the vast majority of applications I would concur that fewer than 4 threads per core would be optimal. I suggest that you test with 4 threads per core as well, as you may have one of those minority applications that benefit from using all 4 threads per core. As it so happens, I am working on a simulation program that is a mix of OpenMP and MPI and that does perform better using all 4 threads per core. This application manipulates many instances of smaller collections of small arrays (~4x4, 6x6; on KNL I convert 6x6 to 6x8) that are used in matrix multiply. This application also uses OpenMP tasks as opposed to OpenMP DO (for) loops. So maybe OpenMP task-based programs with relatively short loops over half-way vectorizable data might benefit from all HTs.
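As an illustration of the 6x6 to 6x8 conversion mentioned above, here is a minimal sketch in Python. It assumes the padding is done so that each row fills one 512-bit vector of 8 doubles on KNL; the post does not state the element type or the reason, so that is an assumption, and the names `pad_rows` and `matmul_padded` are hypothetical.

```python
VECTOR_WIDTH = 8  # assumption: 8 doubles fit one 512-bit vector register


def pad_rows(matrix, width=VECTOR_WIDTH):
    """Return a copy of `matrix` with every row zero-padded to `width` entries."""
    return [list(row) + [0.0] * (width - len(row)) for row in matrix]


def matmul_padded(a, b_padded):
    """Multiply a (n x k) by b_padded (k x width), row by row.

    The innermost loop runs over one full padded row, which is the part
    a compiler could map onto a single vector operation.
    """
    n = len(a)
    width = len(b_padded[0])
    c = [[0.0] * width for _ in range(n)]
    for i in range(n):
        row_c = c[i]
        for k in range(len(b_padded)):
            aik = a[i][k]
            row_b = b_padded[k]
            for j in range(width):
                row_c[j] += aik * row_b[j]
    return c
```

The two padding columns stay zero throughout, so the leading 6x6 block of the result equals the unpadded product.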

A typical simulation run on 1 KNL (7210) takes 26 hours, 39 minutes to run 8 years of simulation time. This application scaled well on up to 8 KNL nodes on the Colfax Cluster. I couldn't get enough nodes on their cluster for the 16- and 32-node tests, so I do not know how well it scales beyond 8 nodes. The initial production system may have 12 or 16 KNL nodes.

Jim Dempsey



"It is very easy to get a large performance degradation if you attempt to use all the logical processors in the system for a single parallel user job." Thank you, this is very helpful for me.

Chetan Arvind Patil wrote:

Hi John,

I think my question is not regarding this.

I give compact mode enough resources to run 8 threads. In the first case I give it 64 cores on which to spawn 8 threads (I am not turning any cores off via sysfs before running the benchmark in compact mode), so compact will fill cores 0 and 1 with all 8 threads.

In the second case I give compact mode 2 cores (4 threads per core = 8 threads; I turn 62 cores off via sysfs so that only 2 cores are online before running the benchmark in compact mode), so it should fill both of these cores with the threads.

The difference is the number of cores that are online (via sysfs). So, does compact mode calculate how to map threads based on how many cores are online via sysfs? Or are OS tasks other than the benchmark affecting this, because fewer physical cores are available to schedule them?
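To check what the kernel reports as online in each case, a small sketch like the following can help. It assumes a Linux system that exposes `/sys/devices/system/cpu/online` as a comma-separated list of CPU ranges (for example `0-7` or `0,2-3`); the helper names `parse_cpu_list` and `online_cpus` are hypothetical.

```python
import os

ONLINE_PATH = "/sys/devices/system/cpu/online"  # Linux sysfs file


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8' into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def online_cpus(path=ONLINE_PATH):
    """Return the logical CPU numbers the kernel currently reports as online."""
    with open(path) as f:
        return parse_cpu_list(f.read())


if __name__ == "__main__" and os.path.exists(ONLINE_PATH):
    cpus = online_cpus()
    print(f"{len(cpus)} logical CPUs online: {cpus}")
```

Running this before each benchmark shows how many logical CPUs are actually online at that point, which is the set any affinity setting has to work within.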

I don't see performance degradation when I do similar analysis for scatter mode.
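To make the two cases concrete, here is a minimal sketch of a simplified placement model. It is not the Intel OpenMP runtime's actual algorithm, only an illustration of the idea that compact fills one core before moving to the next while scatter spreads threads across cores first. The function names are hypothetical, and the default of 4 hardware threads per core is an assumption matching the setup described above.

```python
def compact_placement(n_threads, n_cores, threads_per_core=4):
    """Simplified model: fill every hardware thread of a core before the next core."""
    total = n_cores * threads_per_core
    placement = []
    for i in range(n_threads):
        slot = i % total  # wrap around if oversubscribed
        placement.append((slot // threads_per_core, slot % threads_per_core))
    return placement


def scatter_placement(n_threads, n_cores, threads_per_core=4):
    """Simplified model: give each core one thread before any core gets a second."""
    total = n_cores * threads_per_core
    placement = []
    for i in range(n_threads):
        slot = i % total  # wrap around if oversubscribed
        placement.append((slot % n_cores, slot // n_cores))
    return placement


if __name__ == "__main__":
    for cores in (64, 2):
        print(f"{cores} cores online")
        print("  compact:", compact_placement(8, cores))
        print("  scatter:", scatter_placement(8, cores))
```

Each tuple is (core, hardware thread within that core). Under this simplified model, compact places the 8 threads on cores 0 and 1 in both cases, so the placement itself would be identical. Whether the real runtime behaves the same way can be checked by adding the `verbose` modifier, e.g. `KMP_AFFINITY=verbose,compact`, which prints the binding it chose.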

Thanks.
