Dear all,
I am trying to optimise the MPI/OpenMP configuration for my application and have encountered a strange problem when trying to use the KMP_PLACE_THREADS variable.
I discovered some error messages and was able to reproduce the problem with the following minimal example.
If I run the following command with more than one MPI process (it completes without error with -n 1), I get the OpenMP warning shown below.
mpirun -n 2 -env KMP_PLACE_THREADS=15c,4t ./scaling
OMP: Warning #236: KMP_PLACE_THREADS ignored: unsupported architecture.
OMP: Warning #236: KMP_PLACE_THREADS ignored: unsupported architecture.
I am using MPSS 3.2.1 and ifort compiler version 14.0.2.
Any help would be greatly appreciated.
Best regards,
Alastair
To clarify something which I forgot to mention, I am running mpirun directly from my MIC with a native application.
Normally, it's possible to use KMP_PLACE_THREADS for a MIC native MPI run, but you must set a different offset for each rank, so that you don't pin every rank to the same group of cores. This is more applicable to the case where you are using MIC_KMP_PLACE_THREADS for multiple host ranks, each offloading to a different group of MIC cores. That case was discussed in the Jeffers and Reinders book (before the simpler KMP_PLACE_THREADS option was available).
For the case you quote (apparently using Intel MPI), it seems more appropriate to omit KMP_PLACE_THREADS and set OMP_NUM_THREADS=60, as you have left in place the default I_MPI_PIN_DOMAIN=auto which itself would choose a group of cores, in conflict with KMP_PLACE_THREADS. If you are serious about packing 60 threads into the minimum number of cores, adding OMP_PROC_BIND=close (or KMP_AFFINITY=compact) should accomplish that.
You should study that PIN_DOMAIN and note that I_MPI_PIN_DOMAIN=off is an option to allow another method to take over.
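As a rough illustration of the first suggestion above (keep the default MPI pinning and drop KMP_PLACE_THREADS), the command could be assembled like this. This is only a sketch: `default_pinning_cmd` is a hypothetical helper name, the `-env VAR=value` form simply mirrors the syntax used earlier in this thread, and the script prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: rely on Intel MPI's default pinning (I_MPI_PIN_DOMAIN=auto) to give
# each rank its own group of cores, and only tell OpenMP how many threads to
# run and how to pack them. KMP_PLACE_THREADS is deliberately not set.
# default_pinning_cmd is a hypothetical name; the command is printed, not run.
default_pinning_cmd() {
    ranks=$1; threads=$2; affinity=$3; exe=$4
    printf 'mpirun -n %s -env OMP_NUM_THREADS=%s -env KMP_AFFINITY=%s %s\n' \
        "$ranks" "$threads" "$affinity" "$exe"
}

# Two ranks, 60 threads each, packed onto as few cores as possible.
default_pinning_cmd 2 60 compact ./scaling
```

Passing `balanced` instead of `compact` as the third argument would give the variant suggested for runs that use fewer than 4 threads per core.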
Tim Prince wrote:
If you are serious about packing 60 threads into the minimum number of cores, adding OMP_PROC_BIND=close (or KMP_AFFINITY=compact) should accomplish that.
You should study that PIN_DOMAIN and note that I_MPI_PIN_DOMAIN=off is an option to allow another method to take over.
Hi Tim,
Thanks for your response. The actual example here was just a toy example that triggered the error message. I was trying to do something more like this, using an offset value for each rank.
mpirun -n 1 -env KMP_PLACE_THREADS=15c,4t,0o ./scaling : -n 1 -env KMP_PLACE_THREADS=15c,4t,15o ./scaling
I actually want to auto-generate and test many combinations of MPI rank counts, KMP_AFFINITY and KMP_PLACE_THREADS settings.
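Command lines of this shape are tedious to write by hand, so a small helper can build them. The following is a minimal sketch, assuming every rank gets the same number of cores and each rank's offset is its index times the cores per rank. `build_offset_cmd` is a hypothetical name, and the function only prints the command rather than launching anything.

```shell
#!/bin/sh
# Hypothetical helper: print an MPMD mpirun command in which each rank gets
# its own KMP_PLACE_THREADS core offset, so that ranks do not overlap.
# Usage: build_offset_cmd <ranks> <cores_per_rank> <threads_per_core> <exe>
build_offset_cmd() {
    ranks=$1; cores=$2; threads=$3; exe=$4
    cmd="mpirun"
    rank=0
    while [ "$rank" -lt "$ranks" ]; do
        # Assumption: rank i starts at core i * cores_per_rank.
        offset=$((rank * cores))
        # Per-rank sections are separated by ':' as in the command above.
        if [ "$rank" -gt 0 ]; then cmd="$cmd :"; fi
        cmd="$cmd -n 1 -env KMP_PLACE_THREADS=${cores}c,${threads}t,${offset}o $exe"
        rank=$((rank + 1))
    done
    printf '%s\n' "$cmd"
}

# Reproduces the two-rank command quoted above.
build_offset_cmd 2 15 4 ./scaling
```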
The reason for my original question is the strange error message about "unsupported architecture".
Does that make more sense?
Best regards and thanks,
Alastair
That mpirun looks possible, if you have shut off the default pinning of your MPI.
All the relevant combinations where all ranks have the same OMP_NUM_THREADS and number of cores (spreading ranks across all available cores) should be possible using the default pinning and KMP_AFFINITY=balanced (to take care of cases where fewer than 4 threads per core are used).
Tim Prince wrote:
That mpirun looks possible, if you have shut off the default pinning of your MPI.
All the relevant combinations where all ranks have the same OMP_NUM_THREADS and number of cores (spreading ranks across all available cores) should be possible using the default pinning and KMP_AFFINITY=balanced (to take care of cases where fewer than 4 threads per core are used).
Hi Tim,
It appears that setting I_MPI_PIN=off allows this MPI configuration to run successfully. Thanks for your help.
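For a sweep over several configurations with MPI pinning disabled, a script along the following lines could list the candidate runs. It is only a sketch under stated assumptions: the total core count (60 here) and the chosen rank counts are illustrative rather than taken from this thread, `sweep_cmds` is a hypothetical name, I_MPI_PIN=off is assumed to be picked up from the launching environment, and nothing is executed.

```shell
#!/bin/sh
# Sketch: list candidate runs for a sweep, with MPI pinning switched off so
# that KMP_PLACE_THREADS alone decides where each rank's threads go.
# total_cores is an assumption about the coprocessor; adjust it for your card.
# Commands are printed, not run.
sweep_cmds() {
    total_cores=$1; exe=$2
    for ranks in 1 2 4; do
        cores=$((total_cores / ranks))      # cores given to each rank
        for threads in 1 2 3 4; do          # hardware threads used per core
            cmd="I_MPI_PIN=off mpirun"
            rank=0
            while [ "$rank" -lt "$ranks" ]; do
                if [ "$rank" -gt 0 ]; then cmd="$cmd :"; fi
                # Each rank starts at its own core offset: rank * cores.
                cmd="$cmd -n 1 -env KMP_PLACE_THREADS=${cores}c,${threads}t,$((rank * cores))o $exe"
                rank=$((rank + 1))
            done
            printf '%s\n' "$cmd"
        done
    done
}

sweep_cmds 60 ./scaling
```

Each printed line can be reviewed and then run by hand or piped to a shell; a KMP_AFFINITY setting could be added to the loop in the same way.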
I think the original error message was misleading, which hindered my efforts to figure it out.
Best regards and thanks,
Alastair
Thanks for the report. You're right that the message is misleading. I've submitted a bug report against the OpenMP runtime.
