Software Archive

KMP_PLACE_THREADS ignored: unsupported architecture?

Alastair_M_
New Contributor I

Dear all,

I am trying to optimise the MPI/OpenMP configuration for my application and have encountered a strange problem when trying to use the KMP_PLACE_THREADS variable.

I discovered some error messages and was able to reproduce the problem with the following minimal example.

If I run the following command with more than one MPI process (it completes without error with -n 1), I get the OpenMP warning shown below.

mpirun -n 2 -env KMP_PLACE_THREADS=15c,4t  ./scaling

OMP: Warning #236: KMP_PLACE_THREADS ignored: unsupported architecture.
OMP: Warning #236: KMP_PLACE_THREADS ignored: unsupported architecture.

I am using MPSS 3.2.1 and ifort compiler version 14.0.2.

Any help would be greatly appreciated. 

Best regards,

Alastair

6 Replies
Alastair_M_
New Contributor I

To clarify something I forgot to mention: I am running mpirun directly on the MIC with a native application.

 

TimP
Honored Contributor III

Normally, it's possible to use KMP_PLACE_THREADS for a MIC native MPI run, but you must set a different offset for each rank so that you don't pin every rank to the same group of cores. This is more applicable to the case where you are using MIC_KMP_PLACE_THREADS for multiple host ranks, each offloading to a different group of MIC cores. That case was discussed in the Jeffers and Reinders book (before the simpler KMP_PLACE_THREADS option was available).

For the case you quote (apparently using Intel MPI), it seems more appropriate to omit KMP_PLACE_THREADS and set OMP_NUM_THREADS=60, since you have left in place the default I_MPI_PIN_DOMAIN=auto, which itself chooses a group of cores for each rank and so conflicts with KMP_PLACE_THREADS. If you are serious about packing 60 threads into the minimum number of cores, adding OMP_PROC_BIND=close (or KMP_AFFINITY=compact) should accomplish that.

You should study the I_MPI_PIN_DOMAIN documentation and note that I_MPI_PIN_DOMAIN=off is an option that allows another pinning method to take over.
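
For example, for your two-rank case, something along these lines (an untested sketch; OMP_NUM_THREADS=60 assumes you still want 60 threads per rank, as with your 15c,4t setting):

mpirun -n 2 -env OMP_NUM_THREADS=60 -env KMP_AFFINITY=compact ./scaling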

Alastair_M_
New Contributor I

Tim Prince wrote:

 If you are serious about packing 60 threads into the minimum number of cores, adding OMP_PROC_BIND=close (or KMP_AFFINITY=compact) should accomplish that.

You should study that PIN_DOMAIN and note that I_MPI_PIN_DOMAIN=off is an option to allow another method to take over.

Hi Tim,

Thanks for your response.  The actual example here was just a toy example that triggered the error message.  I was trying to do something more like this, using an offset value for each rank.  

mpirun -n 1 -env KMP_PLACE_THREADS=15c,4t,0o  ./scaling : -n 1 -env KMP_PLACE_THREADS=15c,4t,15o ./scaling 

I actually want to auto-generate and test a lot of combinations of MPI rank counts, KMP_AFFINITY settings and KMP_PLACE_THREADS values.
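
Concretely, the sweep I have in mind is roughly like the following (just a sketch; the thread-per-core and affinity values shown are only examples of what I would iterate over):

for tpc in 1 2 3 4; do                        # threads per core
  for aff in compact balanced scatter; do     # KMP_AFFINITY settings to try
    mpirun -n 1 -env KMP_AFFINITY=$aff -env KMP_PLACE_THREADS=15c,${tpc}t,0o  ./scaling : \
           -n 1 -env KMP_AFFINITY=$aff -env KMP_PLACE_THREADS=15c,${tpc}t,15o ./scaling
  done
done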

The reason for my original question was the strange warning about "unsupported architecture".

Does that make more sense?

Best regards and thanks,

Alastair

 

TimP
Honored Contributor III

That mpirun looks possible, if you have shut off the default pinning of your MPI.

All the relevant combinations where all ranks have the same OMP_NUM_THREADS and the same number of cores (spreading the ranks across all available cores) should be possible using the default pinning and KMP_AFFINITY=balanced (to take care of cases where fewer than 4 threads per core are used).
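
For example (again an untested sketch, and the thread count here is only illustrative of giving every rank the same OMP_NUM_THREADS):

mpirun -n 2 -env OMP_NUM_THREADS=30 -env KMP_AFFINITY=balanced ./scaling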

Alastair_M_
New Contributor I

Tim Prince wrote:

That mpirun looks possible, if you have shut off the default pinning of your MPI.

All the relevant combinations where all ranks have the same OMP_NUM_THREADS and the same number of cores (spreading the ranks across all available cores) should be possible using the default pinning and KMP_AFFINITY=balanced (to take care of cases where fewer than 4 threads per core are used).

Hi Tim,

It appears that setting I_MPI_PIN=off allows this MPI configuration to run successfully; thanks for your help.
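
For reference, the working setup was along these lines (a sketch: the environment variable exported before the mpirun command from my earlier post):

export I_MPI_PIN=off
mpirun -n 1 -env KMP_PLACE_THREADS=15c,4t,0o  ./scaling : -n 1 -env KMP_PLACE_THREADS=15c,4t,15o ./scaling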

I think the original warning message was misleading, which hindered my efforts to figure this out.

Best regards and thanks,

Alastair

James_C_Intel2
Employee

Thanks for the report. You're right that that message is misleading. I've submitted a bug report against the OpenMP runtime.
