Solved: KMP_PLACE_THREADS ignored: unsupported architecture?

Alastair_M_ · ‎07-22-2014

Dear all,

I am trying to optimise the MPI/OpenMP configuration for my application and have encountered a strange problem when trying to use the KMP_PLACE_THREADS variable.

I discovered some error messages and was able to reproduce the problem with the following minimal example.

If I run the following command with any more than one MPI process (it actually completes without error with -n 1), I get the OpenMP error shown below.

mpirun -n 2 -env KMP_PLACE_THREADS=15c,4t  ./scaling

OMP: Warning #236: KMP_PLACE_THREADS ignored: unsupported architecture.
OMP: Warning #236: KMP_PLACE_THREADS ignored: unsupported architecture.

I am using MPSS 3.2.1 and ifort compiler version 14.0.2.

Any help would be greatly appreciated.

Best regards,

Alastair

Alastair_M_ · ‎07-22-2014

To clarify something which I forgot to mention, I am running mpirun directly from my MIC with a native application.

TimP · ‎07-22-2014

Normally, it's possible to use KMP_PLACE_THREADS for a MIC native MPI run, but you must set a different offset for each rank, so that you don't pin every rank to the same group of cores. This is more applicable to the case where you are using MIC_KMP_PLACE_THREADS for multiple host ranks, each offloading to a different group of MIC cores. That case was discussed in the Jeffers and Reinders book (before the simpler KMP_PLACE_THREADS option was available).

For the case you quote (apparently using Intel MPI), it seems more appropriate to omit KMP_PLACE_THREADS and set OMP_NUM_THREADS=60. You have left in place the default I_MPI_PIN_DOMAIN=auto, which itself chooses a group of cores for each rank and so conflicts with KMP_PLACE_THREADS. If you are serious about packing 60 threads into the minimum number of cores, adding OMP_PROC_BIND=close (or KMP_AFFINITY=compact) should accomplish that.

You should study that PIN_DOMAIN and note that I_MPI_PIN_DOMAIN=off is an option to allow another method to take over.

Alastair_M_ · ‎07-22-2014

Tim Prince wrote:

If you are serious about packing 60 threads into the minimum number of cores, adding OMP_PROC_BIND=close (or KMP_AFFINITY=compact) should accomplish that.

You should study that PIN_DOMAIN and note that I_MPI_PIN_DOMAIN=off is an option to allow another method to take over.

Hi Tim,

Thanks for your response. The actual example here was just a toy example that triggered the error message. I was trying to do something more like this, using an offset value for each rank.

mpirun -n 1 -env KMP_PLACE_THREADS=15c,4t,0o  ./scaling : -n 1 -env KMP_PLACE_THREADS=15c,4t,15o ./scaling

I actually want to auto-generate and test many combinations of MPI rank counts, KMP_AFFINITY settings and KMP_PLACE_THREADS settings.
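
For anyone wanting to script this, here is a minimal sketch of how such command lines might be generated. It only prints the command, it does not run it. The function name build_mpirun_cmd and its three arguments are hypothetical; the ./scaling binary and the cores/threads/offset format are taken from the command above. It assumes every rank gets the same number of cores and that offsets are simply rank index times cores per rank.

```shell
#!/bin/sh
# Hypothetical helper (not part of Intel MPI or the OpenMP runtime):
# print an MPMD-style mpirun line in which each rank gets its own block
# of cores through the offset field of KMP_PLACE_THREADS.
#   $1 = number of ranks
#   $2 = cores per rank
#   $3 = threads per core
build_mpirun_cmd() {
    ranks=$1
    cores=$2
    threads=$3
    cmd="mpirun"
    r=0
    while [ "$r" -lt "$ranks" ]; do
        # Rank r starts at core r * cores, so the blocks do not overlap.
        offset=$((r * cores))
        # Separate the per-rank sections with ':' as in the command above.
        if [ "$r" -gt 0 ]; then
            cmd="$cmd :"
        fi
        cmd="$cmd -n 1 -env KMP_PLACE_THREADS=${cores}c,${threads}t,${offset}o ./scaling"
        r=$((r + 1))
    done
    printf '%s\n' "$cmd"
}

# Two ranks, 15 cores each, 4 threads per core.
build_mpirun_cmd 2 15 4
```

Calling it in a loop over rank counts and threads per core would give the list of configurations to test; whether the runtime accepts each one still has to be checked on the card itself.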

The reason for my original question is the strange error message about "unsupported architecture".

Does that make more sense?

Best regards and thanks,

Alastair

TimP · ‎07-22-2014

That mpirun looks possible, if you have shut off the default pinning of your MPI.

All the relevant combinations where all ranks have the same OMP_NUM_THREADS and number of cores (spreading ranks across all available cores) should be possible using the default pinning and KMP_AFFINITY=balanced (to take care of cases where less than 4 threads per core are used).

Alastair_M_ · ‎07-23-2014

Tim Prince wrote:

That mpirun looks possible, if you have shut off the default pinning of your MPI.

All the relevant combinations where all ranks have the same OMP_NUM_THREADS and number of cores (spreading ranks across all available cores) should be possible using the default pinning and KMP_AFFINITY=balanced (to take care of cases where less than 4 threads per core are used).

Hi Tim,

It appears that setting I_MPI_PIN=off allows this MPI configuration to run successfully. Thanks for your help.
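
As a rough sketch of the working setup: the wrapper below puts I_MPI_PIN=off in front of the launch command. The wrapper name launch and the RUN switch are hypothetical conveniences, and passing I_MPI_PIN through the environment of mpirun is an assumption about how it was set. By default it only prints what would be run.

```shell
#!/bin/sh
# Hypothetical wrapper: dry run by default, set RUN=1 to really launch.
# I_MPI_PIN=off turns off Intel MPI's own process pinning, which per this
# thread otherwise conflicts with the per-rank KMP_PLACE_THREADS settings.
launch() {
    if [ "${RUN:-0}" = "1" ]; then
        # Run the given command with I_MPI_PIN=off in its environment.
        I_MPI_PIN=off "$@"
    else
        # Dry run: show the command line instead of executing it.
        printf 'I_MPI_PIN=off %s\n' "$*"
    fi
}

launch mpirun -n 1 -env KMP_PLACE_THREADS=15c,4t,0o ./scaling \
            : -n 1 -env KMP_PLACE_THREADS=15c,4t,15o ./scaling
```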

I think the original error message was misleading, which hindered my efforts to figure it out.

Best regards and thanks,

Alastair

James_C_Intel2 · ‎07-23-2014

Thanks for the report. You're right that that message is misleading. I've submitted a bug report against the OpenMP runtime.