Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Various questions

unrue
Beginner
582 Views
Dear Intel HPC forum, I have some questions about the Intel MPI Library:

1) On page 109 of the reference manual, it says: "There is no support for C and C++ applications". What does this mean exactly?

2) What is the cache bypass algorithm controlled by the I_MPI_SHM_CACHE_BYPASS environment variable (page 69)?

3) On page 90 ("Dynamic process support"), does this mean I can attach an MPI process to another MPI process that is already running?

4) When I use I_MPI_PIN_PROCESSOR_LIST, what do the values mean? all, allcores, and allsocks are not clear to me. Can someone explain them with an example?

Thanks in advance!
3 Replies
TimP
Honored Contributor III
582 Views

1) This comment is in the section about ILP64 (64-bit integer) support and follows the statement that there is no f90 USE file for it, so I have to assume it means there are no include files for long int.

2) The default threshold isn't documented, but presumably, by default, messages passed through shared memory (shm) within the originating node that exceed some size threshold use nontemporal stores, so that they do not evict all or most of the data in cache. It's easy to imagine situations where you might want a message to reside in the destination cache for immediate use, or where you might want nontemporal stores to apply to smaller messages than the default threshold. You would have to set up a baseline performance case to evaluate whether changes from the defaults are useful for your application.
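
As a rough sketch of such a baseline, the Python helper below times the same command with I_MPI_SHM_CACHE_BYPASS switched on and off. The `time_command` helper and the stand-in command are illustrative assumptions, not part of Intel MPI; the `enable`/`disable` values are what I understand the reference manual to accept, so check them against your version.

```python
import os
import subprocess
import sys
import time


def time_command(cmd, extra_env=None, repeats=3):
    """Run cmd `repeats` times with extra_env merged into the environment.

    Returns the best (smallest) wall-clock time in seconds.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. Substitute your own
    # launch line, e.g. ["mpirun", "-n", "4", "./your_benchmark"]
    # (hypothetical benchmark name).
    cmd = [sys.executable, "-c", "pass"]
    for setting in ("enable", "disable"):
        elapsed = time_command(cmd, {"I_MPI_SHM_CACHE_BYPASS": setting})
        print(f"I_MPI_SHM_CACHE_BYPASS={setting}: {elapsed:.3f} s")
```

Taking the best of several runs reduces noise from other activity on the node; for a real comparison you would want a benchmark that sends messages of the sizes your application actually uses.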

3) I can't add to public descriptions of this feature. I haven't seen it used.

4) As far as I can tell, allcores is meant to facilitate use of 1 logical processor per core (when HT is enabled), while "all" makes all the logical processors available. allsocks may be intended to help distribute a smaller number of MPI processes across multiple sockets/packages. I agree that the description ought to be clarified.
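
To make that reading concrete, here is a small Python sketch that models a node and lists which logical processors each keyword would select under the interpretation above. It is only an illustration of my reading, not Intel MPI's actual algorithm: the topology, the sequential CPU numbering, and the "one logical processor per socket" rule for allsocks are all assumptions.

```python
from collections import namedtuple

# One logical processor: its OS CPU id plus the socket and core it sits on.
LogicalCpu = namedtuple("LogicalCpu", "cpu socket core")


def make_topology(sockets, cores_per_socket, threads_per_core):
    """Build a made-up topology with sequential CPU ids.

    Real systems often number logical processors differently.
    """
    cpus = []
    cpu_id = 0
    for s in range(sockets):
        for c in range(cores_per_socket):
            for _ in range(threads_per_core):
                cpus.append(LogicalCpu(cpu_id, s, c))
                cpu_id += 1
    return cpus


def select(topology, mode):
    """Return the CPU ids chosen for a keyword under the assumed reading."""
    if mode == "all":
        # Every logical processor, including hyper-thread siblings.
        return [p.cpu for p in topology]
    if mode == "allcores":
        key = lambda p: (p.socket, p.core)  # first logical CPU of each core
    elif mode == "allsocks":
        key = lambda p: p.socket            # first logical CPU of each socket
    else:
        raise ValueError(f"unknown mode: {mode}")
    seen, picked = set(), []
    for p in topology:
        if key(p) not in seen:
            seen.add(key(p))
            picked.append(p.cpu)
    return picked


if __name__ == "__main__":
    node = make_topology(sockets=2, cores_per_socket=2, threads_per_core=2)
    for mode in ("all", "allcores", "allsocks"):
        print(mode, select(node, mode))
```

For the modeled node with 2 sockets, 2 cores per socket, and HT enabled, this prints `[0, 1, 2, 3, 4, 5, 6, 7]` for all, `[0, 2, 4, 6]` for allcores, and `[0, 4]` for allsocks.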

unrue
Beginner
582 Views
Hi tim18,
thanks for your reply.

For example, suppose I set I_MPI_PIN_PROCESSOR_LIST=allsocks and have 2 sockets per node and 4 MPI processes. Does that mean ranks 0 and 1 will be placed on the first node (rank 0 on the first socket and rank 1 on the second socket), and likewise ranks 2 and 3 on the second node?

TimP
Honored Contributor III
582 Views
I don't believe allsocks is the right choice for this case, but you'd want to check by setting I_MPI_DEBUG=5 or higher. If you have at least 2 cores per socket, allcores should work. If you don't have HT enabled, allcores will not change behavior from default.
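
As a complement to the I_MPI_DEBUG output, a process can also report its own CPU affinity mask. The Python sketch below uses only the standard library and assumes Linux (`os.sched_getaffinity` is not available everywhere); `allowed_cpus` is a hypothetical helper name. Whether the mask reflects Intel MPI's pinning depends on when the library applies it, so treat this as a cross-check to call from inside a rank after MPI has been initialized, not as a replacement for the debug output.

```python
import os


def allowed_cpus(pid=0):
    """Return the sorted logical CPUs the process may run on.

    pid=0 means the calling process. Returns None on platforms
    without os.sched_getaffinity (for example macOS or Windows).
    """
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(pid))
    return None


if __name__ == "__main__":
    print(f"pid {os.getpid()}: allowed CPUs = {allowed_cpus()}")
```

If pinning took effect, each rank should report a small set of CPUs, and different ranks on the same node should report different sets.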