Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Intel coarray hybrid program

adenchfi
Novice
436 Views

Hello,

 

Intel coarrays can be run in either distributed or shared mode, which as far as I know emulates MPI vs. OpenMP behavior under the hood. I have some questions about writing programs for clusters of computers that get the advantages of hybrid MPI+OpenMP:

 

1) Can we achieve hybrid MPI+OpenMP(/pthread?) performance entirely within coarray language features in the current Intel compilers? If not, is it on the roadmap?

  • For example, assigning a team to a node, so that the images within each team use a shared-memory model and thus avoid communication bottlenecks, while teams communicate with each other over MPI? (since cluster OpenMP isn't a thing anymore)
  • If a user can't do this explicitly (the above point), does it occur under the hood? If so, does the user have any control over it?
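To make the first bullet concrete, here is a sketch of what I have in mind using Fortran 2018 teams. This assumes images land on nodes contiguously in launch order; the names (`node_id`, `node_team`, `images_per_node`) are mine, not from any Intel API:

```fortran
! Sketch: split images into one team per node, assuming a fixed,
! contiguous mapping of images to nodes (an assumption, not guaranteed).
program team_per_node
  use, intrinsic :: iso_fortran_env, only: team_type
  implicit none
  type(team_type) :: node_team
  integer, parameter :: images_per_node = 4   ! illustrative value
  integer :: node_id

  ! Images 1..4 -> team 1, images 5..8 -> team 2, etc.
  node_id = (this_image() - 1) / images_per_node + 1
  form team (node_id, node_team)

  change team (node_team)
    ! Here this_image() and coarray references are relative to the
    ! node-local team. Whether intra-team traffic actually uses shared
    ! memory is up to the implementation, which is exactly my question.
  end team
end program team_per_node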

2) If we are limited to choosing one paradigm for coarrays but want the hybrid performance, what is the best approach for writing new programs?

  • It seems to me that, for ease of programming, coarrays in distributed mode (with one image per node) plus OpenMP(/pthread?) on each node is the way to go. How does one actually implement this? I presume compiling with both coarrays and OpenMP/pthreads works, but are there differences or limitations in the syntax? Are there simple examples out there? In Slurm scripts I can assign only one process per node, and IIRC coarrays will launch the available processes by default. Are there practical limitations to this approach compared to MPI+OpenMP/pthreads?
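The kind of program I imagine is sketched below: coarrays for inter-node data movement, OpenMP for node-local loops. The compile line and the work array are my assumptions, not a verified recipe:

```fortran
! Sketch of one-image-per-node coarrays + OpenMP threads within a node.
! Assumed compile line (my guess): ifort -coarray=distributed -qopenmp prog.f90
program hybrid_sketch
  use omp_lib
  implicit none
  integer, parameter :: n = 1000000
  real :: work(n)[*]          ! coarray: one copy per image (per node here)
  integer :: i

  ! Node-local compute: OpenMP threads share this image's memory.
  !$omp parallel do
  do i = 1, n
    work(i) = real(i) * real(this_image())
  end do
  !$omp end parallel do

  sync all    ! inter-image (here inter-node) synchronization via coarrays
  if (this_image() == 1) print *, 'image 1 done, work(n) =', work(n)
end program hybrid_sketch
```

Is something like this supported, and does it behave the way the syntax suggests?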

Thanks.

0 Kudos
3 Replies
Steve_Lionel
Honored Contributor III
425 Views

Intel's "shared" coarray implementation uses MPI, not OpenMP. Each image is its own process. There is no difference in syntax. I have seen quite a few examples of combining OpenMP and coarrays. The most important thing is to not over-subscribe the system. Keep in mind that OpenMP by default will start as many threads as you have cores, and isn't aware of MPI.
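One way to avoid oversubscription is to size the thread count per image explicitly before launching. This is a sketch, not an official recipe; the core and image counts are illustrative, and `FOR_COARRAY_NUM_IMAGES` is the Intel Fortran control for the number of images:

```shell
# Sketch: keep images x threads <= cores per node (illustrative numbers).
CORES_PER_NODE=16
IMAGES_PER_NODE=4
export FOR_COARRAY_NUM_IMAGES=$IMAGES_PER_NODE                # Intel Fortran image count
export OMP_NUM_THREADS=$((CORES_PER_NODE / IMAGES_PER_NODE))  # threads per image
echo "threads per image: $OMP_NUM_THREADS"
```

Without something like this, each image's OpenMP runtime will default to one thread per core, so four images on a 16-core node would try to run 64 threads.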

0 Kudos
adenchfi
Novice
414 Views

Thank you for the clarification. I have been reading the Intel MPI documentation for the last few minutes and have seen that. However, right where it clarifies that, it also states that OpenMP is not supported with coarrays:

[attached screenshot: adenchfi_0-1676833758445.png]

Is this documentation then outdated, based on your answer?

 

I suppose my other questions come down to what the Intel MPI implementation is like. I read through https://www.intel.com/content/www/us/en/developer/articles/technical/tuning-the-intel-mpi-library-basic-techniques.html which has a lot of useful information.

However, I am still not entirely clear on one question: do Intel coarrays (in distributed mode) get faster communication when images are intra-node, comparable to OpenMP/pthreads? If so, is there any way for me to write my program with this in mind, to minimize communication bottlenecks, purely within coarrays (and possibly Intel MPI environment variables)?
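By "Intel MPI environment variables" I mean controls like the following, which I found in the tuning article. These are real Intel MPI variables as I understand them, but I am guessing at whether they influence coarray traffic, and their effect would need to be measured:

```shell
# Sketch: Intel MPI knobs that might affect coarray (distributed) traffic.
export I_MPI_FABRICS=shm:ofi   # shared memory intra-node, OFI inter-node
export I_MPI_DEBUG=5           # print pinning/fabric info at startup
export I_MPI_PIN_DOMAIN=omp    # reserve a core domain per rank for OpenMP threads
echo "fabrics: $I_MPI_FABRICS"
```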

For example, I may want to do domain decomposition where different nodes are responsible for different subdomains, such that optimal memory sharing is possible within each node. After reviewing the coarray documentation, I don't believe this is possible within the coarray syntax alone, which would necessitate coarrays+OpenMP/pthreads. Is this behavior on the roadmap, though?

0 Kudos
Steve_Lionel
Honored Contributor III
398 Views

Intel doesn't want you to mix OpenMP and coarrays, but you can do it if you're careful.

OpenMP inter-thread communication and synchronization is much faster than using MPI, but Intel MPI tries to do a good job of same-node communication. Typically for coarray programs you want to minimize passing data and synchronization between images. That doesn't mean eliminate it, but recognize that there is quite a bit of overhead involved, and it's best if an image has a lot of work to do based on initial data. Teams can help with this.
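As one illustration of minimizing synchronization, pair-wise `sync images` costs much less than a global `sync all` when each image only needs to coordinate with its neighbors. A sketch (the chain topology and `edge` coarray are illustrative):

```fortran
! Sketch: synchronize only with neighbor images (halo-exchange style)
! instead of all images, to cut synchronization overhead.
program neighbor_sync
  implicit none
  real :: edge[*]
  integer :: me, np

  me = this_image()
  np = num_images()
  edge = real(me)

  ! Pair-wise synchronization: each image waits only on its neighbors
  ! in the chain, not on every image as sync all would.
  if (me > 1)  sync images (me - 1)
  if (me < np) sync images (me + 1)

  if (me > 1)  print *, 'image', me, 'sees left edge', edge[me - 1]
end program neighbor_sync
```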

I can't speak for Intel regarding roadmaps. If I were to hazard a guess, it would be that they aren't devoting resources to the combination of coarrays and OpenMP.

0 Kudos
Reply