We have successfully installed the Fortran compiler and are able to compile code. The problem is that we cannot seem to run the code on more than 15 processors. Our server has 56 processors available to us, but the code hangs if we try to run on more than 15.
Does anyone have an idea what we could be doing wrong?
Thanks in advance.
What server OS?
Oooops! Thank you for your patience!
We are running Debian 11 (bullseye).
The code compiles fine and will run on up to (but not including) 16 processors.
Your title says errors, but the text says hangs. Could it be that your application isn't coded in such a way to scale beyond 15 images? What happens if you try a simple program such as this?
program caftest
print *, "Hello from image ", this_image()
end
The code works fine with gfortran on up to 28 processors. It is a simple program like the one you mention.
The code compiles fine and will run on up to (but not including) 16 processors.
Thanks for your help!
>>Our server has 56 processors available to us
>>The code works fine with gfortran on up to 28 processors
Does this mean gfortran fails using 29 or more (logical) processors?
Jim Dempsey
Also, it wouldn't hurt to run Steve's test program. If that works, then your "simple program" has an issue in its code.
Conversely, if Steve's program hangs (at 16 or more logical processors), then the issue is elsewhere.
Note that coarrays are implemented using MPI. The system manager can (and often does) restrict the number of processes an application can use, and this may differ between different vendors' versions of MPI. If your code is written (as an example) to expect 16 processes but the system supplies 15, then poorly written code might hang.
Jim Dempsey
You still haven't said what exactly goes wrong. If there is an error message, please show us the complete and exact text.
Thank you so much for your time on this subject.
This is our code:
!
!! test.f90
!!
!! Made by (Carlos Calcaneo Roldan)
!! Login <calcaneo@acf01>
!!
!! Started on Mon Aug 1 12:42:15 2022 Carlos Calcaneo Roldan
!! Last update Time-stamp: <01-ago-2022 12:42:43 calcaneo>
!
program caftest
print *, "Hello from image ", this_image()
end program caftest
And this is how we compile (e.g., for 8 processors):
ifort -coarray=distributed -coarray-num-images=8 test.f90 -o test
I am attaching the result. When we use more than 15 processors, the program does not respond and we have to make a hard break.
Thank you very much for your time.
Try setting the environment variable I_MPI_DEBUG=5.
Then run the program (with more than 15 processes).
Jim Dempsey
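A minimal sketch of that diagnostic run, assuming a bash-like shell and the test.f90 build line posted earlier in the thread. The image count of 16 and the log file name caf_debug.log are illustrative choices, not taken from the original posts.

```shell
# Ask Intel MPI to print its startup debug information.
export I_MPI_DEBUG=5

# Rebuild the hello-world test for 16 images, the first count reported to hang.
ifort -coarray=distributed -coarray-num-images=16 test.f90 -o test

# Run it and keep a copy of everything printed, so the log can be posted
# even if the run has to be interrupted. caf_debug.log is a hypothetical name.
./test 2>&1 | tee caf_debug.log
```

Posting the full contents of the log makes it easier for others to see how far startup got before the hang.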
This is what I get when I do that
What version of the Intel Fortran compiler are you using?
If you are running on a single server, you can use -coarray=shared. Does that work for more than 15 processes?
What MPI fabric are you using? There is a known bug with OFI/mlx over IB when using distributed coarrays. As a workaround, try setting this environment variable: MPIR_CVAR_CH4_OFI_ENABLE_RMA=0. Another workaround is to use OFI/psm3.
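The three suggestions above could be tried one at a time, roughly as sketched below. This assumes a bash-like shell, the test.f90 program from this thread, and a recent Intel MPI release; selecting psm3 through I_MPI_OFI_PROVIDER is one reading of "use OFI/psm3", and the image count of 16 and the output name test_shared are illustrative.

```shell
# 1) Single server: build with shared-memory coarrays instead of distributed.
ifort -coarray=shared -coarray-num-images=16 test.f90 -o test_shared
./test_shared

# 2) Distributed build over InfiniBand: apply the workaround variable
#    quoted above, then rerun the existing distributed executable.
export MPIR_CVAR_CH4_OFI_ENABLE_RMA=0
./test

# 3) Alternative: drop the workaround and select the psm3 OFI provider
#    (assumption: psm3 is available in the installed Intel MPI).
unset MPIR_CVAR_CH4_OFI_ENABLE_RMA
export I_MPI_OFI_PROVIDER=psm3
./test
```

Changing only one setting per run makes it clearer which one, if any, removes the hang.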
Sorry, I forgot to mention: we are using Intel Parallel Studio 2017.
Can you please upgrade to the current compiler release? ifort is now at 2021.6.0. The Fortran compilers are available as part of the oneAPI HPC Toolkit. You can download it here.
I suspect there are bug fixes in the last 4 years that may impact your issue.
BTW, MPIR_CVAR_CH4_OFI_ENABLE_RMA=0 only impacts distributed coarrays using IB.
Thank you so much Barbara!
We can now play with this compiler! We have succeeded in installing it and compiling the "hello world" code, so now the work begins! (Please see image.)
I cannot express how much I appreciate your time; you just helped us immensely.
Hope you have a wonderful day!
Hi,
Thanks for mentioning the fix for the Coarray Fortran bug with MLX over IB. I am currently trying to apply it and have tried both workarounds you recommended, but I still cannot get it working. I am using intel-oneapi-compilers/2022.0.2 and intel-oneapi-mpi/2021.4.0.
UCX version 1.12.1 shows the following transports available:
# Transport: posix
# Transport: sysv
# Transport: self
# Transport: tcp
# Transport: tcp
# Transport: tcp
# Transport: rc_verbs
# Transport: rc_mlx5
# Transport: dc_mlx5
# Transport: ud_verbs
# Transport: ud_mlx5
# Transport: cma
However, when I set export I_MPI_OFI_PROVIDER=mlx I don't get anywhere. Do you know of any other fixes for using distributed coarrays over mlx?
Thanks!
Can you please install the latest compilers, which are part of oneAPI 2023.0, released in December 2022? Then compile and run again.
GOOD NEWS!! But please use ifort for now. It looks like you might have used ifx.
Be aware that ifx has limited coarray support; we sneaked it in there.
Ooops. Thanks for the heads-up. The reality is that we are still exploring, but now at least we know the compiler is working.
Thanks again!