Intel® Fortran Compiler

Run-time errors for Fortran coarrays

calcaneo
Beginner
2,311 Views

We have successfully installed the Fortran compiler and are able to compile code. The problem is that we cannot seem to run the code on more than 15 processors. Our server has 56 processors available to us, but the code hangs if we try to run on more than 15.

 

Does anyone have an idea what we could be doing wrong?

 

Thanks in advance.

0 Kudos
19 Replies
MWind2
New Contributor III
2,301 Views

What server OS?

 

0 Kudos
calcaneo
Beginner
2,269 Views

Oooops! Thank you for your patience!

We are running Debian 11 (bullseye).

The code compiles fine and will run on up to (but not including) 16 processors.

0 Kudos
Steve_Lionel
Honored Contributor III
2,282 Views

Your title says errors, but the text says hangs. Could it be that your application isn't coded in such a way to scale beyond 15 images? What happens if you try a simple program such as this?

program caftest
print *, "Hello from image ", this_image()
end

0 Kudos
calcaneo
Beginner
2,269 Views

The code works fine with gfortran on up to 28 processors; it is a simple program such as the one you mention.

The code compiles fine and will run on up to (but not including) 16 processors

Thanks for your help!

 

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,257 Views

>>Our server has 56 processors available to us

>>The code works fine with gfortran on up to 28 processors

Does this mean gfortran fails using 29 or more (logical) processors?

 

Jim Dempsey

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,256 Views

Also, it wouldn't hurt to run Steve's test program. If that works, then your "simple program" has an issue in its code.

Conversely, if Steve's program hangs (with 16 or more logical processors), then the issue is elsewhere.

 

Note that coarrays are implemented using MPI. The system manager can (and often does) restrict the number of processes an application can use, and this may differ between different vendors' versions of MPI. If your code is written (as an example) to expect 16 processes but the system supplies only 15, then poorly written code might hang.
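As a quick check for the kind of limit described above, the commands below print how many processing units and user processes the current shell is allowed. This is only a sketch: it assumes a Linux system with GNU coreutils (nproc), the variable names are illustrative, and it does not inspect any MPI-specific restriction.

```shell
# Processing units available to this process (honors CPU affinity masks).
cores=$(nproc)
# Per-user process limit; some shells do not support -u, so fall back to "unknown".
max_procs=$(ulimit -u 2>/dev/null || echo "unknown")
echo "processing units available: $cores"
echo "max user processes: $max_procs"
```

If either number is lower than the image count being requested, that would be worth ruling out before looking at the compiler.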

 

Jim Dempsey

0 Kudos
Steve_Lionel
Honored Contributor III
2,243 Views

You still haven't said what exactly goes wrong. If there is an error message, please show us the complete and exact text.

0 Kudos
calcaneo
Beginner
2,219 Views

Thank you so much for your time on this subject.

 

This is our code:

!
!! test.f90
!!
!! Made by (Carlos Calcaneo Roldan)
!! Login <calcaneo@acf01>
!!
!! Started on Mon Aug 1 12:42:15 2022 Carlos Calcaneo Roldan
!! Last update Time-stamp: <01-ago-2022 12:42:43 calcaneo>
!

program caftest
print *, "Hello from image ", this_image()
end program caftest

 

 

And this is how we compile:

ifort -coarray=distributed -coarray-num-images=8 test.f90 -o test  (e.g., for 8 processors)

I am attaching the result. When we use more than 15 processors, the program does not respond and we have to make a hard break.
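To find the image count at which the hang starts without needing a hard break, one option is to build once and choose the image count at run time. This sketch assumes ifort is on PATH, that test.f90 is in the current directory, and that the Intel run-time honors the FOR_COARRAY_NUM_IMAGES environment variable in this configuration. The 60-second limit and the list of counts are arbitrary choices.

```shell
# Image counts to try; arbitrary values around the reported threshold.
image_counts="8 15 16"

if command -v ifort >/dev/null 2>&1 && [ -f test.f90 ] &&
   ifort -coarray=distributed test.f90 -o test; then
    # The executable was built without a fixed image count,
    # so the count is taken from the environment for each run.
    for n in $image_counts; do
        echo "--- $n images ---"
        # timeout (GNU coreutils) ends the run after 60 s.
        FOR_COARRAY_NUM_IMAGES=$n timeout 60 ./test
        echo "exit status: $? (124 means the time limit was reached)"
    done
else
    echo "ifort or test.f90 not found, or the build failed; skipping"
fi
```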

Thank you very much for your time.

 

 

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,215 Views

Try setting the environment variable I_MPI_DEBUG=5.

Then run the program (with more than 15 processes).
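A minimal sketch of that suggestion, assuming a POSIX shell and that ./test is the executable built for more than 15 images. The log file name mpi_debug.log and the 60-second limit are illustrative.

```shell
# Ask Intel MPI to print startup diagnostics.
export I_MPI_DEBUG=5

if [ -f ./test ] && [ -x ./test ]; then
    # Capture stdout and stderr; timeout (GNU coreutils) ends the run if it hangs.
    timeout 60 ./test > mpi_debug.log 2>&1
    echo "exit status: $? (124 means the time limit was reached)"
else
    echo "./test not found; build it first"
fi
```

The resulting log can then be attached to the thread in place of a screenshot.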

 

Jim Dempsey

0 Kudos
Barbara_P_Intel
Employee
2,208 Views

What version of the Intel Fortran compiler are you using?

If you are running on a single server, you can use -coarray=shared. Does that work for more than 15 processes?

What MPI fabric are you using? There is a known bug with OFI/mlx over InfiniBand (IB) when using distributed coarrays. As a workaround, try setting this environment variable: MPIR_CVAR_CH4_OFI_ENABLE_RMA=0. Another workaround is to use OFI/psm3.
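The suggestions above could be tried roughly as follows. This is a sketch: the workaround selector variable and the test_shared output name are made up for illustration, the two environment variables are alternatives rather than a pair, and I_MPI_OFI_PROVIDER is assumed to be how the psm3 provider is selected in the installed Intel MPI.

```shell
# Hypothetical selector: "rma" disables OFI RMA, "psm3" switches the provider.
workaround=rma
case "$workaround" in
    rma)  export MPIR_CVAR_CH4_OFI_ENABLE_RMA=0 ;;
    psm3) export I_MPI_OFI_PROVIDER=psm3 ;;
esac

# Single-server alternative: build with shared-memory coarrays instead.
if command -v ifort >/dev/null 2>&1 && [ -f test.f90 ]; then
    ifort -coarray=shared -coarray-num-images=16 test.f90 -o test_shared &&
        timeout 60 ./test_shared
fi
```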

 

 

0 Kudos
calcaneo
Beginner
2,174 Views

Sorry, I forgot to mention: we are using Intel Parallel Studio 2017.

0 Kudos
Barbara_P_Intel
Employee
2,168 Views

Can you please upgrade to the current compiler release? ifort is now at 2021.6.0. The Fortran compilers are available as part of the oneAPI HPC Toolkit. You can download it here.

I suspect there are bug fixes in the last 4 years that may impact your issue.

BTW, MPIR_CVAR_CH4_OFI_ENABLE_RMA=0 only impacts distributed coarrays using IB.

 

0 Kudos
calcaneo
Beginner
2,164 Views

Thank you so much Barbara!

 

We can now play with this compiler! We have succeeded in installing it and compiling the "hello world" code, so now the work begins! (Please see image.)

I cannot express how much I appreciate your time; you just helped us immensely.

Hope you have a wonderful day!

Screenshot from 2022-08-02 10-49-21.png

0 Kudos
as14
Beginner
1,622 Views

Hi,

 

Thanks for mentioning the fix for the Coarray Fortran mlx-over-IB bug. I am currently trying to do this and have tried both workarounds you recommended, but I still cannot get it working. I am using intel-oneapi-compilers/2022.0.2 and intel-oneapi-mpi/2021.4.0.

UCX version 1.12.1 shows the following transports available:

# Transport: posix
# Transport: sysv
# Transport: self
# Transport: tcp
# Transport: tcp
# Transport: tcp
# Transport: rc_verbs
# Transport: rc_mlx5
# Transport: dc_mlx5
# Transport: ud_verbs
# Transport: ud_mlx5
# Transport: cma

However, when I set export I_MPI_OFI_PROVIDER=mlx, I don't get anywhere. Do you know of any other fixes for using distributed coarrays over mlx?
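For readers comparing setups, a transport listing like the one above, plus the providers that libfabric itself reports, can be gathered as sketched below. It assumes the ucx_info and fi_info utilities (shipped with UCX and libfabric respectively) are installed and skips whichever is missing. I_MPI_DEBUG=5 is added on the assumption that Intel MPI then reports the provider it selected at startup.

```shell
# Provider requested in the post, plus startup diagnostics from Intel MPI.
export I_MPI_OFI_PROVIDER=mlx
export I_MPI_DEBUG=5

# Transports known to UCX.
if command -v ucx_info >/dev/null 2>&1; then
    ucx_info -d | grep "Transport:"
fi

# Providers known to libfabric.
if command -v fi_info >/dev/null 2>&1; then
    fi_info -l
fi
```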

Thanks!

0 Kudos
Barbara_P_Intel
Employee
1,567 Views

Can you please install the latest compilers that are part of oneAPI 2023.0, which was released in December 2022? Then compile and run again.

 

0 Kudos
Barbara_P_Intel
Employee
2,161 Views

GOOD NEWS!!  But please use ifort for now. It looks like you might have used ifx.

Be aware that ifx has limited coarray support; we sneaked it in there. With the next release, coarray support is planned to be complete and official. See this article for information about the Fortran and OpenMP implementations in ifx available today.

0 Kudos
calcaneo
Beginner
2,154 Views

Oops. Thanks for the heads up; the reality is that we are still exploring. But now at least we know the compiler is working.

 

Thanks again!

 

0 Kudos