Intel® Fortran Compiler

Run-time errors for Fortran coarrays

calcaneo
Beginner
3,878 Views

We have successfully installed the Fortran compiler and are able to compile code. The problem is that we cannot seem to run the code on more than 15 processors. Our server has 56 processors available to us, but the code hangs if we try to run on more than 15.

 

Does anyone have an idea what we could be doing wrong?

 

Thanks in advance.

0 Kudos
1 Solution
Barbara_P_Intel
3,734 Views

Can you please upgrade to the current compiler release? ifort is now at 2021.6.0. The Fortran compilers are available as part of the oneAPI HPC Toolkit. You can download it here.

I suspect there are bug fixes in the last 4 years that may impact your issue.

BTW, MPIR_CVAR_CH4_OFI_ENABLE_RMA=0 only impacts distributed coarrays using IB.

 


0 Kudos
19 Replies
MWind2
New Contributor III
3,868 Views

What server OS?

 

0 Kudos
calcaneo
Beginner
3,836 Views

Oooops! Thank you for your patience!

We are running Debian 11 (bullseye).

The code compiles fine and will run on up to (but not including) 16 processors

0 Kudos
Steve_Lionel
Honored Contributor III
3,849 Views

Your title says errors, but the text says hangs. Could it be that your application isn't coded in such a way to scale beyond 15 images? What happens if you try a simple program such as this?

program caftest
print *, "Hello from image ", this_image()
end

0 Kudos
calcaneo
Beginner
3,836 Views

The code works fine with gfortran on up to 28 processors. It is a simple program such as the one you mention.

The code compiles fine and will run on up to (but not including) 16 processors

Thanks for your help!

 

0 Kudos
jimdempseyatthecove
Honored Contributor III
3,824 Views

>>Our server has 56 processors available to us

>>The code works fine with gfortran on up to 28 processors

Does this mean gfortran fails using 29 or more (logical) processors?

 

Jim Dempsey

0 Kudos
jimdempseyatthecove
Honored Contributor III
3,823 Views

Also, it wouldn't hurt to run Steve's test program. If that works, then your "simple program" has an issue in its code.

Conversely, if Steve's program hangs (with 16 or more logical processors), then the issue is elsewhere.

Note that coarrays are implemented using MPI. The system manager can (and often does) restrict the number of processes an application can use, and this limit may differ between different vendors' versions of MPI. If your code is written (as an example) to expect 16 processes but the system supplies only 15, then poorly written code might hang.

 

Jim Dempsey

0 Kudos
Steve_Lionel
Honored Contributor III
3,810 Views

You still haven't said what exactly goes wrong. If there is an error message, please show us the complete and exact text.

0 Kudos
calcaneo
Beginner
3,786 Views

Thank you so much for your time on this subject.

 

This is our code:

!
!! test.f90
!!
!! Made by (Carlos Calcaneo Roldan)
!! Login <calcaneo@acf01>
!!
!! Started on Mon Aug 1 12:42:15 2022 Carlos Calcaneo Roldan
!! Last update Time-stamp: <01-ago-2022 12:42:43 calcaneo>
!

program caftest
print *, "Hello from image ", this_image()
end program caftest

 

 

And this is how we compile:

ifort -coarray=distributed -coarray-num-images=8 test.f90 -o test  (e.g., for 8 processors).

I am attaching the result. When we use more than 15 processors, the program does not respond and we have to force a hard break.
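
For anyone reproducing this, one rough way to find the image count at which the hang starts is to rerun the same binary under a time limit. This is only a sketch: it assumes the executable is ./test as built above, that GNU coreutils `timeout` is installed, and that the Intel runtime honors the FOR_COARRAY_NUM_IMAGES environment variable for this build (worth confirming in the compiler documentation for your release). The helper name run_with_images and the log file names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Intel tool): run a command with a given
# number of coarray images and give up after 60 seconds instead of hanging.
run_with_images() {
    images=$1
    shift
    # Assumption: FOR_COARRAY_NUM_IMAGES overrides -coarray-num-images at run time.
    FOR_COARRAY_NUM_IMAGES=$images timeout 60 "$@"
}

# Only probe if the binary from the post exists in the current directory.
if [ -x ./test ]; then
    for n in 8 15 16 28 56; do
        if run_with_images "$n" ./test > "images_$n.log" 2>&1; then
            echo "$n images: finished"
        else
            # timeout exits with status 124 when the time limit is hit.
            echo "$n images: failed or timed out (exit status $?)"
        fi
    done
fi
```

An exit status of 124 for a given count would point to a hang rather than a crash, which is the distinction asked about earlier in the thread.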

Thank you very much for your time.

 

 

0 Kudos
jimdempseyatthecove
Honored Contributor III
3,782 Views

Try setting the environment variable I_MPI_DEBUG=5.

Then run the program (with more than 15 processes).
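
In shell terms that could look something like the following. It is a sketch that assumes a Bourne-style shell, the ./test executable from earlier in the thread (already built for more than 15 images), and an arbitrary log file name.

```shell
#!/bin/sh
# Ask the Intel MPI runtime to print additional debug information at startup.
export I_MPI_DEBUG=5

# Run the coarray program and keep a copy of everything it prints.
# Assumption: the executable from the thread is ./test in the current directory.
if [ -x ./test ]; then
    ./test 2>&1 | tee mpi_debug.log
fi
```

Posting the contents of the log, even if the run has to be interrupted, gives the others something concrete to look at.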

 

Jim Dempsey

0 Kudos
calcaneo
Beginner
3,752 Views
0 Kudos
Barbara_P_Intel
3,775 Views

What version of the Intel Fortran compiler are you using?

If you are running on a single server, you can use -coarray=shared. Does that work for more than 15 processes?

What MPI fabric are you using? There is a known bug with OFI/mlx over InfiniBand (IB) when using distributed coarrays. As a workaround, try setting this environment variable: MPIR_CVAR_CH4_OFI_ENABLE_RMA=0. Another workaround is to use OFI/psm3.
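
A sketch of how these suggestions could be tried from the shell. It assumes the source file is test.f90 and an image count of 16 chosen only for illustration; the output name test_shared is hypothetical. Selecting psm3 through I_MPI_OFI_PROVIDER is an assumption and not something stated in this reply, so check it against the Intel MPI documentation.

```shell
#!/bin/sh
# Option 1: on a single server, rebuild with shared-memory coarrays.
# Guarded so the script does nothing here if ifort or the source is missing.
if command -v ifort >/dev/null 2>&1 && [ -f test.f90 ]; then
    ifort -coarray=shared -coarray-num-images=16 test.f90 -o test_shared
fi

# Option 2: keep -coarray=distributed and apply the workaround named in the post.
export MPIR_CVAR_CH4_OFI_ENABLE_RMA=0

# Option 3 (assumed way to pick the provider; use instead of option 2):
# export I_MPI_OFI_PROVIDER=psm3
```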

 

 

0 Kudos
calcaneo
Beginner
3,751 Views
0 Kudos
calcaneo
Beginner
3,741 Views

Sorry, I forgot to mention: we are using Intel Parallel Studio 2017.

0 Kudos
Barbara_P_Intel
3,735 Views

Can you please upgrade to the current compiler release? ifort is now at 2021.6.0. The Fortran compilers are available as part of the oneAPI HPC Toolkit. You can download it here.

I suspect there are bug fixes in the last 4 years that may impact your issue.

BTW, MPIR_CVAR_CH4_OFI_ENABLE_RMA=0 only impacts distributed coarrays using IB.

 

0 Kudos
calcaneo
Beginner
3,731 Views

Thank you so much Barbara!

 

We can now play with this compiler! We have succeeded in installing it and compiling the "hello world" code, so now the work begins! (Please see the image.)

I cannot express how much I appreciate your time; you just helped us immensely.

Hope you have a wonderful day!

Attachment: Screenshot from 2022-08-02 10-49-21.png

0 Kudos
as14
Beginner
3,189 Views

Hi,

 

Thanks for mentioning the fix for the Coarray Fortran MLX-over-IB bug. I am currently trying to do this and have tried both workarounds you recommended, but I still cannot get it working. I am using intel-oneapi-compilers/2022.0.2 and intel-oneapi-mpi/2021.4.0.

UCX version 1.12.1 shows the following transports available:

# Transport: posix
# Transport: sysv
# Transport: self
# Transport: tcp
# Transport: tcp
# Transport: tcp
# Transport: rc_verbs
# Transport: rc_mlx5
# Transport: dc_mlx5
# Transport: ud_verbs
# Transport: ud_mlx5
# Transport: cma

However, when I set export I_MPI_OFI_PROVIDER=mlx I don't get anywhere. Do you know of any other fixes for using distributed coarrays over mlx?

Thanks!

0 Kudos
Barbara_P_Intel
3,134 Views

Can you please install the latest compilers that are part of oneAPI 2023.0, which was released in December 2022? Then compile and run again.

 

0 Kudos
Barbara_P_Intel
3,728 Views

GOOD NEWS!!  But please use ifort for now. It looks like you might have used ifx.

Be aware that ifx has limited coarray support; we sneaked it in there. With the next release, coarray support is planned to be complete and official. See this article for information about the Fortran and OpenMP implementations in ifx available today.

0 Kudos
calcaneo
Beginner
3,721 Views

Oops. Thanks for the heads-up. The reality is that we are still exploring, but now at least we know the compiler is working.

 

Thanks again!

 

0 Kudos