Intel® Fortran Compiler

OpenMP related I/O error

thornpig
Beginner
1,357 Views
I am running a numerical model in "dm+sm" parallel mode on a cluster running 64-bit Linux, with more than 2 nodes and 16 CPUs per node. Compiling the model with "dm+sm" appeared to succeed.


When I set OMP_NUM_THREADS to 2 or greater, the model terminated with this error message:
forrtl: severe (40): recursive I/O operation, unit 0, file unknown
When I set OMP_NUM_THREADS to 1, which effectively disables OpenMP, the model ran successfully.
Please help me out. Thanks a lot !
Below is my PBS job file:
#!/bin/csh
#PBS -l nodes=2:ppn=8
#PBS -m ae
setenv OMP_NUM_THREADS 2
time mpirun wrf.csh
where "wrf.csh" removes the stack size limit and executes the model as follows:
#!/bin/csh
limit stacksize unlimited
exec wrf.exe
8 Replies
jimdempseyatthecove
Honored Contributor III
Is your file I/O context held in Thread Local Storage?

Locate where in your code (you suspect) the erroneous I/O statement is. Insert some diagnostic code:

!$OMP CRITICAL
WRITE(*,*) 'Debug IO ', omp_get_thread_num()
bSomeError = .false.
(your I/O statement(s) here)
goto 12345
(your I/O error code here)
bSomeError = .true.
12345 continue
!$OMP END CRITICAL
if(bSomeError) goto (your error label here)
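
For readers who want something they can compile, here is a minimal sketch of the same idea. It is not the code from the post above: it swaps the goto-based error path for iostat=, and the names debug_io_critical, debug_io, some_error, ios and tid are made up for illustration. It assumes the program is built with OpenMP enabled (for example -openmp with the Intel compiler, as mentioned in this thread, or -fopenmp with gfortran).

```fortran
program debug_io_critical
  use omp_lib, only: omp_get_thread_num
  implicit none
  integer :: ios, tid
  logical :: some_error   ! shared flag, only written inside the critical section

  some_error = .false.

  !$omp parallel private(ios, tid)
  tid = omp_get_thread_num()

  ! Only one thread at a time may execute the I/O below.
  !$omp critical (debug_io)
  write(*, '(a,i0)', iostat=ios) 'Debug IO, thread ', tid
  ! (the suspect I/O statement would go here, also with iostat=ios)
  if (ios /= 0) some_error = .true.
  !$omp end critical (debug_io)

  !$omp end parallel

  ! Error handling happens only after the parallel region has ended.
  if (some_error) then
     write(*, '(a)') 'An I/O error occurred inside the parallel region'
     stop 1
  end if
end program debug_io_critical
```

If the error disappears once the suspect statement is serialized like this, that points toward unsynchronized I/O on a shared unit rather than a problem with the statement itself.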

Jim
Martyn_C_Intel
Employee
The "recursion" in this particular error message sometimes means that an error occurred during a write to stdout/stderr, so when the RTL tries to write a diagnostic to the same unit, it is considered a recursive use of that unit. But it might instead indicate an error or thread-safety issue on some other unit.

Along the same lines as Jim's suggestion, you should make sure that you are linking to the threadsafe version of the Fortran RTL. This should happen automatically if you use the compiler driver to link and have -openmp on the link line. But if you link to the OpenMP library explicitly, then you may need to put -threads on the link line, or link to libifcoremt instead of libifcore.
thornpig
Beginner
Thank you Jim and Martyn!
I am trying to figure out your suggestions and will give you feedback later.
Yongjia_S_
Beginner
I have a similar problem running the WRF model compiled with the Intel compiler and OpenMP. wrf.exe runs with OMP_NUM_THREADS=1 but fails with the same error when OMP_NUM_THREADS>1. However, when I compile the model with gfortran/gcc and OpenMP, it runs successfully with OMP_NUM_THREADS>1. Any clue? Thanks.
jimdempseyatthecove
Honored Contributor III
One other thing to look at is:

Do any of the I/O statements within the parallel region(s) attempt to transfer control outside the parallel region (on error, EOF, ...)?

It is not valid to do so; place these branch targets inside the parallel region.
The compiler should warn of this coding error.

Jim Dempsey

Yongjia_S_
Beginner
I don't understand Jim's code. What are "(your I/O statement(s) here)" and "(your I/O error code here)"? Are they the same I/O statement I suspect in my program? And what is "(your error label here)"? Thanks.
jimdempseyatthecove
Honored Contributor III

!$OMP PARALLEL
...
READ(YourInUnit, 100, err=999, end=888) arglist
100 FORMAT(...)
...
GOTO 12345
! error label
! *** MUST reside within the same parallel region as READ(..., err=label)
999 ErrorFlag = .true.
GOTO 12345 ! to label at end of parallel region
! *** MUST reside within the same parallel region as READ(..., end=label)
888 EndFlag = .true.
GOTO 12345
...
12345 CONTINUE
!$OMP END PARALLEL
if(ErrorFlag) GOTO 9999
if(EndFlag) GOTO 8888
...

The labels 999 and 888 are branched to from a READ statement within the parallel region and therefore must reside within the same parallel region. After the parallel region exits, test for the end and/or error conditions.

"A structured block of code is a collection of one or more executable statements with a single point of entry at the top and a single point of exit at the bottom."
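
One way to keep the block structured without any labels is to test iostat= instead of branching with err= and end=. The sketch below is an alternative technique, not the approach shown in the post above, and it makes several assumptions for the sake of a self-contained example: the names records, error_flag and end_flag are hypothetical, and it reads from character strings (internal files) so that no I/O unit is shared between threads.

```fortran
program structured_read
  implicit none
  ! Hypothetical input: two valid integers, one malformed record, one empty record.
  character(len=8), dimension(4) :: records = &
       [character(len=8) :: '10', '20', 'oops', '']
  integer :: i, ios, val
  logical :: error_flag, end_flag

  error_flag = .false.
  end_flag   = .false.

  !$omp parallel do private(ios, val) reduction(.or.: error_flag, end_flag)
  do i = 1, size(records)
     ! Internal read: each iteration reads its own string, so no unit is shared.
     read(records(i), *, iostat=ios) val
     if (ios > 0) then
        error_flag = .true.      ! plays the role of err=999
     else if (ios < 0) then
        end_flag = .true.        ! plays the role of end=888
     end if
  end do
  !$omp end parallel do

  ! Branch on the outcome only after the parallel region has ended.
  if (error_flag) write(*, '(a)') 'at least one read reported an error'
  if (end_flag)   write(*, '(a)') 'at least one read reached end of file'
end program structured_read
```

Because every path through the loop body falls out at the bottom, there is a single entry and a single exit, and the flags are combined across threads by the reduction rather than by a jump out of the region.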

I hope this clears up the issue.

Jim Dempsey

Yongjia_S_
Beginner
Thanks, Jim, I understand now.