Intel® Fortran Compiler

MPI parallel NetCDF: "Permission denied" error when writing a file

Christophe__Yohia

Hello

I am using the Intel compiler from parallel_studio_xe_2019.3.062.

I compiled NetCDF with the parallel option.
The versions used are:

hdf5-1.10.5
NetCDF C 4.6.3
NetCDF F90 4.4.5



The NetCDF configuration is:
  --has-dap       -> yes
  --has-dap2      -> yes
  --has-dap4      -> yes
  --has-nc2       -> yes
  --has-nc4       -> yes
  --has-hdf5      -> yes
  --has-hdf4      -> no
  --has-logging   -> no
  --has-pnetcdf   -> no
  --has-szlib     -> no
  --has-cdf5      -> yes
  --has-parallel4 -> yes
  --has-parallel  -> yes



I run my code on an HPC cluster (800 cores, 60 nodes; 1 node = 24 cores) with an Omni-Path network, using SLURM for job submission.

The storage disk is NFS-mounted on each node.

The NFS mount options are: nfs vers=3,soft,nolock,noacl 0 0

I have checked the permissions on the filesystem and for the user.


The error message indicates "Permission denied" when I call nf90_create on several nodes (>= 2).

For testing, I wrote a very simple program that writes a NetCDF file with the parallel option (included at the end of this message).

When I run the program on 1 node (24 cores), it creates the file and writes the data.
When I run it on 2 or more nodes (>= 48 cores), a NetCDF file is created but its size is 0 KB, and the program then fails with ERROR 13 Permission denied.
If I keep this file and run the program again without deleting it, the program runs successfully.

 


The error is reported by the nf90_create call:
nf90_create(filename ,IOR(NF90_NETCDF4,NF90_MPIIO),ncid,comm=MPI_COMM_WORLD, info=MPI_INFO_NULL)

I tried several cmode options (NF90_CLOBBER, NF90_NOCLOBBER, NF90_SHARE).

It seems that not all processes have permission to write to the file at the same time, or that the file does not yet exist for some processes.
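One way to test the second hypothesis follows from the observation that a rerun succeeds once the file already exists. In this sketch rank 0 creates the file serially, all ranks synchronize, and then every rank opens the existing file for parallel writing. It is an untested sketch, not a confirmed fix. It assumes this netCDF-Fortran build accepts the optional comm and info arguments on nf90_open, and the program name precreate_then_open is made up for illustration. Client-side caching on NFS may still delay visibility of the new file on other nodes even after the barrier.

```fortran
program precreate_then_open
  use mpi
  use netcdf
  implicit none

  character(len=*), parameter :: filename = "savetest.nc"
  integer :: my_rank, ierr, status, ncid

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierr)

  ! Step 1: only rank 0 creates the (empty) netCDF-4 file, without MPI-IO.
  if (my_rank == 0) then
    status = nf90_create(filename, NF90_NETCDF4, ncid)
    if (status == nf90_noerr) status = nf90_close(ncid)
    if (status /= nf90_noerr) then
      print *, "rank 0: ", trim(nf90_strerror(status))
      call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
    end if
  end if

  ! Step 2: no rank proceeds until rank 0 has closed the file.
  call MPI_Barrier(MPI_COMM_WORLD, ierr)

  ! Step 3: every rank opens the existing file for parallel write access.
  status = nf90_open(filename, IOR(NF90_WRITE, NF90_MPIIO), ncid, &
                     comm=MPI_COMM_WORLD, info=MPI_INFO_NULL)
  if (status /= nf90_noerr) then
    print *, "rank ", my_rank, ": ", trim(nf90_strerror(status))
    call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
  end if

  ! Dimensions, variables and data would be defined and written here.

  status = nf90_close(ncid)
  call MPI_Finalize(ierr)
end program precreate_then_open
```

If this variant runs on two or more nodes while the collective nf90_create does not, that would point at file creation over NFS rather than at the write permissions themselves.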

I compile the program with these options:
mpiifort testwrite.f90 -I${PATH_NETCDF}/include/ -L${PATH_NETCDF}/lib/ -lnetcdff -lnetcdf -lcurl -lhdf5_hl -lhdf5 -lz -o testwrite

Inside the SLURM job, I launch the program like this:
mpirun -bootstrap slurm -genv I_MPI_FABRICS shm:ofi -genv I_MPI_TMI_PROVIDER -psm2 -np $SLURM_NTASKS ${path}/testwrite
Or
mpirun -np $SLURM_NTASKS ${path}/testwrite


Thank you for any help or ideas to solve this problem.
Christophe

 

My program looks like this:
 

program testwrite
  use netcdf
  use mpi
  implicit none

  integer, parameter :: NX = 6, NY = 12
  integer, parameter :: NDIMS = 2
  integer :: p, my_rank, ierr
  integer :: status, ncid, varid
  integer :: x, y
  integer :: x_dimid, y_dimid, dimids(NDIMS)
  integer :: data_out(NY, NX)
  character(len=11) :: filename

  ! Fill the array with sample data
  do x = 1, NX
    do y = 1, NY
      data_out(y, x) = (x - 1) * NY + (y - 1)
    end do
  end do

  ! Create file
  filename = "savetest.nc"

  call MPI_Init(ierr)
  if (ierr /= MPI_SUCCESS) print *, 'Error in MPI_Init'
  call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierr)
  if (ierr /= MPI_SUCCESS) print *, 'Error in MPI_Comm_rank'
  call MPI_Comm_size(MPI_COMM_WORLD, p, ierr)
  if (ierr /= MPI_SUCCESS) print *, 'Error in MPI_Comm_size'

  ! Collective create of a netCDF-4 file through MPI-IO
  status = nf90_create(filename, IOR(NF90_NETCDF4, NF90_MPIIO), ncid, &
                       comm=MPI_COMM_WORLD, info=MPI_INFO_NULL)

  ! Define dimensions and the variable, then leave define mode
  status = nf90_def_dim(ncid, "x", NX, x_dimid)
  status = nf90_def_dim(ncid, "y", NY, y_dimid)
  dimids = (/ y_dimid, x_dimid /)
  status = nf90_def_var(ncid, "data", NF90_INT, dimids, varid)
  status = nf90_enddef(ncid)

  ! Write the data and close the file
  status = nf90_put_var(ncid, varid, data_out)
  status = nf90_close(ncid)

  call MPI_Finalize(ierr)

end program testwrite
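
The test program stores every netCDF return code in status without inspecting it, so a failure in one call is easy to miss. A small helper that turns a non-zero status into a readable message could look like the sketch below. The names check_demo, check and label are made up for illustration, and the demo deliberately opens a file assumed not to exist so that the error path is exercised without MPI.

```fortran
program check_demo
  use netcdf
  implicit none
  integer :: ncid

  ! "no_such_file.nc" is assumed not to exist, so nf90_open returns an
  ! error status and the helper prints the message and stops.
  call check(nf90_open("no_such_file.nc", NF90_NOWRITE, ncid), "nf90_open")

contains

  ! Print the netCDF error text for a failed call and stop the program.
  subroutine check(status, label)
    integer, intent(in) :: status
    character(len=*), intent(in) :: label
    if (status /= nf90_noerr) then
      print *, trim(label), " failed (", status, "): ", trim(nf90_strerror(status))
      stop 1
    end if
  end subroutine check

end program check_demo
```

In the MPI program, each call would be wrapped as call check(nf90_create(...), "nf90_create"), and replacing stop 1 with MPI_Abort would bring down all ranks when one of them fails.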
