Intel® Fortran Compiler

Parallel NetCDF with MPI: "Permission denied" error when writing a file

Christophe__Yohia

Hello

I am using the Intel compiler from parallel_studio_xe_2019.3.062.

I compiled NetCDF with the parallel option.
The versions used are:

hdf5-1.10.5
NetCDF-C 4.6.3
NetCDF-Fortran 4.4.5



The NetCDF configuration is:
  --has-dap       -> yes
  --has-dap2      -> yes
  --has-dap4      -> yes
  --has-nc2       -> yes
  --has-nc4       -> yes
  --has-hdf5      -> yes
  --has-hdf4      -> no
  --has-logging   -> no
  --has-pnetcdf   -> no
  --has-szlib     -> no
  --has-cdf5      -> yes
  --has-parallel4 -> yes
  --has-parallel  -> yes



I run my code on an HPC cluster (800 cores, 60 nodes, 1 node = 24 cores) with an Omni-Path network, using SLURM for job submission.

The storage is NFS-mounted on each node.

The NFS mount options are: nfs vers=3,soft,nolock,noacl 0 0

I have checked the permissions on the filesystem and for the user.


The error message says "Permission denied" when I call nf90_create on two or more nodes.

For testing, I wrote a very simple program that writes a NetCDF file with the parallel option (it is at the end of this message).

When I run the program on 1 node (24 cores), it creates the file and writes the data.
When I run the program on 2 or more nodes (>= 48 cores), a NetCDF file is created but its size is 0 KB, and then the program fails with error 13, "Permission denied".
If I keep this empty file and run the program again without deleting it, the program runs successfully.

 


The error is reported by the nf90_create call:
nf90_create(filename ,IOR(NF90_NETCDF4,NF90_MPIIO),ncid,comm=MPI_COMM_WORLD, info=MPI_INFO_NULL)

I tried several cmode options (NF90_CLOBBER, NF90_NOCLOBBER, NF90_SHARE).

It seems that the processes do not all have permission to write the file at the same time, or that the file does not yet exist for some processes.

I compile the program with these options:
mpiifort testwrite.f90 -I${PATH_NETCDF}/include/ -L${PATH_NETCDF}/lib/ -lnetcdff -lnetcdf -lcurl -lhdf5_hl -lhdf5 -lz -o testwrite

Inside the SLURM job, I launch the program like this:
mpirun -bootstrap slurm -genv I_MPI_FABRICS shm:ofi -genv I_MPI_TMI_PROVIDER -psm2 -np $SLURM_NTASKS ${path}/testwrite
or like this:
mpirun -np $SLURM_NTASKS ${path}/testwrite


Thank you for any help or ideas on how to solve this problem.
Christophe

 

My program looks like this:
 

program testwrite
use netcdf
use mpi
implicit none

integer :: p, my_rank, ierr
integer :: status, x, y            ! NetCDF return code and loop indices
integer :: old_mode,ncid
integer :: ntx_dim_id,nty_dim_id,nvar_dim_id
integer :: ntx_id,nty_id,u_id
integer,dimension(3) :: dims,starts
integer, parameter :: NX = 6, NY = 12
character(len=11) :: filename
character(len=11) :: path
integer, parameter :: NDIMS = 2
integer :: varid, dimids(NDIMS)
integer :: x_dimid, y_dimid, t_dimid
integer :: start(NDIMS), count(NDIMS)
integer :: data_out(NY,NX)
logical :: file_exists

! Fill the array with test data
do x = 1, NX
  do y = 1, NY
    data_out(y, x) = (x - 1) * NY + (y - 1)
  end do
end do

! Name of the file to create
filename = "savetest.nc"

call MPI_Init(ierr)
if (ierr /= MPI_SUCCESS) print *, 'Error in MPI_Init'
call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierr)
if (ierr /= MPI_SUCCESS) print *, 'Error in MPI_Comm_rank'
call MPI_Comm_size(MPI_COMM_WORLD, p, ierr)
if (ierr /= MPI_SUCCESS) print *, 'Error in MPI_Comm_size'

! Create the file in parallel on all ranks
status = nf90_create(filename, IOR(NF90_NETCDF4, NF90_MPIIO), ncid, comm=MPI_COMM_WORLD, info=MPI_INFO_NULL)
status = nf90_def_dim(ncid, "x", NX, x_dimid)
status = nf90_def_dim(ncid, "y", NY, y_dimid)
dimids = (/ y_dimid, x_dimid /)
status = nf90_def_var(ncid, "data", NF90_INT, dimids, varid)
! Leave define mode before writing the data
status = nf90_enddef(ncid)
status = nf90_put_var(ncid, varid, data_out)
status = nf90_close(ncid)
call MPI_Finalize(ierr)

end
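
The test program stores each NetCDF return value in status but never inspects it. To see which call fails on which rank, each call could be passed through a small checking routine. This is only a minimal sketch: the subroutine name check and its label argument are hypothetical names chosen for illustration, while nf90_noerr, nf90_strerror, and MPI_Abort are the standard NetCDF-Fortran and MPI names. It assumes the routine is compiled in the same file as the program, after the end statement.

```fortran
! Sketch: report a failed NetCDF call with its message and the MPI rank.
! "check" and "label" are illustrative names, not part of NetCDF or MPI.
subroutine check(status, label)
  use netcdf
  use mpi
  implicit none
  integer, intent(in) :: status           ! value returned by an nf90_* call
  character(len=*), intent(in) :: label   ! name of the call, for the message
  integer :: my_rank, ierr

  if (status /= nf90_noerr) then
    call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierr)
    print *, 'rank ', my_rank, ': ', label, ' failed: ', trim(nf90_strerror(status))
    ! Stop all ranks so the job does not hang in a later collective call
    call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
  end if
end subroutine check
```

It would be used as, for example, call check(nf90_def_dim(ncid, "x", NX, x_dimid), 'nf90_def_dim'), which shows whether every rank or only the ranks on some nodes get the error.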

 
