
MPI NetCDF parallel error writing a file

Christophe__Yohia
Hello

I use the Intel compiler from Parallel Studio XE 2019.3.062.

I compiled NetCDF with the parallel option, using these versions:

hdf5-1.10.5
NetCDF-C 4.6.3
NetCDF-Fortran 4.4.5


The NetCDF build configuration (as reported by nc-config):
  --has-dap       -> yes
  --has-dap2      -> yes
  --has-dap4      -> yes
  --has-nc2       -> yes
  --has-nc4       -> yes
  --has-hdf5      -> yes
  --has-hdf4      -> no
  --has-logging   -> no
  --has-pnetcdf   -> no
  --has-szlib     -> no
  --has-cdf5      -> yes
  --has-parallel4 -> yes
  --has-parallel  -> yes



I run my code on an HPC cluster (800 cores, 60 nodes, 24 cores per node) and use SLURM for job submission.
The storage disk is mounted on each node over NFS, with the mount options: nfs vers=3,soft,nolock,noacl 0 0
I have checked the permissions on the filesystem and for the user.
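
As a quick sanity check that plain (non-MPI-IO) writes work from several nodes at once, a test like the sketch below can be run under SLURM. The path /nfs/scratch is only a placeholder for the shared directory.

# Hypothetical check: append to one shared file from two nodes at once.
# /nfs/scratch is a placeholder; point it at the real NFS mount.
srun -N 2 --ntasks=2 --ntasks-per-node=1 \
    sh -c 'echo "hello from $(hostname)" >> /nfs/scratch/nfs_write_test.txt'

If this succeeds while the MPI-IO create fails, the problem is specific to MPI-IO on the NFS mount rather than to basic permissions.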

The error message indicates "Permission denied" when I call nf90_create across several nodes (>= 2).


As a test, I wrote a very simple little program that writes a NetCDF file with the parallel option (at the end of this message).

When I run the program on 1 node (24 cores), it creates the file and writes the data.
When I run it on 2 nodes or more (>= 48 cores), a NetCDF file is created but its size is 0 KB, and the program then fails with error 13, Permission denied.
If I keep that file and run the program again without deleting it, the program completes successfully.

The error is raised by nf90_create:
nf90_create(filename, IOR(NF90_NETCDF4, NF90_MPIIO), ncid, comm=MPI_COMM_WORLD, info=MPI_INFO_NULL)

It seems that the processes do not all have permission to write the file at the same time.
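
For reference, here is a variant of the create call that passes ROMIO hints through an MPI_Info object instead of MPI_INFO_NULL, as a drop-in for the test program at the end of this message. romio_ds_write and romio_cb_write are standard ROMIO hint names; I have not confirmed that they help on this NFS mount (where the nolock option weakens the fcntl locking that ROMIO's data sieving relies on), so this is only a sketch.

! Hypothetical variant of the create call in the test program below:
! pass ROMIO hints via an MPI_Info object instead of MPI_INFO_NULL.
integer :: info   ! add to the declarations of the test program
call MPI_Info_create(info, ierr)
call MPI_Info_set(info, "romio_ds_write", "disable", ierr)  ! avoid read-modify-write on NFS
call MPI_Info_set(info, "romio_cb_write", "enable", ierr)   ! funnel writes through aggregators
status = nf90_create(filename, IOR(NF90_NETCDF4, NF90_MPIIO), ncid, &
                     comm=MPI_COMM_WORLD, info=info)
call MPI_Info_free(info, ierr)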

I compile the program with this command:
mpiifort testwrite.f90 -I${PATH_NETCDF}/include/ -L${PATH_NETCDF}/lib/ -lnetcdff -lnetcdf -lcurl -lhdf5_hl -lhdf5 -lz -o testwrite

With SLURM, I submit my job like this:
mpirun -bootstrap slurm -genv I_MPI_FABRICS shm:ofi -genv I_MPI_TMI_PROVIDER psm2 -np $SLURM_NTASKS ${path}/testwrite
or
mpirun -np $SLURM_NTASKS ${path}/testwrite
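
For completeness, a minimal sbatch wrapper for the failing two-node case could look like the sketch below; the job name and resource lines are placeholders, not taken from my actual script.

#!/bin/bash
#SBATCH --job-name=testwrite       # placeholder job name
#SBATCH --nodes=2                  # two nodes of 24 cores each
#SBATCH --ntasks-per-node=24      # 48 MPI ranks in total, the failing case
mpirun -np $SLURM_NTASKS ${path}/testwrite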



Thank you for any help or ideas to solve this problem.
Christophe
program testwrite
USE netcdf
USE mpi

implicit none

integer :: p, my_rank, ierr
integer :: status, ncid
integer :: x, y
integer, parameter :: NX = 6, NY = 12
integer, parameter :: NDIMS = 2
integer :: varid, dimids(NDIMS)
integer :: x_dimid, y_dimid
integer :: data_out(NY,NX)
character(len=11) :: filename

! Fill the array with a test pattern (same on every rank).
do x = 1, NX
   do y = 1, NY
      data_out(y, x) = (x - 1) * NY + (y - 1)
   end do
end do

! Create the file collectively on MPI_COMM_WORLD.
filename = "savetest.nc"
call MPI_Init(ierr)
if (ierr /= 0) print *, 'Error in MPI_Init'

call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierr)
if (ierr /= 0) print *, 'Error in MPI_Comm_rank'

call MPI_Comm_size(MPI_COMM_WORLD, p, ierr)
if (ierr /= 0) print *, 'Error in MPI_Comm_size'

! This is the call that fails with "Permission denied" on >= 2 nodes.
status = nf90_create(filename, IOR(NF90_NETCDF4, NF90_MPIIO), ncid, &
                     comm=MPI_COMM_WORLD, info=MPI_INFO_NULL)
if (status /= nf90_noerr) print *, 'nf90_create: ', trim(nf90_strerror(status))

status = nf90_def_dim(ncid, "x", NX, x_dimid)
status = nf90_def_dim(ncid, "y", NY, y_dimid)

dimids = (/ y_dimid, x_dimid /)

status = nf90_def_var(ncid, "data", NF90_INT, dimids, varid)

! Leave define mode before writing data.
status = nf90_enddef(ncid)

! Every rank writes the full array; redundant but fine for this test.
status = nf90_put_var(ncid, varid, data_out)

status = nf90_close(ncid)
call MPI_Finalize(ierr)

end program testwrite
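
As an aside, parallel NetCDF-4 also lets the per-variable access mode be set explicitly with nf90_var_par_access. I do not know whether collective access changes the behaviour here, but it is sometimes recommended for parallel writes:

! Hypothetical addition after nf90_enddef in the program above:
! request collective (rather than independent) MPI-IO for this variable.
status = nf90_var_par_access(ncid, varid, nf90_collective)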

 
