Intel® oneAPI HPC Toolkit

Inconsistent file content between runs with MPI_File_write

Mach
Beginner

#include <cmath>
#include <iostream>
#include <string>

#include "mpi.h"

template <typename DT>
inline DT **createC(int m, int n) {
  DT **array = new DT *[m];
  *array = new DT[m * n];
  for (int i = 1; i < m; ++i) {
    array[i] = array[i - 1] + n;
  }
  return array;
}

template <typename DT>
inline void freeC(DT **array) {
  if (array) {
    if (array[0]) delete[] array[0];
    delete[] array;
  }
}

int main(int argc, char **argv) {
  int rank, size;
  MPI_File fhw;
  MPI_Status status;
  MPI_Offset offset;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const int nvar = 3;
  const int nx = rank + 5;
  double **buf = createC<double>(nx, nvar);
  for (int i = 0; i < nx; ++i) {
    for (int ivar = 0; ivar < nvar; ++ivar) {
      buf[i][ivar] = 100 * rank + 1.0 * i + 0.1 * ivar;
    }
  }

  // disp of buf[0][0]
  int proc_disp = 0;
  if (rank != 0)
    MPI_Recv(&proc_disp, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
  int next_disp = proc_disp + nx * sizeof(double);
  if (rank < size - 1)
    MPI_Send(&next_disp, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);

  int nx_total;
  MPI_Allreduce(&nx, &nx_total, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

  MPI_Aint sizeofdouble, lb;
  MPI_Type_get_extent(MPI_DOUBLE, &lb, &sizeofdouble);

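  // Memory type: one column of the row-major nx x nvar array, i.e. nx
  // doubles spaced nvar doubles apart. Resizing the extent to one double
  // makes consecutive instances start at consecutive columns.
  MPI_Datatype col, col_memory;
  MPI_Type_vector(nx, 1, nvar, MPI_DOUBLE, &col);
  MPI_Type_create_resized(col, lb, sizeofdouble, &col_memory);
  MPI_Type_commit(&col_memory);

  // File type: nx contiguous doubles (this rank's part of one row of the
  // transposed matrix), with the extent resized to a full row of nx_total
  // doubles so that each repetition moves on to the next row.
  MPI_Datatype col_file, ftype;
  MPI_Type_vector(1, nx, nx_total, MPI_DOUBLE, &col_file);
  MPI_Type_create_resized(col_file, 0, nx_total * sizeofdouble, &ftype);
  MPI_Type_commit(&ftype);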
  
  std::string file = "test2.bin";
  MPI_File_open(MPI_COMM_WORLD, (char *)file.c_str(),
                MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fhw);
  MPI_Offset f_offset = proc_disp;
  MPI_File_set_view(fhw, f_offset, MPI_DOUBLE, ftype, "native", MPI_INFO_NULL);
  MPI_File_write(fhw, &buf[0][0], nvar, col_memory, MPI_STATUS_IGNORE);
  
  MPI_File_close(&fhw);
  MPI_Type_free(&col_memory);
  MPI_Type_free(&ftype);

  freeC(buf);
  MPI_Finalize();

  return 0;
}

The aim of the program is to transpose the matrix buf and write it to a file. The final matrix in the file should have shape nvar * nx_total. I'm using mpiicpc with icpc (ICC) 2021.4.0 20210910.
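
To make the intended layout concrete, here is a minimal sketch (plain C++, no MPI) of where each element is supposed to land in the file. The helper name fileIndex and the nxPerRank vector are made up for illustration and are not part of the program above; the sketch only assumes the nvar * nx_total row-major shape described here.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Position, counted in doubles from the start of the file, at which
// buf[i][ivar] of the given rank should land when the file holds nvar rows
// of nx_total doubles each. nxPerRank[r] is the number of rows owned by
// rank r (rank + 5 in the program above). Hypothetical helper.
std::size_t fileIndex(const std::vector<int> &nxPerRank, int rank, int i,
                      int ivar) {
  // Total row length of the transposed matrix.
  const int nxTotal = std::accumulate(nxPerRank.begin(), nxPerRank.end(), 0);
  // Number of entries contributed by lower ranks; multiplied by
  // sizeof(double), this is the byte displacement that proc_disp holds.
  const int before =
      std::accumulate(nxPerRank.begin(), nxPerRank.begin() + rank, 0);
  return static_cast<std::size_t>(ivar) * nxTotal + before + i;
}
```

With 4 ranks (nx = 5, 6, 7, 8, so nx_total = 26), fileIndex({5, 6, 7, 8}, 1, 0, 0) gives 5 and fileIndex({5, 6, 7, 8}, 0, 0, 1) gives 26. The byte offset is the returned index times sizeof(double).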

 

If I run the program with mpiexec -np 4 ./test.x, a dump of the file with od -t f8 -w80 test2.bin sometimes shows the following (the result differs from run to run):

0000000 0 1 2 3 4 0 0 0 0 0
0000120 0 200 201 202 203 204 205 206 300 301
0000240 302 303 304 305 306 307 -nan -nan -nan -nan
0000360 -nan -nan -nan -nan -nan -nan -nan 200.1 201.1 202.1
0000500 203.1 204.1 205.1 206.1 300.1 301.1 302.1 303.1 304.1 305.1
0000620 306.1 307.1 -nan -nan -nan -nan -nan -nan -nan -nan
0000740 -nan -nan -nan 200.2 201.2 202.2 203.2 204.2 205.2 206.2
0001060 300.2 301.2 302.2 303.2 304.2 305.2 306.2 307.2

 

I expect the output to look like this:

 

0000000 0 1 2 3 4 100 101 102 103 104
0000120 105 200 201 202 203 204 205 206 300 301
0000240 302 303 304 305 306 307 0.1 1.1 2.1 3.1
0000360 4.1 100.1 101.1 102.1 103.1 104.1 105.1 200.1 201.1 202.1
0000500 203.1 204.1 205.1 206.1 300.1 301.1 302.1 303.1 304.1 305.1
0000620 306.1 307.1 0.2 1.2 2.2 3.2 4.2 100.2 101.2 102.2
0000740 103.2 104.2 105.2 200.2 201.2 202.2 203.2 204.2 205.2 206.2
0001060 300.2 301.2 302.2 303.2 304.2 305.2 306.2 307.2
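
For checking a run without reading the dump by eye, something like the following sketch could compare test2.bin against the expected content. It is plain C++ without MPI. The names expectedContent and firstMismatch are hypothetical, the 1e-9 tolerance is an arbitrary choice, and the sketch assumes nx = rank + 5 and the fill formula from the program above. It does not check for extra data after the expected values.

```cpp
#include <cmath>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Expected file content for nranks ranks: nvar rows, each holding the
// entries of rank 0, then rank 1, and so on. Rank r owns r + 5 entries and
// stores 100 * r + i + 0.1 * ivar, as in the program above.
std::vector<double> expectedContent(int nranks, int nvar) {
  std::vector<double> out;
  for (int ivar = 0; ivar < nvar; ++ivar)
    for (int r = 0; r < nranks; ++r)
      for (int i = 0; i < r + 5; ++i)
        out.push_back(100 * r + 1.0 * i + 0.1 * ivar);
  return out;
}

// Returns -1 if the file matches the expected content, otherwise the index
// (in doubles) of the first entry that is missing or differs.
long firstMismatch(const std::string &path, int nranks, int nvar) {
  const std::vector<double> want = expectedContent(nranks, nvar);
  std::vector<double> got(want.size(), 0.0);
  std::ifstream in(path, std::ios::binary);
  in.read(reinterpret_cast<char *>(got.data()),
          static_cast<std::streamsize>(got.size() * sizeof(double)));
  const std::size_t nread =
      static_cast<std::size_t>(in.gcount()) / sizeof(double);
  for (std::size_t k = 0; k < want.size(); ++k) {
    // The negated comparison also flags NaN entries as mismatches.
    if (k >= nread || !(std::fabs(got[k] - want[k]) <= 1e-9))
      return static_cast<long>(k);
  }
  return -1;
}
```

For the 4-rank case, firstMismatch("test2.bin", 4, 3) would return -1 for the expected file shown above and 5 for the faulty dump, where the first entry of rank 1 is 0 instead of 100.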

 

If I replace MPI_File_write with MPI_File_write_all, I get the desired results. Open MPI doesn't reproduce the problem.

 

SantoshY_Intel
Moderator

Hi,


Thanks for reaching out to us.


We tried running your code and it is working fine at our end.


Could you please provide the following details so that we can investigate your issue further?

  1. The version of the OS being used.
  2. The number of nodes used to launch the MPI job (multi-node or single-node environment).
  3. The interconnect hardware you are using (e.g., InfiniBand or Omni-Path).
  4. The FI_PROVIDER you are using (e.g., mlx or psm2).


Best Regards,

Santosh




Mach
Beginner

My development environment where I encountered the problem:

  1. Windows 10 Enterprise 10.0.18363 Build 18363
    Windows Subsystem for Linux 1 (WSL 1)
    Distributor ID: Ubuntu
    Description: Ubuntu 20.04.3 LTS
    Release: 20.04
    Codename: focal
  2. The program is compiled with mpiicpc main.cpp -o test and run with mpiexec -np 4 ./test
    which mpiicpc
    -> /opt/intel/oneapi/mpi/latest//bin/mpiicpc
    mpiicpc -v
    mpiicpc for the Intel(R) MPI Library 2021.4 for Linux*
    Copyright Intel Corporation.
    icpc version 2021.4.0 (gcc version 9.3.0 compatibility)
    mpiicpc -show
    icpc -I"/opt/intel/oneapi/mpi/latest/include" -L"/opt/intel/oneapi/mpi/latest/lib/release" -L"/opt/intel/oneapi/mpi/latest/lib" -Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker "/opt/intel/oneapi/mpi/latest/lib/release" -Xlinker -rpath -Xlinker "/opt/intel/oneapi/mpi/latest/lib" -lmpicxx -lmpifort -lmpi -ldl -lrt -lpthread
  3. This is a laptop with a Core i7-10850H (6 cores, 12 logical processors). I installed Intel oneAPI following the APT instructions on intel.com. I didn't do any special configuration. I don't know how to check the hardware and FI_PROVIDER.

My production environment (an HPC cluster) has a different Intel compiler, and the program works fine there:

  • intel-ps-2018u4, icpc (ICC) 18.0.5 20180823, gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17)

SantoshY_Intel
Moderator

Hi,

 

Thanks for providing the details.

 

We are able to get the correct results on a standalone Ubuntu 20.04 machine using Intel oneAPI 2021.4, as shown in the screenshot below.

[Screenshot: SantoshY_Intel_0-1634202396327.png]

 

>>"My production environment (a HPC) has a different intel compiler but works fine"

Could you please try and let us know whether you are facing the same issue in your HPC environment?

 

Thanks & Regards,

Santosh

 

SantoshY_Intel
Moderator

Hi,


We haven't heard back from you. Could you please try and let us know whether you are facing the same issue in your HPC environment?


Thanks & regards,

Santosh


SantoshY_Intel
Moderator

Hi,


As we have not heard back from you, we are closing this thread, and it will no longer be monitored by Intel. If you need further assistance, please post a new question.


Thanks & regards,

Santosh

