Intel® MPI Library

"external32" Data Representation of MPI

Site
Beginner

Intel MPI writes incorrect output for double and double complex values when the data representation is set to "external32". According to the MPI standard, "All floating point values are in big-endian IEEE format of the appropriate size." However, on an x86_64 machine, Intel MPI treats one 8-byte double as two 4-byte floats and swaps the byte order of each half separately. In addition, the output for a double complex is garbage.

The following example, "a.c", writes 1.0 and 2.0 as two doubles and 1.0 + 2.0i as one double complex to the file "test.out", first with the "native" representation and then with "external32":

#include <complex.h>
#include "mpi.h"

int main(int argc, char *argv[]) {
    int myrank;
    double buf[2] = {1.0, 2.0};
    double complex num = 1.0 + 2.0 * I;
    MPI_File fh;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    MPI_File_open(MPI_COMM_WORLD, "test.out", MPI_MODE_WRONLY | MPI_MODE_CREATE, MPI_INFO_NULL, &fh);

    if (myrank == 0) {
        /* write the two doubles in buf */
        /* little-endian */
        MPI_File_set_view(fh, 0, MPI_DOUBLE, MPI_DOUBLE, "native", MPI_INFO_NULL);
        MPI_File_write(fh, buf, 2, MPI_DOUBLE, &status);
        /* big-endian */
        MPI_File_set_view(fh, 2*8, MPI_DOUBLE, MPI_DOUBLE, "external32", MPI_INFO_NULL);
        MPI_File_write(fh, buf, 2, MPI_DOUBLE, &status);
        /* write the complex double num */
        /* little-endian */
        MPI_File_set_view(fh, 4*8, MPI_C_DOUBLE_COMPLEX, MPI_C_DOUBLE_COMPLEX, "native", MPI_INFO_NULL);
        MPI_File_write(fh, &num, 1, MPI_C_DOUBLE_COMPLEX, &status);
        /* big-endian */
        MPI_File_set_view(fh, 6*8, MPI_C_DOUBLE_COMPLEX, MPI_C_DOUBLE_COMPLEX, "external32", MPI_INFO_NULL);
        MPI_File_write(fh, &num, 1, MPI_C_DOUBLE_COMPLEX, &status);
    }

    MPI_File_close(&fh);
    MPI_Finalize();

    return 0;
}

Compiling, running, and dumping the file gives:

mpiicc a.c
mpirun -np 1 ./a.out
xxd test.out
0000000: 0000 0000 0000 f03f 0000 0000 0000 0040  .......?.......@ #native
0000010: 0000 0000 3ff0 0000 0000 0000 4000 0000  ....?.......@... #external32
0000020: 0000 0000 0000 f03f 0000 0000 0000 0040  .......?.......@ #native
0000030: 504d 495f 4644 0000 0000 0000 0000 0000  PMI_FD.......... #external32
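For comparison, the external32 regions should contain the big-endian IEEE 754 encodings 3ff0000000000000 and 4000000000000000. Instead, the dump at offset 0x10 shows each 8-byte value with its two 4-byte halves byte-swapped independently, and the external32 double complex region at offset 0x30 contains unrelated bytes (the string "PMI_FD"). A minimal check of the expected bytes (just a sketch, assuming a little-endian glibc host where <endian.h> provides htobe64()):

#include <endian.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    double buf[2] = {1.0, 2.0};
    for (int i = 0; i < 2; i++) {
        uint64_t bits;
        memcpy(&bits, &buf[i], sizeof bits);  /* reinterpret the double's bit pattern */
        bits = htobe64(bits);                 /* host byte order -> big-endian */
        unsigned char out[8];
        memcpy(out, &bits, sizeof out);
        for (int j = 0; j < 8; j++)
            printf("%02x", out[j]);
        printf("\n");                         /* prints 3ff0000000000000, then 4000000000000000 */
    }
    return 0;
}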

An equivalent Python program, "a.py", is

import numpy as np

buf = np.array([1.0, 2.0])
num = np.array([1.0 + 2.0j])

with open('test_py.out', 'wb') as f:    # open in binary mode for raw bytes
    buf.tofile(f)                       # native (little-endian) doubles
    buf.astype('>f8').tofile(f)         # big-endian doubles, as external32 specifies
    num.tofile(f)                       # native double complex
    num.astype('>c16').tofile(f)        # big-endian double complex

and the output is

python a.py
xxd test_py.out
00000000: 0000 0000 0000 f03f 0000 0000 0000 0040  .......?.......@
00000010: 3ff0 0000 0000 0000 4000 0000 0000 0000  ?.......@.......
00000020: 0000 0000 0000 f03f 0000 0000 0000 0040  .......?.......@
00000030: 3ff0 0000 0000 0000 4000 0000 0000 0000  ?.......@.......

I think the "external32" output of the C code should match the Python output. Did I misunderstand the external32 data representation in the MPI standard?
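In the meantime, the byte order I expect from external32 can be produced manually (just a sketch, assuming a little-endian host with glibc's htobe64() from <endian.h>) by swapping bytes in the application and writing the result as raw bytes through the "native" representation:

#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Convert n doubles from host (little-endian) order to big-endian bytes. */
static void doubles_to_be(const double *in, unsigned char *out, int n) {
    for (int i = 0; i < n; i++) {
        uint64_t bits;
        memcpy(&bits, &in[i], sizeof bits);
        bits = htobe64(bits);             /* host order -> big-endian */
        memcpy(out + 8 * i, &bits, 8);
    }
}

/* Usage in the rank-0 block of a.c, in place of the external32 view:
 *     unsigned char be[16];
 *     doubles_to_be(buf, be, 2);
 *     MPI_File_set_view(fh, 2*8, MPI_BYTE, MPI_BYTE, "native", MPI_INFO_NULL);
 *     MPI_File_write(fh, be, 16, MPI_BYTE, &status);
 * A double complex can be handled the same way, as two consecutive doubles. */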

RabiyaSK_Intel
Moderator

Hi,


Thanks for posting in Intel Communities.


Could you please provide the following details:

1. Intel oneAPI Toolkit version along with Intel MPI version

2. OS and hardware details


Thanks & Regards,

Shaik Rabiya


Site
Beginner

OK. I tested on

System:    Kernel: 5.14.21-150400.24.18-default x86_64 bits: 64 Desktop: N/A Distro: openSUSE Leap 15.4
Machine:   Type: Server System: Sugon product: I620-G30 v: Purley serial: <superuser required>
           Mobo: Sugon model: 60P24-US v: 24001539 serial: <superuser required> UEFI-[Legacy]: American Megatrends v: 0JGST025
           date: 12/08/2017
CPU:       Info: 2x 10-Core model: Intel Xeon Silver 4114 bits: 64 type: MCP SMP cache: L2: 27.5 MiB
           Speed: 1586 MHz min/max: 800/3000 MHz Core speeds (MHz): 1: 1586 2: 943 3: 2503 4: 1301 5: 1303 6: 2500 7: 2497
           8: 2500 9: 1636 10: 2200 11: 1601 12: 1199 13: 1500 14: 1101 15: 2422 16: 2201 17: 2200 18: 2226 19: 1542 20: 2162

 with Intel oneAPI Toolkit 2021.1 and Intel MPI 2021.1.1.

I also tested on

System:    Kernel: 3.10.0-1160.88.1.el7.x86_64 x86_64 bits: 64 Desktop: N/A
           Distro: CentOS Linux release 7.9.2009 (Core)
Machine:   Type: Server System: Dell product: PowerEdge R740 v: N/A serial: <superuser required>
           Mobo: Dell model: 06WXJT v: A01 serial: <superuser required> BIOS: Dell v: 2.8.2 date: 08/27/2020
CPU:       Info: 2x 8-Core model: Intel Xeon Silver 4208 bits: 64 type: MT MCP SMP cache: L2: 22 MiB
           Speed: 800 MHz min/max: 800/3200 MHz Core speeds (MHz): 1: 800 2: 800 3: 800 4: 800 5: 800 6: 897 7: 803 8: 800
           9: 800 10: 824 11: 801 12: 800 13: 800 14: 801 15: 926 16: 800 17: 800 18: 800 19: 806 20: 801 21: 800 22: 800
           23: 801 24: 800 25: 800 26: 804 27: 800 28: 800 29: 800 30: 800 31: 879 32: 800

  with Intel oneAPI Toolkit 2022.0 and Intel MPI 2021.5.0.

RabiyaSK_Intel
Moderator

Hi,


We are able to reproduce your issue. We have informed the development team about it and will get back to you soon.


Thanks & Regards,

Shaik Rabiya


Site
Beginner

Thank you for your reply. I will keep an eye on this thread.

Rafael_L_Intel
Employee

Hello @Site,

This bug has been fixed in Intel MPI 2021.12, which will be made publicly available on March 28th (as part of Intel HPC Toolkit 2024.1).

 

Cheers!

Rafael
