Hi,
I have installed the Intel 2020u0 suite on an RHEL 7.6 based system with an Intel Xeon Platinum 8280M processor.
While running a quick test with the Linpack binary provided under compilers_and_libraries_2020.0.166/linux/mkl/benchmarks/mp_linpack, I run into issues. Here is how I set up and run the Linpack binary (on a single node):
[user@node1 BASELINE]$ cp /home/user/COMPILER/MPI/INTELMPI/2020u0/compilers_and_libraries_2020.0.166/linux/mkl/benchmarks/mp_linpack/xhpl_intel64_static /home/user/COMPILER/MPI/INTELMPI/2020u0/compilers_and_libraries_2020.0.166/linux/mkl/benchmarks/mp_linpack/runme_intel64_static /home/user/COMPILER/MPI/INTELMPI/2020u0/compilers_and_libraries_2020.0.166/linux/mkl/benchmarks/mp_linpack/runme_intel64_prv /home/user/COMPILER/MPI/INTELMPI/2020u0/compilers_and_libraries_2020.0.166/linux/mkl/benchmarks/mp_linpack/HPL.dat .
[puneet@node61 BASELINE]$ mpirun --version
Intel(R) MPI Library for Linux* OS, Version 2019 Update 6 Build 20191024 (id: 082ae5608)
Copyright 2003-2019, Intel Corporation.
[user@node1 BASELINE]$ ls
HPL.dat  runme_intel64_prv  runme_intel64_static  xhpl_intel64_static
[user@node1 BASELINE]$ ./runme_intel64_static
This is a SAMPLE run script. Change it to reflect the correct number of CPUs/threads, number of nodes, MPI processes per node, etc..
This run was done on: Wed Apr 8 22:36:04 IST 2020
RANK=1, NODE=1
RANK=0, NODE=0
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(649)......:
MPID_Init(861).............:
MPIDI_NM_mpi_init_hook(953): OFI fi_open domain failed (ofi_init.h:953:MPIDI_NM_mpi_init_hook:No data available)
Abort(1094543) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(649)......:
MPID_Init(861).............:
MPIDI_NM_mpi_init_hook(953): OFI fi_open domain failed (ofi_init.h:953:MPIDI_NM_mpi_init_hook:No data available)
Done: Wed Apr 8 22:36:05 IST 2020
[user@node1 BASELINE]$
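(In case it helps with debugging: Intel MPI and libfabric can print more detail about why the provider/domain open fails. The variables below are standard Intel MPI / libfabric settings; this is just a sketch of how more output could be collected, not something captured in the run above.)

# Raise the startup/provider log levels before re-running the failing case
export I_MPI_DEBUG=6        # Intel MPI startup details, including the OFI provider it picks
export FI_LOG_LEVEL=debug   # libfabric provider/domain initialization log
./runme_intel64_static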
Now, in the 2020u0 environment, if I remove the xhpl_intel64_static binary and use the one supplied with 2019u5 (HPL 2.3), HPL works fine:
[user@node1 BASELINE]$ cp /home/user/COMPILER/MPI/INTELMPI/2019_U5/compilers_and_libraries_2019.5.281/linux/mkl/benchmarks/mp_linpack/xhpl_intel64_static .
[user@node1 BASELINE]$ ./runme_intel64_static
This is a SAMPLE run script. Change it to reflect the correct number of CPUs/threads, number of nodes, MPI processes per node, etc..
This run was done on: Wed Apr 8 22:36:40 IST 2020
RANK=0, NODE=0
RANK=1, NODE=1
================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --  December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      : 1000
NB     : 192
PMAP   : Column-major process mapping
P      : 1
Q      : 1
PFACT  : Right
NBMIN  : 2
NDIV   : 2
RFACT  : Crout
BCAST  : 1ring
DEPTH  : 0
SWAP   : Binary-exchange
L1     : no-transposed form
U      : no-transposed form
EQUIL  : no
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0

node1 : Column=000192 Fraction=0.005 Kernel=    0.00 Mflops=100316.35
node1 : Column=000384 Fraction=0.195 Kernel=65085.04 Mflops=83075.67
node1 : Column=000576 Fraction=0.385 Kernel=39885.67 Mflops=70127.11
node1 : Column=000768 Fraction=0.595 Kernel=17659.92 Mflops=58843.41
node1 : Column=000960 Fraction=0.795 Kernel= 4894.70 Mflops=51756.17
================================================================================
T/V                N    NB     P     Q        Time          Gflops
--------------------------------------------------------------------------------
WC00C2R2        1000   192     1     1        0.01     4.64944e+01
HPL_pdgesv() start time Wed Apr 8 22:36:41 2020
HPL_pdgesv() end time   Wed Apr 8 22:36:41 2020
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0059446 ...... PASSED
================================================================================

Finished 1 tests with the following results:
          1 tests completed and passed residual checks,
          0 tests completed and failed residual checks,
          0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
Done: Wed Apr 8 22:36:41 IST 2020
The same is the case with the xhpl binary supplied with Intel 2018u4 (HPL v2.1):
[user@node1 BASELINE]$ cp /home/user/COMPILER/MPI/INTELMPI/2018_U4/compilers_and_libraries_2018.5.274/linux/mkl/benchmarks/mp_linpack/xhpl_intel64_static .
[user@node1 BASELINE]$ ./runme_intel64_static
This is a SAMPLE run script. Change it to reflect the correct number of CPUs/threads, number of nodes, MPI processes per node, etc..
This run was done on: Wed Apr 8 22:37:48 IST 2020
RANK=0, NODE=0
RANK=1, NODE=1
================================================================================
HPLinpack 2.1  --  High-Performance Linpack benchmark  --  October 26, 2012
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      : 1000
NB     : 192
PMAP   : Column-major process mapping
P      : 1
Q      : 1
PFACT  : Right
NBMIN  : 2
NDIV   : 2
RFACT  : Crout
BCAST  : 1ring
DEPTH  : 0
SWAP   : Binary-exchange
L1     : no-transposed form
U      : no-transposed form
EQUIL  : no
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0

node1 : Column=000192 Fraction=0.005 Kernel=    0.00 Mflops=99748.31
node1 : Column=000384 Fraction=0.195 Kernel=67904.30 Mflops=84547.57
node1 : Column=000576 Fraction=0.385 Kernel=39287.97 Mflops=70666.21
node1 : Column=000768 Fraction=0.595 Kernel=18197.26 Mflops=59578.53
node1 : Column=000960 Fraction=0.795 Kernel= 4634.78 Mflops=51930.16
================================================================================
T/V                N    NB     P     Q        Time          Gflops
--------------------------------------------------------------------------------
WC00C2R2        1000   192     1     1        0.01     4.96887e+01
HPL_pdgesv() start time Wed Apr 8 22:37:49 2020
HPL_pdgesv() end time   Wed Apr 8 22:37:49 2020
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0059446 ...... PASSED
================================================================================

Finished 1 tests with the following results:
          1 tests completed and passed residual checks,
          0 tests completed and failed residual checks,
          0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
Done: Wed Apr 8 22:37:49 IST 2020
Here is the fi_info output:
[user@node1 BASELINE]$ fi_info
provider: mlx
    fabric: mlx
    domain: mlx
    version: 1.5
    type: FI_EP_UNSPEC
    protocol: FI_PROTO_MLX
provider: mlx;ofi_rxm
    fabric: mlx
    domain: mlx
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM
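(Side note: which providers fi_info reports depends on which libfabric build is found first in the environment. A quick way to cross-check, assuming the Intel MPI environment scripts have been sourced, is sketched below; FI_PROVIDER_PATH and I_MPI_OFI_LIBRARY_INTERNAL are standard libfabric / Intel MPI 2019 variables.)

# See the short provider list and which libfabric / provider path is in effect
fi_info -l                          # provider names only (mlx, verbs, tcp, ...)
which fi_info
echo "$FI_PROVIDER_PATH"            # provider search path exported by the Intel MPI scripts
# For dynamically linked MPI binaries, force the libfabric bundled with Intel MPI:
export I_MPI_OFI_LIBRARY_INTERNAL=1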
I also tested an MPI hello world:
[user@node1 BASELINE]$ mpiicc hello.c
[user@node1 BASELINE]$ mpirun -np 2 ./a.out
Hello world from processor node61, rank 0 out of 2 processors
Hello world from processor node61, rank 1 out of 2 processors

Please advise.
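(The hello.c itself is not shown above; a minimal version along these lines, assuming nothing beyond the standard MPI C API, produces the same kind of output.)

# Hypothetical reconstruction of hello.c, then compile and run it as above
cat > hello.c << 'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];
    MPI_Init(&argc, &argv);                 /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */
    MPI_Get_processor_name(name, &len);     /* host name as seen by MPI */
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           name, rank, size);
    MPI_Finalize();
    return 0;
}
EOF
mpiicc hello.c
mpirun -np 2 ./a.out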
Hi Puneet,
We have tried this and observed the same behavior: the benchmark fails to run when the provider is set to mlx, but works when the provider is tcp or verbs.
Thanks for reporting it to us.
We will forward this to the respective team.
Thanks
Prasanth
Hi,
Could you please share the settings/environment variables needed to make it work?
I need to run HPL on a single node only (so the fabric does not matter to me for now).
Here is what I get when I use tcp:
[user@node1 test]$ export I_MPI_FABRICS=tcp
[user@node1 test]$ ./runme_intel64_static
This is a SAMPLE run script. Change it to reflect the correct number of CPUs/threads, number of nodes, MPI processes per node, etc..
This run was done on: Thu Apr 9 16:40:03 IST 2020
RANK=1, NODE=1
RANK=0, NODE=0
MPI startup(): tcp fabric is unknown or has been removed from the product, please use ofi or shm:ofi instead
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(649)......:
MPID_Init(861).............:
MPIDI_NM_mpi_init_hook(953): OFI fi_open domain failed (ofi_init.h:953:MPIDI_NM_mpi_init_hook:No data available)
Done: Thu Apr 9 16:40:03 IST 2020
UPDATE: with export FI_PROVIDER=tcp, I am able to run HPL:
[user@node1 test]$ export FI_PROVIDER=tcp
[user@node1 test]$ ./runme_intel64_static
This is a SAMPLE run script. Change it to reflect the correct number of CPUs/threads, number of nodes, MPI processes per node, etc..
This run was done on: Thu Apr 9 16:48:44 IST 2020
RANK=1, NODE=1
RANK=0, NODE=0
[0] MPI startup(): I_MPI_DAPL_DIRECT_COPY_THRESHOLD variable has been removed from the product, its value is ignored
[0] MPI startup(): I_MPI_DAPL_DIRECT_COPY_THRESHOLD environment variable is not supported.
[0] MPI startup(): Similar variables: I_MPI_SHM_SEND_TINY_MEMCPY_THRESHOLD
[0] MPI startup(): To check the list of supported variables, use the impi_info utility or refer to https://software.intel.com/en-us/mpi-library/documentation/get-started.
================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --  December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      : 1000
NB     : 192
PMAP   : Column-major process mapping
P      : 1
Q      : 1
PFACT  : Right
NBMIN  : 2
NDIV   : 2
RFACT  : Crout
BCAST  : 1ring
DEPTH  : 0
SWAP   : Binary-exchange
L1     : no-transposed form
U      : no-transposed form
EQUIL  : no
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0

node1 : Column=000192 Fraction=0.005 Kernel=    0.00 Mflops=102934.66
node1 : Column=000384 Fraction=0.195 Kernel=67954.85 Mflops=85968.97
node1 : Column=000576 Fraction=0.385 Kernel=40114.53 Mflops=71945.58
node1 : Column=000768 Fraction=0.595 Kernel=19454.64 Mflops=61274.77
node1 : Column=000960 Fraction=0.795 Kernel= 5232.37 Mflops=54078.56
================================================================================
T/V                N    NB     P     Q        Time          Gflops
--------------------------------------------------------------------------------
WC00C2R2        1000   192     1     1        0.01     4.86455e+01
HPL_pdgesv() start time Thu Apr 9 16:48:45 2020
HPL_pdgesv() end time   Thu Apr 9 16:48:45 2020
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 5.94455135e-03 ...... PASSED
================================================================================

Finished 1 tests with the following results:
          1 tests completed and passed residual checks,
          0 tests completed and failed residual checks,
          0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
Done: Thu Apr 9 16:48:45 IST 2020
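(For anyone else landing here: in Intel MPI 2019 the old per-network values of I_MPI_FABRICS are gone, as the startup message above says. The fabric is now always ofi or shm:ofi, and the actual network is chosen as a libfabric provider. A short sketch of the distinction, using standard Intel MPI 2019 / libfabric variables:)

# I_MPI_FABRICS now only selects the transport layering, not the network
export I_MPI_FABRICS=shm:ofi      # shared memory intra-node, libfabric (OFI) inter-node
# The network itself is selected as a libfabric provider
export FI_PROVIDER=tcp            # provider name: tcp, verbs, mlx, ...
# roughly equivalent at the Intel MPI level:
export I_MPI_OFI_PROVIDER=tcp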
Hello,
The workaround in your case would be either to use Ethernet, as you already mentioned, or to leverage the InfiniBand* fabric via verbs (I_MPI_OFI_PROVIDER=verbs). Since you are running on a single node only, you may also use the shared memory transport layer from Intel MPI (I_MPI_FABRICS=shm:ofi).
However, the actual issue is that the new (default) mlx provider does not work for you. Therefore, please refer to the requirements and limitations of the mlx provider at the following link: https://software.intel.com/en-us/articles/improve-performance-and-stability-with-intel-mpi-library-on-infiniband
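(The mlx provider sits on top of UCX, so a quick sanity check, assuming the UCX command-line tools are installed on the node, is to verify the UCX version and the transports it detects, for example:)

# Check the UCX installation the mlx provider would use
ucx_info -v     # UCX version and build configuration
ucx_info -d     # devices and transports UCX can use (rc, ud, dc, ...)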
Please let us know if the issue is resolved.
Best regards,
Michael
Here is what I get when using shm:ofi:
[user@node1 test]$ export I_MPI_FABRICS=shm:ofi
[user@node1 test]$ ./runme_intel64_static
This is a SAMPLE run script. Change it to reflect the correct number of CPUs/threads, number of nodes, MPI processes per node, etc..
This run was done on: Fri Apr 10 09:34:18 IST 2020
RANK=0, NODE=0
RANK=1, NODE=1
Abort(1094543) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(649)......:
MPID_Init(861).............:
MPIDI_NM_mpi_init_hook(953): OFI fi_open domain failed (ofi_init.h:953:MPIDI_NM_mpi_init_hook:No data available)
Done: Fri Apr 10 09:34:18 IST 2020
And with shm:
[user@node1 test]$ export I_MPI_FABRICS=shm
[user@node1 test]$ ./runme_intel64_static
This is a SAMPLE run script. Change it to reflect the correct number of CPUs/threads, number of nodes, MPI processes per node, etc..
This run was done on: Fri Apr 10 09:34:58 IST 2020
RANK=0, NODE=0
RANK=1, NODE=1
MPI startup(): shm fabric is unknown or has been removed from the product, please use ofi or shm:ofi instead
Abort(1094543) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(649)......:
MPID_Init(861).............:
MPIDI_NM_mpi_init_hook(953): OFI fi_open domain failed (ofi_init.h:953:MPIDI_NM_mpi_init_hook:No data available)
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(649)......:
MPID_Init(861).............:
MPIDI_NM_mpi_init_hook(953): OFI fi_open domain failed (ofi_init.h:953:MPIDI_NM_mpi_init_hook:No data available)
Done: Fri Apr 10 09:34:58 IST 2020
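(Judging from the error above, shm:ofi still opens an OFI domain, so the failing mlx provider is still hit even on one node. Combining shm:ofi with an explicit provider, along the lines of the FI_PROVIDER=tcp run earlier, is only a guess on my side and not a verified fix:)

# Untested combination: shared memory intra-node, but steer libfabric away from mlx
export I_MPI_FABRICS=shm:ofi
export FI_PROVIDER=tcp
./runme_intel64_static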
Hello,
This is an issue with the statically linked XHPL benchmark from MKL: it links against an Intel MPI library version that is not aware of the new mlx provider.
Therefore, as a workaround, you might instead use the dynamically linked XHPL, or alternatively use a different fabric provider such as FI_PROVIDER=verbs.
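(A minimal sketch of the dynamically linked route, assuming the usual file names shipped in the MKL mp_linpack directory, xhpl_intel64_dynamic and runme_intel64_dynamic; the dynamic binary then picks up the installed, mlx-aware Intel MPI runtime from the environment:)

# Hypothetical single-node run of the dynamically linked benchmark
MP_LINPACK=/home/user/COMPILER/MPI/INTELMPI/2020u0/compilers_and_libraries_2020.0.166/linux/mkl/benchmarks/mp_linpack
cp "$MP_LINPACK"/xhpl_intel64_dynamic "$MP_LINPACK"/runme_intel64_dynamic .
./runme_intel64_dynamic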
This is not an issue in Intel MPI itself, but I am doing an internal follow-up with the MKL team.
Best regards,
Michael
Hello,
We can confirm that this is a bug in MKL and will track it accordingly.
Best regards,
Michael
The product fix is part of MKL 2020 update 2.
This issue has been resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.