I am working on a Windows 11 machine and I have the latest version (2025.2.1) of the Intel oneAPI Base Toolkit installed. I am encountering a runtime error when I try to run the MKL cluster solver examples after configuring them to use MS-MPI instead of Intel MPI.
In particular, I have copied the entire contents of C:\Program Files (x86)\Intel\oneAPI\mkl\latest\share\doc\mkl\examples into a temporary directory C:\Users\pzajac\Desktop\examples, extracted all ZIP archives, and created a new "build" directory in the c_mpi subdirectory. Then I executed the following commands to set up the build environment, configure with CMake to use the MS-MPI library, and compile via Ninja:
Microsoft Windows [Version 10.0.22631.5909]
(c) Microsoft Corporation. All rights reserved.
C:\Users\pzajac\Desktop\examples\c_mpi\build>"C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
:: initializing oneAPI environment...
Initializing Visual Studio command-line environment...
Visual Studio version 17.14.16 environment configured.
"C:\Program Files\Microsoft Visual Studio\2022\Community\"
Visual Studio command-line environment initialized for: 'x64'
: advisor -- latest
: compiler -- latest
: dal -- latest
: debugger -- latest
: dev-utilities -- latest
: dnnl -- latest
: dpcpp-ct -- latest
: dpl -- latest
: ipp -- latest
: ippcp -- latest
: mkl -- latest
: ocloc -- latest
: pti -- latest
: tbb -- latest
: umf -- latest
: vtune -- latest
:: oneAPI environment initialized ::
C:\Users\pzajac\Desktop\examples\c_mpi\build>cmake .. -G "Ninja" -DMKL_MPI="msmpi" -DMKL_INTERFACE="ilp64"
-- MKL_ROOT: C:/Program Files (x86)/Intel/oneAPI/mkl/latest
-- PROJECT_SOURCE_DIR: C:/Users/pzajac/Desktop/examples/c_mpi
-- CMAKE_C_COMPILER: None, set to `icx` by default
-- TARGET_DOMAINS: cdft;cluster_sparse_solver;fftw3x_cdft;pblas;scalapack
-- TARGET_FUNCTIONS: Undefined
-- CMAKE_BUILD_TYPE: None, set to ` Release` by default
-- CMAKE_GENERATOR: Ninja
-- The C compiler identification is IntelLLVM 2025.2.0 with MSVC-like command-line
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Intel/oneAPI/compiler/latest/bin/icx.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- MKL_VERSION: 2025.2.0
-- MKL_ROOT: C:/Program Files (x86)/Intel/oneAPI/mkl/latest
-- MKL_ARCH: intel64
-- MKL_LINK: None, set to ` dynamic` by default
-- MKL_INTERFACE_FULL: intel_ilp64
-- MKL_THREADING: None, set to ` intel_thread` by default
-- MKL_MPI: msmpi
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_scalapack_ilp64_dll.lib
-- Found DLL: C:/Program Files (x86)/Intel/oneAPI/mkl/latest/bin/mkl_scalapack_ilp64.2.dll
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_cdft_core_dll.lib
-- Found DLL: C:/Program Files (x86)/Intel/oneAPI/mkl/latest/bin/mkl_cdft_core.2.dll
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_intel_ilp64_dll.lib
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_intel_thread_dll.lib
-- Found DLL: C:/Program Files (x86)/Intel/oneAPI/mkl/latest/bin/mkl_intel_thread.2.dll
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_core_dll.lib
-- Found DLL: C:/Program Files (x86)/Intel/oneAPI/mkl/latest/bin/mkl_core.2.dll
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_blacs_ilp64_dll.lib
-- Found DLL: C:/Program Files (x86)/Intel/oneAPI/mkl/latest/bin/mkl_blacs_ilp64.2.dll
-- Found C:/Program Files (x86)/Intel/oneAPI/compiler/latest/lib/libiomp5md.lib
-- Found MPI_C: C:/Program Files (x86)/Microsoft SDKs/MPI/Lib/x64/msmpi.lib (found version "2.0")
-- Found MPI: TRUE (found version "2.0")
-- Functions list cdft: dm_complex_2d_double_ex1;dm_complex_2d_double_ex2;dm_complex_2d_single_ex1;dm_complex_2d_single_ex2
-- Functions list cluster_sparse_solver: cl_solver_unsym_c;cl_solver_unsym_distr_c;cl_solver_unsym_complex_c;cl_solver_sym_sp_0_based_c;cl_solver_export_c
-- Functions list fftw3x_cdft: wrappers_c1d;wrappers_c2d;wrappers_c3d;wrappers_c4d
-- Functions list pblas: pblas1_s_example;pblas2_s_example;pblas3_s_example;pblas1_d_example;pblas2_d_example;pblas3_d_example
-- Functions list scalapack: pcgetrf_example;pdgetrf_example;psgetrf_example;pzgetrf_example
-- Configuring done (2.5s)
-- Generating done (0.1s)
-- Build files have been written to: C:/Users/pzajac/Desktop/examples/c_mpi/build
C:\Users\pzajac\Desktop\examples\c_mpi\build>ninja
[62/62] Linking C executable scalapack-psgetrf_example.exe
So far so good. When I try to run one of the examples, I get the following error:
C:\Users\pzajac\Desktop\examples\c_mpi\build>mpiexec -n 2 cluster_sparse_solver-cl_solver_unsym_c.exe
Intel oneMKL FATAL ERROR: Cannot load mkl_blacs_intelmpi_lp64.2.dll.
Intel oneMKL FATAL ERROR: Cannot load mkl_blacs_intelmpi_lp64.2.dll.
job aborted:
[ranks] message
[0-1] process exited without calling finalize
---- error analysis -----
[0-1] on ZAJAC-PC
cluster_sparse_solver-cl_solver_unsym_c.exe ended prematurely and may have crashed. exit code 2
---- error analysis -----
I can verify via 'where' that the requested DLL can be found and that 'mpiexec' is in fact the binary from MS-MPI:
C:\Users\pzajac\Desktop\examples\c_mpi\build>where mkl_blacs_intelmpi_lp64.2.dll
C:\Program Files (x86)\Intel\oneAPI\mkl\latest\bin\mkl_blacs_intelmpi_lp64.2.dll
C:\Program Files (x86)\Intel\oneAPI\2025.2\bin\mkl_blacs_intelmpi_lp64.2.dll
C:\Users\pzajac\Desktop\examples\c_mpi\build>where mpiexec
C:\Program Files\Microsoft MPI\Bin\mpiexec.exe
Apparently, for some reason the compiled binary tries to load the Intel MPI DLL "mkl_blacs_intelmpi_lp64.2.dll" instead of its MS-MPI counterpart "mkl_blacs_msmpi_lp64.2.dll". I can verify that this is the core of the issue: if I copy the MS-MPI DLL into the build directory under the name of the Intel MPI DLL, so that this copy is picked up by the compiled binary, then the example runs without any problems:
C:\Users\pzajac\Desktop\examples\c_mpi\build>copy "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\bin\mkl_blacs_msmpi_lp64.2.dll" ".\mkl_blacs_intelmpi_lp64.2.dll"
1 file(s) copied.
C:\Users\pzajac\Desktop\examples\c_mpi\build>mpiexec -n 2 cluster_sparse_solver-cl_solver_unsym_c.exe
=== CPARDISO: solving a real nonsymmetric system ===
1-based array indexing is turned ON
CPARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON
Summary: ( reordering phase )
================
Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.000002 s
Time spent in reordering of the initial matrix (reorder) : 0.000102 s
Time spent in symbolic factorization (symbfct) : 0.001282 s
Time spent in data preparations for factorization (parlist) : 0.000000 s
Time spent in allocation of internal data structures (malloc) : 0.000084 s
Time spent in matching/scaling : 0.000035 s
Time spent in additional calculations : 0.000072 s
Total time spent : 0.001578 s
Statistics:
===========
Parallel Direct Factorization is running on 2 MPI and 24 OpenMP per MPI process
< Linear system Ax = b >
number of equations: 5
number of non-zeros in A: 13
number of non-zeros in A (%): 52.000000
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 128
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 2
size of largest supernode: 4
number of non-zeros in L: 19
number of non-zeros in U: 2
number of non-zeros in L+U: 21
Percentage of computed non-zeros for LL^T factorization
94 %
100 %
=== CPARDISO: solving a real nonsymmetric system ===
Single-level factorization algorithm is turned ON
Summary: ( factorization phase )
================
Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000004 s
Time spent in factorization step (numfct) : 0.006846 s
Time spent in allocation of internal data structures (malloc) : 0.000026 s
Time spent in additional calculations : 0.000002 s
Total time spent : 0.006878 s
Statistics:
===========
Parallel Direct Factorization is running on 2 MPI and 24 OpenMP per MPI process
< Linear system Ax = b >
number of equations: 5
number of non-zeros in A: 13
number of non-zeros in A (%): 52.000000
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 128
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 2
size of largest supernode: 4
number of non-zeros in L: 19
number of non-zeros in U: 2
number of non-zeros in L+U: 21
gflop for the numerical factorization: 0.000000
gflop/s for the numerical factorization: 0.000007
=== CPARDISO: solving a real nonsymmetric system ===
Summary: ( solution phase )
================
Times:
======
Time spent in direct solver at solve step (solve) : 0.000420 s
Time spent in additional calculations : 0.001628 s
Total time spent : 0.002048 s
Statistics:
===========
Parallel Direct Factorization is running on 2 MPI and 24 OpenMP per MPI process
< Linear system Ax = b >
number of equations: 5
number of non-zeros in A: 13
number of non-zeros in A (%): 52.000000
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 128
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 2
size of largest supernode: 4
number of non-zeros in L: 19
number of non-zeros in U: 2
number of non-zeros in L+U: 21
gflop for the numerical factorization: 0.000000
gflop/s for the numerical factorization: 0.000007
Number of iterative refinement steps performed 2
Tolerance level in the iterative refinement process 0.000000e+00
Backward error after the iterative refinement process 0.000000e+00
Reordering completed ...
Factorization completed ...
Solving system...
The solution of the system is:
x [0] = -0.522321
x [1] = -0.008929
x [2] = 1.220982
x [3] = -0.504464
x [4] = -0.214286
Relative residual = 1.489520e-16
TEST PASSED
So it seems that one of the MKL components used by the cluster solver example always tries to load the BLACS DLLs built for Intel MPI instead of their MS-MPI counterparts.
Hi,
Please try setting MKL_BLACS_MPI variable, eg.
mpiexec -n 2 -genv MKL_BLACS_MPI MSMPI cluster_sparse_solver-cl_solver_unsym_c.exe
or set MKL_BLACS_MPI=MSMPI before mpiexec
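For example, in the build directory (assuming the same example binary as above), the two-step variant would look like this:
set MKL_BLACS_MPI=MSMPI
mpiexec -n 2 cluster_sparse_solver-cl_solver_unsym_c.exe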
Regards,
Alex
Hello,
thank you for your reply. Yes, I can confirm that setting the environment variable MKL_BLACS_MPI=MSMPI resolves the issue and the example now runs fine.
Is setting this environment variable merely a workaround for this issue, or is it part of the necessary steps that I should have taken in the first place? I do not recall reading in the online documentation of the Intel MKL Parallel Direct Sparse Solver for Clusters that I have to adjust any environment variables.
Best regards,
- Peter
Please refer to the README file that comes with the examples. CMake sets some local environment variables (interface type, MPI type and type of linking) with the first command, "cmake .. -G "Ninja" ...". The default value for MKL_BLACS_MPI is INTELMPI and the default interface type is lp64; see
Setting Environment Variables on a Cluster
If we look at the further steps, running the examples also goes through CMake again (e.g. $> cmake --build . -j 24 --verbose and $> ctest --verbose), and only CMake can see the values of these environment variables.
When you run the examples without CMake (using mpiexec directly), you have to set all these environment variables yourself in the command prompt if you want to use dynamic linking. (It is not necessary to set them in the case of static linking.) For more details, see Using DLLs.
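To illustrate, the CMake-driven workflow after configuration would look roughly like this (the -j value is just an example), whereas a direct mpiexec run with dynamic linking needs MKL_BLACS_MPI set manually as shown above:
cmake --build . -j 24 --verbose
ctest --verbose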
Regards,
Alex
Hi,
Did you have a chance to look at my explanation? Do you have any further questions on the topic?
Regards,
Alex
Hello,
well, setting the environment variable has resolved our main issue, so I can accept that as a solution.
Thank you,
- Peter