I am working on a Windows 11 machine with the latest version (2025.2.1) of the Intel oneAPI Base Toolkit installed. I am encountering a runtime error when I try to run the MKL cluster solver examples after configuring them to use MS-MPI instead of Intel MPI.
In particular, I copied the entire contents of C:\Program Files (x86)\Intel\oneAPI\mkl\latest\share\doc\mkl\examples into a temporary directory C:\Users\pzajac\Desktop\examples, extracted all ZIP archives, and created a new "build" directory in the c_mpi subdirectory. I then executed the following commands to set up the build environment, configure the build with CMake against the MS-MPI library, and compile it with Ninja:
Microsoft Windows [Version 10.0.22631.5909]
(c) Microsoft Corporation. All rights reserved.
C:\Users\pzajac\Desktop\examples\c_mpi\build>"C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
:: initializing oneAPI environment...
Initializing Visual Studio command-line environment...
Visual Studio version 17.14.16 environment configured.
"C:\Program Files\Microsoft Visual Studio\2022\Community\"
Visual Studio command-line environment initialized for: 'x64'
: advisor -- latest
: compiler -- latest
: dal -- latest
: debugger -- latest
: dev-utilities -- latest
: dnnl -- latest
: dpcpp-ct -- latest
: dpl -- latest
: ipp -- latest
: ippcp -- latest
: mkl -- latest
: ocloc -- latest
: pti -- latest
: tbb -- latest
: umf -- latest
: vtune -- latest
:: oneAPI environment initialized ::
C:\Users\pzajac\Desktop\examples\c_mpi\build>cmake .. -G "Ninja" -DMKL_MPI="msmpi" -DMKL_INTERFACE="ilp64"
-- MKL_ROOT: C:/Program Files (x86)/Intel/oneAPI/mkl/latest
-- PROJECT_SOURCE_DIR: C:/Users/pzajac/Desktop/examples/c_mpi
-- CMAKE_C_COMPILER: None, set to `icx` by default
-- TARGET_DOMAINS: cdft;cluster_sparse_solver;fftw3x_cdft;pblas;scalapack
-- TARGET_FUNCTIONS: Undefined
-- CMAKE_BUILD_TYPE: None, set to ` Release` by default
-- CMAKE_GENERATOR: Ninja
-- The C compiler identification is IntelLLVM 2025.2.0 with MSVC-like command-line
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Intel/oneAPI/compiler/latest/bin/icx.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- MKL_VERSION: 2025.2.0
-- MKL_ROOT: C:/Program Files (x86)/Intel/oneAPI/mkl/latest
-- MKL_ARCH: intel64
-- MKL_LINK: None, set to ` dynamic` by default
-- MKL_INTERFACE_FULL: intel_ilp64
-- MKL_THREADING: None, set to ` intel_thread` by default
-- MKL_MPI: msmpi
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_scalapack_ilp64_dll.lib
-- Found DLL: C:/Program Files (x86)/Intel/oneAPI/mkl/latest/bin/mkl_scalapack_ilp64.2.dll
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_cdft_core_dll.lib
-- Found DLL: C:/Program Files (x86)/Intel/oneAPI/mkl/latest/bin/mkl_cdft_core.2.dll
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_intel_ilp64_dll.lib
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_intel_thread_dll.lib
-- Found DLL: C:/Program Files (x86)/Intel/oneAPI/mkl/latest/bin/mkl_intel_thread.2.dll
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_core_dll.lib
-- Found DLL: C:/Program Files (x86)/Intel/oneAPI/mkl/latest/bin/mkl_core.2.dll
-- Found C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/mkl_blacs_ilp64_dll.lib
-- Found DLL: C:/Program Files (x86)/Intel/oneAPI/mkl/latest/bin/mkl_blacs_ilp64.2.dll
-- Found C:/Program Files (x86)/Intel/oneAPI/compiler/latest/lib/libiomp5md.lib
-- Found MPI_C: C:/Program Files (x86)/Microsoft SDKs/MPI/Lib/x64/msmpi.lib (found version "2.0")
-- Found MPI: TRUE (found version "2.0")
-- Functions list cdft: dm_complex_2d_double_ex1;dm_complex_2d_double_ex2;dm_complex_2d_single_ex1;dm_complex_2d_single_ex2
-- Functions list cluster_sparse_solver: cl_solver_unsym_c;cl_solver_unsym_distr_c;cl_solver_unsym_complex_c;cl_solver_sym_sp_0_based_c;cl_solver_export_c
-- Functions list fftw3x_cdft: wrappers_c1d;wrappers_c2d;wrappers_c3d;wrappers_c4d
-- Functions list pblas: pblas1_s_example;pblas2_s_example;pblas3_s_example;pblas1_d_example;pblas2_d_example;pblas3_d_example
-- Functions list scalapack: pcgetrf_example;pdgetrf_example;psgetrf_example;pzgetrf_example
-- Configuring done (2.5s)
-- Generating done (0.1s)
-- Build files have been written to: C:/Users/pzajac/Desktop/examples/c_mpi/build
C:\Users\pzajac\Desktop\examples\c_mpi\build>ninja
[62/62] Linking C executable scalapack-psgetrf_example.exe
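For reference, the whole configure-and-build sequence condenses to a short batch script along these lines (just a sketch of the commands shown above; the paths are the defaults from my installation):

rem build_msmpi_examples.bat -- condensed version of the steps shown above (sketch)
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
cd /d C:\Users\pzajac\Desktop\examples\c_mpi\build
cmake .. -G "Ninja" -DMKL_MPI="msmpi" -DMKL_INTERFACE="ilp64"
ninja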
So far so good. When I try to run one of the examples, I get the following error:
C:\Users\pzajac\Desktop\examples\c_mpi\build>mpiexec -n 2 cluster_sparse_solver-cl_solver_unsym_c.exe
Intel oneMKL FATAL ERROR: Cannot load mkl_blacs_intelmpi_lp64.2.dll.
Intel oneMKL FATAL ERROR: Cannot load mkl_blacs_intelmpi_lp64.2.dll.
job aborted:
[ranks] message
[0-1] process exited without calling finalize
---- error analysis -----
[0-1] on ZAJAC-PC
cluster_sparse_solver-cl_solver_unsym_c.exe ended prematurely and may have crashed. exit code 2
---- error analysis -----
I can verify via 'where' that the requested DLL can be found on the PATH and that 'mpiexec' is indeed the MS-MPI binary:
C:\Users\pzajac\Desktop\examples\c_mpi\build>where mkl_blacs_intelmpi_lp64.2.dll
C:\Program Files (x86)\Intel\oneAPI\mkl\latest\bin\mkl_blacs_intelmpi_lp64.2.dll
C:\Program Files (x86)\Intel\oneAPI\2025.2\bin\mkl_blacs_intelmpi_lp64.2.dll
C:\Users\pzajac\Desktop\examples\c_mpi\build>where mpiexec
C:\Program Files\Microsoft MPI\Bin\mpiexec.exe
Apparently, the compiled binary tries to load the Intel MPI DLL "mkl_blacs_intelmpi_lp64.2.dll" instead of its MS-MPI counterpart "mkl_blacs_msmpi_lp64.2.dll". I can confirm that this is the core of the issue: if I copy the MS-MPI DLL into the build directory under the Intel MPI DLL's name, so that the binary picks up this copy, the example runs without any problems:
C:\Users\pzajac\Desktop\examples\c_mpi\build>copy "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\bin\mkl_blacs_msmpi_lp64.2.dll" ".\mkl_blacs_intelmpi_lp64.2.dll"
1 file(s) copied.
C:\Users\pzajac\Desktop\examples\c_mpi\build>mpiexec -n 2 cluster_sparse_solver-cl_solver_unsym_c.exe
=== CPARDISO: solving a real nonsymmetric system ===
1-based array indexing is turned ON
CPARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON
Summary: ( reordering phase )
================
Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.000002 s
Time spent in reordering of the initial matrix (reorder) : 0.000102 s
Time spent in symbolic factorization (symbfct) : 0.001282 s
Time spent in data preparations for factorization (parlist) : 0.000000 s
Time spent in allocation of internal data structures (malloc) : 0.000084 s
Time spent in matching/scaling : 0.000035 s
Time spent in additional calculations : 0.000072 s
Total time spent : 0.001578 s
Statistics:
===========
Parallel Direct Factorization is running on 2 MPI and 24 OpenMP per MPI process
< Linear system Ax = b >
number of equations: 5
number of non-zeros in A: 13
number of non-zeros in A (%): 52.000000
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 128
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 2
size of largest supernode: 4
number of non-zeros in L: 19
number of non-zeros in U: 2
number of non-zeros in L+U: 21
Percentage of computed non-zeros for LL^T factorization
94 %
100 %
=== CPARDISO: solving a real nonsymmetric system ===
Single-level factorization algorithm is turned ON
Summary: ( factorization phase )
================
Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000004 s
Time spent in factorization step (numfct) : 0.006846 s
Time spent in allocation of internal data structures (malloc) : 0.000026 s
Time spent in additional calculations : 0.000002 s
Total time spent : 0.006878 s
Statistics:
===========
Parallel Direct Factorization is running on 2 MPI and 24 OpenMP per MPI process
< Linear system Ax = b >
number of equations: 5
number of non-zeros in A: 13
number of non-zeros in A (%): 52.000000
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 128
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 2
size of largest supernode: 4
number of non-zeros in L: 19
number of non-zeros in U: 2
number of non-zeros in L+U: 21
gflop for the numerical factorization: 0.000000
gflop/s for the numerical factorization: 0.000007
=== CPARDISO: solving a real nonsymmetric system ===
Summary: ( solution phase )
================
Times:
======
Time spent in direct solver at solve step (solve) : 0.000420 s
Time spent in additional calculations : 0.001628 s
Total time spent : 0.002048 s
Statistics:
===========
Parallel Direct Factorization is running on 2 MPI and 24 OpenMP per MPI process
< Linear system Ax = b >
number of equations: 5
number of non-zeros in A: 13
number of non-zeros in A (%): 52.000000
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 128
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 2
size of largest supernode: 4
number of non-zeros in L: 19
number of non-zeros in U: 2
number of non-zeros in L+U: 21
gflop for the numerical factorization: 0.000000
gflop/s for the numerical factorization: 0.000007
Number of iterative refinement steps performed 2
Tolerance level in the iterative refinement process 0.000000e+00
Backward error after the iterative refinement process 0.000000e+00
Reordering completed ...
Factorization completed ...
Solving system...
The solution of the system is:
x [0] = -0.522321
x [1] = -0.008929
x [2] = 1.220982
x [3] = -0.504464
x [4] = -0.214286
Relative residual = 1.489520e-16
TEST PASSED
So it seems that one of the MKL components used by the cluster solver example always tries to load the BLACS DLL built for Intel MPI instead of the one built for MS-MPI, even though the example was configured with -DMKL_MPI="msmpi".
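For completeness: the oneMKL Developer Guide for Windows describes an MKL_BLACS_MPI environment variable that is supposed to select the BLACS MPI flavor at run time when the dynamic BLACS library is used. I have not verified whether the 2025.2 runtime honors it in this setup, but something along these lines might be a cleaner workaround than copying the DLL:

rem untested sketch: ask the BLACS dispatcher to pick the MS-MPI DLL at run time
set MKL_BLACS_MPI=MSMPI
mpiexec -n 2 cluster_sparse_solver-cl_solver_unsym_c.exe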