Hello,
I recently upgraded to the latest version of oneAPI (2025.0.1) to make sure the C code I'm working on stays compatible, and I have found that the ScaLAPACK eigensolvers I use (pzheevr and pzheevx) no longer work correctly. If I set the matrix order beyond a certain value (~4,000), both eigensolvers hang. If I increase the matrix order further (~10,000), I instead get an error suggesting an internal out-of-range tag issue. An example of the error is as follows:
Abort(253875716) on node 1 (rank 1 in comm 0): Fatal error in internal_Irecv: Unknown error class, error stack:
internal_Irecv(37571): MPI_Irecv(buf=0x75408b076010, count=32, dtype=USER<resized>, 0, 557861, comm=0xc4000001, request=0x7ffd69132a9c) failed
internal_Irecv(37527): Invalid tag, value is 557861
The above error occurred when using 4 MPI processes and a block size of 32, but the hang/error appears to occur regardless of the block size or the number of processes used (tested from 2 to 8). I ran the code on an Ubuntu 22.04 system with an Intel i7-10700K processor.
I did some looking into how the BLACS tag range is set and, correct me if I'm wrong, the range should be between 0 and the value returned by MPI_Comm_get_attr() for MPI_TAG_UB, which is defined by the MPI implementation. On my system, this value is 524287, so the above error suggests BLACS is attempting to use tag values outside the permitted range.
I looked into this further and found that BLACS provides functions to both get and set the tag/ID range (blacs_get and blacs_set). When calling blacs_get to view the tag range, the function returns values that appear to make little sense. Below is example output from two successive runs with four processes, where I print the BLACS minimum tag/ID, the BLACS maximum tag/ID, and the value defined by MPI_TAG_UB:
Process 0, ID_range min = 1140507904
Process 0, ID_range max = 0
Process 0, tag ub = 524287
Process 1, ID_range min = 478125312
Process 1, ID_range max = 0
Process 1, tag ub = 524287
Process 3, ID_range min = 1901178432
Process 3, ID_range max = 0
Process 3, tag ub = 524287
Process 2, ID_range min = 1872774784
Process 2, ID_range max = 0
Process 2, tag ub = 524287
Process 0, ID_range min = 1786803152
Process 0, ID_range max = 0
Process 0, tag ub = 524287
Process 1, ID_range min = -1244713744
Process 1, ID_range max = 0
Process 1, tag ub = 524287
Process 3, ID_range min = 480730752
Process 3, ID_range max = 0
Process 3, tag ub = 524287
Process 2, ID_range min = 1604484512
Process 2, ID_range max = 0
Process 2, tag ub = 524287
Outputs like these occur whether I use blacs_set to set the tag/ID range or leave it at the default values.
I ran these tests using the 64-bit integer interface, as I require the analysis of large matrices, but during testing I also tried the 32-bit integer interface and found that none of these issues occur there. Both eigensolvers worked normally with no hangs, at least over the matrix order range I tested (up to 10,000), and the tag/ID values output by blacs_get made sense, with the minimum tag/ID equal to 0 and the maximum equal to the value defined by MPI_TAG_UB. An example output from a run using 4 processes is as follows:
Process 0, ID_range min = 0
Process 0, ID_range max = 524287
Process 0, tag ub = 524287
Process 1, ID_range min = 0
Process 1, ID_range max = 524287
Process 1, tag ub = 524287
Process 2, ID_range min = 0
Process 2, ID_range max = 524287
Process 2, tag ub = 524287
Process 3, ID_range min = 0
Process 3, ID_range max = 524287
Process 3, tag ub = 524287
I did, however, find that I was still unable to set the tag/ID range with blacs_set, despite calling it before blacs_gridinit as instructed in the documentation.
From this, it looks like there is an issue with the 64-bit integer interface and BLACS tags/IDs, which I assume is what causes the eigensolvers to hang/error. I have attached the code I used to generate the above outputs; it is very basic and simply initialises BLACS, then gets the minimum and maximum BLACS tags and the value defined by MPI_TAG_UB. You can also attempt to set the minimum and maximum BLACS tags by uncommenting line 49. The commands I used to compile the 64-bit integer and 32-bit integer versions respectively are as follows:
mpiicx -DMKL_ILP64 -I"${MKLROOT}/include" blacs_test.c -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl -o blacs_test
mpiicx -I"${MKLROOT}/include" blacs_test.c -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl -o blacs_test
Please let me know if you want any other information, or another basic code to reproduce the ScaLAPACK eigensolver errors in case you think the issues with blacs_get and blacs_set are unrelated. I ask because I also use my code on systems with MKL 2019 and 2021 installed and, although I have no issues with the ScaLAPACK eigensolvers on those versions, I did find the same issues with blacs_get and blacs_set when using the 64-bit integer interface. This suggests that I'm either doing something incorrectly / misunderstanding something here, or that those functions may have been bugged going back to at least MKL 2019.
Hi,
Thank you for posting in the oneMKL forum.
I tried it on my side and observed the following behavior:
Process 0, ID_range min = 40896
Process 0, ID_range max = 0
Process 0, tag ub = 524287
Process 1, ID_range min = 40896
Process 1, ID_range max = 0
Process 1, tag ub = 524287
Process 2, ID_range min = 40896
Process 2, ID_range max = 0
Process 2, tag ub = 524287
Process 3, ID_range min = 40896
Process 3, ID_range max = 0
Process 3, tag ub = 524287
when linking to the ilp64 libraries. The difference is that the ID ranges of all 4 processes are the same.
I tested the in-package example, pzgetrf_example, with the ilp64 libraries and it passed, so I suspect the 64-bit integer interfaces might not be the cause of the problem. As for the original issue, neither the hang nor the out-of-range tag number is expected behavior. That smells like a bug. Could you please provide a reproducer so that I can look deeper into it?
Thanks,
Fengrui
Hi Fengrui,
Thanks for the reply. After testing again, the variable behaviour between processes may actually only occur when I attempt to set the BLACS ID range using blacs_set (which you can do by uncommenting line 49 in blacs_test.c). Without blacs_set, I get outputs similar to yours, where the ID range does not appear to be set from the value defined by MPI_TAG_UB (unlike the 32-bit integer version) but is still consistent across processes. For example, here are the two outputs from compiling and running the code just now, first without and then with setting the BLACS ID range. 4 MPI processes were used, and the code was compiled using the compilation command from my original post for the 64-bit integer interface.
Without setting the ID range:
Process 0, ID_range min = 40896
Process 0, ID_range max = 1023
Process 0, tag ub = 524287
Process 1, ID_range min = 40896
Process 1, ID_range max = 1023
Process 1, tag ub = 524287
Process 3, ID_range min = 40896
Process 3, ID_range max = 1023
Process 3, tag ub = 524287
Process 2, ID_range min = 40896
Process 2, ID_range max = 1023
Process 2, tag ub = 524287
With setting the ID range:
Process 0, ID_range min = 0
Process 0, ID_range max = -2076694048
Process 0, tag ub = 524287
Process 1, ID_range min = 0
Process 1, ID_range max = 616049120
Process 1, tag ub = 524287
Process 2, ID_range min = 0
Process 2, ID_range max = -933746208
Process 2, tag ub = 524287
Process 3, ID_range min = 0
Process 3, ID_range max = 1652042208
Process 3, tag ub = 524287
Meanwhile, when using the 32-bit integer interface, although I still appear unable to change the ID range, the ranges returned by blacs_get remain consistent, with the minimum equal to 0 and the maximum equal to the value defined by MPI_TAG_UB. Can you test this for both the 64-bit and 32-bit interfaces and check (1) whether you are able to set the ID range and (2) whether attempting to set the ID range in the 64-bit integer case results in the same inconsistent output across processes?
I have also put together example code reproducing the ScaLAPACK error mentioned in my original post. You can ignore the six functions at the end: they just take in, operate on, and return single int or double values and are only used for setting the matrix element values. The BLACS grid is set up in the main function and the eigensolver is called in the Test_QW_PWEM function. To compile the code, the following commands are used:
32-bit integer interface:
mpiicx -I"${MKLROOT}/include" scalapack_test.c -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl -o scalapack_test
64-bit integer interface:
mpiicx -DMKL_ILP64 -I"${MKLROOT}/include" scalapack_test.c -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl -o scalapack_test
The order of the matrix is set with the first input parameter. For a matrix of order 10001 (i.e. with 10001^2 elements), the code is run as follows using 4 MPI processes:
mpirun -np 4 ./scalapack_test 10001
For the 32-bit integer interface, the resulting output is as follows:
Process 0, ID_range min = 0
Process 0, ID_range max = 524287
Process 0, tag ub = 524287
Process 1, ID_range min = 0
Process 1, ID_range max = 524287
Process 1, tag ub = 524287
Process 2, ID_range min = 0
Process 2, ID_range max = 524287
Process 2, tag ub = 524287
Process 3, ID_range min = 0
Process 3, ID_range max = 524287
Process 3, tag ub = 524287
Eigensolver info = 0
Energy 1 = 0.04858125939 eV.
Energy 2 = 0.18914974521 eV.
Energy 3 = 0.23925215535 eV.
Energy 4 = 0.36590995583 eV.
Energy 5 = 0.38158377045 eV.
Energy 6 = 0.54925220310 eV.
Energy 7 = 0.58273722453 eV.
Energy 8 = 0.83617012160 eV.
Energy 9 = 0.87399965983 eV.
Energy 10 = 1.21823718330 eV.
For the 64-bit integer interface, the output is as follows:
Process 0, ID_range min = 40896
Process 0, ID_range max = 1023
Process 0, tag ub = 524287
Process 1, ID_range min = 40896
Process 1, ID_range max = 1023
Process 1, tag ub = 524287
Process 2, ID_range min = 40896
Process 2, ID_range max = 1023
Process 2, tag ub = 524287
Process 3, ID_range min = 40896
Process 3, ID_range max = 1023
Process 3, tag ub = 524287
Abort(857855492) on node 1 (rank 1 in comm 0): Fatal error in internal_Irecv: Unknown error class, error stack:
internal_Irecv(37571): MPI_Irecv(buf=0x7e871be55010, count=32, dtype=USER<resized>, 0, 557861, comm=0xc4000001, request=0x7fff0da460ac) failed
internal_Irecv(37527): Invalid tag, value is 557861
The error is the same whether the pzheevr solver or the pzheevx solver is used (by uncommenting lines 216 and 231 and commenting out lines 218 and 233). Based on comparing run times and memory usage with the 32-bit integer version, the error appears to occur right at the end of the eigensolver's run. Also, blacs_set was not called for these runs.
The invalid tag/ID value also seems to increase as the block size (nblk) is increased. The following errors are output when using block sizes of 64, 128, and 256 respectively:
Abort(522311172) on node 1 (rank 1 in comm 0): Fatal error in internal_Irecv: Unknown error class, error stack:
internal_Irecv(37571): MPI_Irecv(buf=0x7b125b13c010, count=64, dtype=USER<resized>, 0, 1001591, comm=0xc4000001, request=0x7ffcd54d4b4c) failed
internal_Irecv(37527): Invalid tag, value is 1001591
Abort(253875716) on node 1 (rank 1 in comm 0): Fatal error in internal_Irecv: Unknown error class, error stack:
internal_Irecv(37571): MPI_Irecv(buf=0x7db71c0cf010, count=128, dtype=USER<resized>, 0, 1889439, comm=0xc4000001, request=0x7ffd929f146c) failed
internal_Irecv(37527): Invalid tag, value is 1889439
Abort(522311172) on node 1 (rank 1 in comm 0): Fatal error in internal_Irecv: Unknown error class, error stack:
internal_Irecv(37571): MPI_Irecv(buf=0x7f43d5e5a010, count=256, dtype=USER<resized>, 0, 3665331, comm=0xc4000001, request=0x7ffdc2a314ac) failed
internal_Irecv(37527): Invalid tag, value is 3665331
So, to summarize, there appear to be a couple of tag-related issues with BLACS and ScaLAPACK here:
(1). blacs_set appears unable to set the BLACS tag range for both the 32-bit and 64-bit integer interfaces, and can cause the range returned by blacs_get to be process-dependent in the case of the 64-bit integer interface.
(2). A hang, or an invalid / out-of-range tag error, occurs when using the ScaLAPACK eigensolvers pzheevr/pzheevx.
Hi
First, Netlib Reference ScaLAPACK 2.0.0 and later releases (as well as all the latest oneMKL releases) do not allow changing the BLACS message ID range with blacs_set. This option was disabled a long time ago. If we try to change it with blacs_set in any Netlib ScaLAPACK release available at https://www.netlib.org/scalapack/, we only get a list of warnings like this one:
$ mpirun -n 4 ./a.out
BLACS WARNING 'No need to set message ID range due to MPI communicator.'
from {-1,-1}, pnum=2, Contxt=-1, on line 18 of file 'blacs_set_.c'.
BLACS WARNING 'No need to set message ID range due to MPI communicator.'
from {-1,-1}, pnum=1, Contxt=-1, on line 18 of file 'blacs_set_.c'.
BLACS WARNING 'No need to set message ID range due to MPI communicator.'
from {-1,-1}, pnum=3, Contxt=-1, on line 18 of file 'blacs_set_.c'.
BLACS WARNING 'No need to set message ID range due to MPI communicator.'
from {-1,-1}, pnum=0, Contxt=-1, on line 18 of file 'blacs_set_.c'.
So blacs_set is not expected to change the tag/ID range, and this is not an issue.
Secondly, using the Intel icc compiler, or icx with the additional diagnostic flags -Wall -Wextra -std=c11, shows the following problems in your scalapack_test.c, and these must be fixed before proceeding further:
$ mpiicc -I"${MKLROOT}/include" -c scalapack_test.c
icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message.
scalapack_test.c(19): error: expression must have a constant value
const double hbar = h / (2 * M_PI);
^
compilation aborted for scalapack_test.c (code 2)
$ mpiicx -I"${MKLROOT}/include" -Wall -Wextra -std=c11 -c scalapack_test.c
scalapack_test.c:19:30: error: use of undeclared identifier 'M_PI'
19 | const double hbar = h / (2 * M_PI);
| ^
scalapack_test.c:205:9: warning: unused variable 'abstol' [-Wunused-variable]
205 | double abstol = 2*pdlamch(&my_blacs_ctxt, "S"); // High accuracy setting for error tolerance
| ^~~~~~
scalapack_test.c:207:9: warning: unused variable 'orfac' [-Wunused-variable]
207 | double orfac = 0;
| ^~~~~
scalapack_test.c:319:42: error: use of undeclared identifier 'M_PI'
319 | return L * sin( M_PI * m * L / SL ) / (M_PI * m * L);
| ^
scalapack_test.c:319:19: error: use of undeclared identifier 'M_PI'
319 | return L * sin( M_PI * m * L / SL ) / (M_PI * m * L);
| ^
scalapack_test.c:342:13: error: use of undeclared identifier 'M_PI'
342 | return 2 * M_PI * m / SL;
| ^
2 warnings and 4 errors generated.
Which MPI implementation and version are you using (mpirun --version)?
We'd like to note that the following linking line must be used for oneMKL 2024 and later releases:
$ mpiicx -I"${MKLROOT}/include" scalapack_test.c -L${MKLROOT}/lib -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl
(e.g. you need to replace -L${MKLROOT}/lib/intel64 with -L${MKLROOT}/lib)
All the best
Sergey
Hi Sergey,
Thanks for that. I'll ignore blacs_set and blacs_get then.
I have updated the code so that it compiles cleanly with icx under -Wall -Wextra -std=c11, and have attached it. I have tested it with the updated linking line and the issue remains. mpirun --version reports Intel MPI version 2021.14. I used the following compilation commands:
32-bit integer interface
mpiicx -I"${MKLROOT}/include" scalapack_test.c -L${MKLROOT}/lib -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl -o scalapack_test -Wall -Wextra -std=c11
64-bit integer interface
mpiicx -DMKL_ILP64 -I"${MKLROOT}/include" scalapack_test.c -L${MKLROOT}/lib -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl -o scalapack_test -Wall -Wextra -std=c11
The two unused-variable warnings refer to extra variables that are passed to pzheevx (compared to pzheevr) if pzheevx is chosen.
I also enabled MKL_VERBOSE=1 and am observing the same as Fengrui: nb is unexpectedly reported as {3001,0} when running with N=3001 using the 64-bit integer interface, despite the correct results being output.
For larger N using the 64-bit integer interface, I'm still observing the hang / invalid tag error. For example, the following is output with N=10001:
Process 0, ID_range min = 40896
Process 0, ID_range max = 1023
Process 0, tag ub = 524287
Process 1, ID_range min = 40896
Process 1, ID_range max = 1023
Process 1, tag ub = 524287
Process 3, ID_range min = 40896
Process 3, ID_range max = 1023
Process 3, tag ub = 524287
Process 2, ID_range min = 40896
Process 2, ID_range max = 1023
Process 2, tag ub = 524287
MKL_VERBOSE oneMKL 2025 Patch 1 Product build 20241031 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 3.80GHz ilp64 intel_thread
MKL_VERBOSE DLAMCH(S) 16.55us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
MKL_VERBOSE oneMKL 2025 Patch 1 Product build 20241031 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 3.80GHz ilp64 intel_thread
MKL_VERBOSE DLAMCH(S) 14.26us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
MKL_VERBOSE oneMKL 2025 Patch 1 Product build 20241031 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 3.80GHz ilp64 intel_thread
MKL_VERBOSE DLAMCH(S) 16.80us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
MKL_VERBOSE oneMKL 2025 Patch 1 Product build 20241031 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 3.80GHz ilp64 intel_thread
MKL_VERBOSE DLAMCH(S) 13.96us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
MKL_VERBOSE DLAMCH(S) 503ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
MKL_VERBOSE DLAMCH(S) 520ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
MKL_VERBOSE DLAMCH(S) 310ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
MKL_VERBOSE DLAMCH(S) 320ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
MKL_VERBOSE DLAMCH(S) 102ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
MKL_VERBOSE DLAMCH(S) 112ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
MKL_VERBOSE DLAMCH(S) 141ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
MKL_VERBOSE DLAMCH(S) 137ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
Abort(186766852) on node 1 (rank 1 in comm 0): Fatal error in internal_Irecv: Unknown error class, error stack:
internal_Irecv(37571): MPI_Irecv(buf=0x719e91255010, count=32, dtype=USER<resized>, 0, 557861, comm=0xc4000001, request=0x7ffd7289c39c) failed
internal_Irecv(37527): Invalid tag, value is 557861
Thank you for providing the reproducer!
I tested on my side. It looks like blacs_set() behaves unexpectedly for both the lp64 and ilp64 interfaces: it is either unable to change the ID range or shows inconsistent behavior across processes.
Using the ilp64 interfaces with MKL_VERBOSE=1, I got a hang for N=4001. For N=3001, I got the same Energy output as with the lp64 interfaces, but the block size is unexpectedly reported as {3001,0} rather than {32,32}.
lp64 MKL_VERBOSE=1 output:
...
MKL_VERBOSE PZHEEVR(V,I,L,3001,0xf5cdc200010,1,1,0x441700,0x7ffe4cfe47c8,0x7ffe4cfe47c0,1,10,10,10,0x1438630,0x13efb60,1,1,0x441730,0xf5df7f76010,100058,0x143e400,84030,0x14e2600,42014,0,nb={32,32},myid={0,1},process_grid={2,2}) 1.42s CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
...
ilp64 MKL_VERBOSE=1 output:
...
MKL_VERBOSE PZHEEVR(V,I,L,3001,0x5e7a9a00010,1,1,0x441720,0x7ffdff01f5a0,0x7ffdff01f598,1,10,10,10,0x13b9630,0x1370b60,1,1,0x441770,0x5e8c5c75010,100058,0x13bf400,84030,0x1463600,42014,0,nb={3001,0},myid={0,1},process_grid={2,2}) 1.43s CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
...
This looks like a product issue. I will discuss it within the team and get back to you.
