Intel® oneAPI Math Kernel Library

Code stops after a call to BLACS_GRIDEXIT

OP1
New Contributor III

The following code (built with ifx 2025.2.1 on Windows and run with 7 processes) silently crashes at the first call to BLACS_GRIDEXIT, which is line 19 of the listing below: nothing past line 19 is executed.

The code is linked with the ILP64 MKL libraries for BLACS and ScaLAPACK.

Am I missing something obvious here?

PROGRAM TEST
IMPLICIT NONE (TYPE, EXTERNAL)
EXTERNAL BLACS_GRIDINIT, BLACS_GRIDINFO, BLACS_PINFO, BLACS_GET, BLACS_PCOORD, BLACS_GRIDEXIT
INTEGER(KIND = 8), EXTERNAL :: BLACS_PNUM
INTEGER(KIND = 8) :: ICTXT, MY_ID, N_PROCS, N_PROW, N_PCOL
INTEGER(KIND = 8) :: MY_ROW, MY_COL

CALL BLACS_PINFO(MY_ID, N_PROCS)
N_PROW = INT(SQRT(REAL(N_PROCS)))
N_PCOL = N_PROCS/N_PROW

CALL BLACS_GET(-1_8, 0_8, ICTXT)
CALL BLACS_GRIDINIT(ICTXT, 'C', N_PROW, N_PCOL)
CALL BLACS_GRIDINFO(ICTXT, N_PROW, N_PCOL, MY_ROW, MY_COL)
IF (MY_ID /= 6) THEN
    WRITE(*, *) 'C', MY_ID, BLACS_PNUM(ICTXT, MY_ROW, MY_COL), MY_ROW, MY_COL
END IF

CALL BLACS_GRIDEXIT(ICTXT)

CALL BLACS_GET(-1_8, 0_8, ICTXT)
CALL BLACS_GRIDINIT(ICTXT, 'R', N_PROW, N_PCOL)
CALL BLACS_GRIDINFO(ICTXT, N_PROW, N_PCOL, MY_ROW, MY_COL)
IF (MY_ID /= 6) THEN
    WRITE(*, *) 'R', MY_ID, BLACS_PNUM(ICTXT, MY_ROW, MY_COL), MY_ROW, MY_COL
    CALL BLACS_GRIDEXIT(ICTXT)
END IF

END PROGRAM TEST

 

11 Replies

OP1
New Contributor III

Line 26 in the code above (the CALL BLACS_GRIDEXIT(ICTXT) inside the second IF block) should be removed (I should have done so before posting...), but this does not change the outcome here.

OP1
New Contributor III

Also, when did we lose the ability to edit posts?

Aleksandra_K
Moderator

Hi,


Does the same problem happen when running for 6 or 8 processes?


You create a grid of size 2×3 (6 positions), but you're running with 7 processes. Process number 6 has no valid grid position, which can cause MPI to abort.
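
The grid shape can be checked without BLACS by repeating the arithmetic from the posted code. A minimal sketch in plain Fortran; the program name GRID_SHAPE and the variable LEFTOVER are illustrative and not part of the original post:

```fortran
PROGRAM GRID_SHAPE
IMPLICIT NONE
INTEGER(KIND = 8) :: N_PROCS, N_PROW, N_PCOL, LEFTOVER

! Same grid-shape arithmetic as in the posted code, for 6, 7 and 8 processes.
DO N_PROCS = 6, 8
    N_PROW = INT(SQRT(REAL(N_PROCS)), KIND = 8)
    N_PCOL = N_PROCS/N_PROW
    ! Processes that do not fit into the N_PROW x N_PCOL grid.
    LEFTOVER = N_PROCS - N_PROW*N_PCOL
    WRITE(*, *) N_PROCS, N_PROW, N_PCOL, LEFTOVER
END DO

END PROGRAM GRID_SHAPE
```

With 6 or 8 processes the grid is 2×3 or 2×4 and no process is left over; with 7 processes the grid is 2×3 and one process (rank 6) falls outside it.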


Regards,

Alex


OP1
New Contributor III

I am afraid that I do not understand your reply at all. Are you saying that a program relying on BLACS cannot be run with an arbitrary number of processes, and that the process count must exactly match the size of the BLACS process grid? This is not how BLACS works!

I simplified the code a bit further:

PROGRAM TEST
IMPLICIT NONE (TYPE, EXTERNAL)
EXTERNAL BLACS_GRIDINIT, BLACS_PINFO, BLACS_GET, BLACS_GRIDEXIT
INTEGER(KIND = 8) :: ICTXT, MY_ID, N_PROCS, N_PROW, N_PCOL

CALL BLACS_PINFO(MY_ID, N_PROCS)
N_PROW = INT(SQRT(REAL(N_PROCS)))
N_PCOL = N_PROCS/N_PROW
CALL BLACS_GET(-1_8, 0_8, ICTXT)
CALL BLACS_GRIDINIT(ICTXT, 'C', N_PROW, N_PCOL)
CALL BLACS_GRIDEXIT(ICTXT)

WRITE(*, *) MY_ID

END PROGRAM TEST

Running the code with 7 processes, the output is:  

 1
 0
 3
 5
 2
 4
Press any key to continue . . .

Why isn't process 7 printing anything here? 

OP1
New Contributor III

[ignoring the fact that I should have written "process 6" in my latest message]

In fact, when I run this simplified example several times in a row, sometimes I get no output at all, and sometimes only a subset of processes 0...5 print something. There is a randomness to it. Can you try to reproduce this on your side?

Aleksandra_K
Moderator

Could you share exactly how you are running the code, so that I can reproduce your issue precisely?


OP1
New Contributor III

Here is the BuildLog.htm file that is produced when building the last example above.

 

 

Aleksandra_K
Moderator

Hi, 


We investigated your issue and confirmed that the problem is in the BLACS_GRIDEXIT call. Rank 6 is not part of the grid, so its context is set to -1 during BLACS_GRIDINIT (which calls BLACS_GRIDMAP), and passing that invalid context to BLACS_GRIDEXIT causes the error. You were right that it is fine to use only a subset of processes for computation. Nevertheless, this is not a bug, as this behavior of BLACS_GRIDEXIT is consistent with the reference BLACS implementation (ScaLAPACK: blacs_gridexit_).
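
One way to avoid the error is to call BLACS_GRIDEXIT only on processes that actually belong to the grid. The following is a minimal sketch, not an official fix. It assumes, as in the reference BLACS, that BLACS_GRIDINFO returns -1 for the row and column coordinates on a process that was left out of the grid. The program name, the separate GRID_ROWS and GRID_COLS variables, and the final BLACS_EXIT call are additions that do not appear in the original post:

```fortran
PROGRAM TEST_GUARDED
IMPLICIT NONE (TYPE, EXTERNAL)
EXTERNAL BLACS_GRIDINIT, BLACS_GRIDINFO, BLACS_PINFO, BLACS_GET
EXTERNAL BLACS_GRIDEXIT, BLACS_EXIT
INTEGER(KIND = 8) :: ICTXT, MY_ID, N_PROCS, N_PROW, N_PCOL
INTEGER(KIND = 8) :: GRID_ROWS, GRID_COLS, MY_ROW, MY_COL

CALL BLACS_PINFO(MY_ID, N_PROCS)
N_PROW = INT(SQRT(REAL(N_PROCS)), KIND = 8)
N_PCOL = N_PROCS/N_PROW
CALL BLACS_GET(-1_8, 0_8, ICTXT)
CALL BLACS_GRIDINIT(ICTXT, 'C', N_PROW, N_PCOL)

! Query into separate variables so that N_PROW and N_PCOL are not
! overwritten on a process that is outside the grid.
CALL BLACS_GRIDINFO(ICTXT, GRID_ROWS, GRID_COLS, MY_ROW, MY_COL)

! Assumption: a process outside the grid gets MY_ROW = MY_COL = -1.
! Only grid members hold a valid context and may release it.
IF (MY_ROW >= 0 .AND. MY_COL >= 0) THEN
    CALL BLACS_GRIDEXIT(ICTXT)
END IF

WRITE(*, *) MY_ID

! Shut down BLACS on every process, including those outside the grid.
CALL BLACS_EXIT(0_8)

END PROGRAM TEST_GUARDED
```

With this guard, a process outside the grid never passes its invalid context to BLACS_GRIDEXIT, so with 7 processes all ranks, including rank 6, should reach the WRITE statement.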


Regards, 

Alex


Aleksandra_K
Moderator

Hi,

Do you have any further questions on the topic?


Aleksandra_K
Moderator

Hi,


I hope that you found the above explanation useful. We'll monitor this thread for another 3 days for any follow-up questions. If there's no response within that time, this thread will no longer be actively supported by Intel.


Regards,

Alex


Aleksandra_K
Moderator

With no response from you, this issue will no longer be tracked by Intel. If you need any additional information, please post a new question, ideally in a new thread.


Regards,

Alex

