findloc() delivers wrong results on logicals after mpi_allreduce()

Balint_Aradi · ‎11-20-2022

Dear Intel Team,

the findloc() implementation in the recent ifort compiler seems to be fragile when applied to logicals that were obtained through an mpi_allreduce() call.

The working example below demonstrates this. A 1D array of logicals is filled via an mpi_allreduce() call. I would expect subsequent findloc(1-D-array, value=.true.) and findloc(1-D-array, value=.true., dim=1) calls to return the same location (the first as a 1D array with one element, the second as an integer), but this is not the case: the returned locations differ, the second one being wrong. Interestingly, if the `.not.` operator is not applied and one searches for the location of `.false.` instead, the results are consistent and correct.

The program below was compiled with mpiifort (mpiifort for the Intel(R) MPI Library 2021.7 for Linux*, ifort version 2021.7.1) on x86_64/Linux and run with "mpirun" using two or more processes.

program testprog
  use mpi_f08, only : mpi_init, mpi_finalize, mpi_comm, mpi_comm_size, mpi_comm_rank,&
      & mpi_allreduce, MPI_COMM_WORLD, MPI_IN_PLACE, MPI_LOGICAL, MPI_LAND
  implicit none

  type(mpi_comm) :: comm
  logical, allocatable :: globalcond(:)
  integer :: commsize, rank

  comm = MPI_COMM_WORLD

  call mpi_init()
  call mpi_comm_size(comm, commsize)
  call mpi_comm_rank(comm, rank)

  allocate(globalcond(commsize), source=.true.)
  globalcond(2) = .false.

  call mpi_allreduce(MPI_IN_PLACE, globalcond, size(globalcond), MPI_LOGICAL, MPI_LAND, comm)
  ! expect globalcond = [.true., .false., .true., .true., ...] on every process

  globalcond(:) = .not. globalcond
  ! expect globalcond = [.false., .true., .false., .false., ...] on every process

  print *, rank, 1, "| ", "globalcond:", globalcond
  print *, rank, 2, "| ", "globalcond:", findloc(globalcond, value=.true.)  ! returns 2
  print *, rank, 3, "| ", "globalcond:", findloc(globalcond, value=.true., dim=1)  ! returns 1
  ! expect to obtain the same position (2) in both findloc() calls (once as 1D array with one
  ! element, once as integer). However, the second invocation returns 1 as result

  call mpi_finalize()

end program testprog

Barbara_P_Intel · ‎11-23-2022

This is one of those "Is this a Fortran issue or an MPI issue?" questions.

Please post the commands you used to compile and run.

Balint_Aradi · ‎11-23-2022

I've used

mpiifort test.f90

to compile and

mpirun -n 2 ./a.out

to run and obtained

           0           1 | globalcond: F T
           0           2 | globalcond:           2
           0           3 | globalcond:           1
           1           1 | globalcond: F T
           1           2 | globalcond:           2
           1           3 | globalcond:           1

Whether it is an MPI-framework problem or not is hard to tell from the outside, and my observations are somewhat mixed in this regard:

- if you print the logical array after mpi_allreduce(), it prints the right values,

- if you invoke findloc() without the optional dim argument, it returns the right value: a [1]-shaped array with 2 as its only element,

- if you invoke findloc() with the optional dim argument, it consistently returns 1, which is wrong,

- however, if you leave out the .not. operator (line 22 of the program) and search for value=.false. in the two findloc() calls (lines 26 & 27), you get the correct results with both kinds of findloc() invocations.

So probably, findloc(..., dim=1) assumes a certain internal representation of the logicals, which is not fully fulfilled when they are obtained via mpi_allreduce(). But apparently, the other logical operations do not rely on those representational details. (But this is just guessing based on the observations...)

jimdempseyatthecove · ‎11-24-2022

>> findloc(..., dim=1) assumes a certain internal representation of the logicals

What do you see when you examine the contents of the array in the memory window? IOW what are the binary values?

Should findloc of logicals with dim=1 use binary comparison as opposed to logical, then there could potentially be an issue. Fortran logicals use only the lsb to determine .true. or .false..

Jim Dempsey

Steve_Lionel · ‎11-24-2022

@jimdempseyatthecove wrote:

>> findloc(..., dim=1) assumes a certain internal representation of the logicals

Fortran logicals use only the lsb to determine .true. or .false..

Fortran, the language, makes no representation about the representation of logical values, only that there are two of them. (See Doctor Fortran in "To .EQV. or to .NEQV., that is the question", or "It's only LOGICAL" - Doctor Fortran (stevelionel.com)) Intel Fortran is an outlier among current implementations in using the low bit to distinguish true and false, an artifact of its history going back to VAX FORTRAN in the 1970s when there was a VAX instruction to test this and VMS system statuses also used the low bit for success/failure.

It was only with the addition of C interoperability with Fortran 2003 that the standard implied (but did not say outright) that C bool was interoperable with the LOGICAL type. Fortran compilers started in the 1980s or later tended to adopt the C interpretation of zero/nonzero for false/true. Intel Fortran will do that if you specify -standard-semantics or -fpscomp:logicals.
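As a minimal sketch of the two conventions described above (not taken from any compiler's implementation), the following program applies both rules to the four bit patterns reported in this thread. The patterns are held in plain 32-bit integers, so nothing here depends on how a given compiler stores LOGICAL. The program and variable names are made up for illustration, and the hexadecimal output assumes two's complement integers.

```fortran
program logical_conventions
  use, intrinsic :: iso_fortran_env, only : int32
  implicit none

  ! Bit patterns seen in this thread: 00000000, 00000001, FFFFFFFE, FFFFFFFF.
  ! -2 and -1 correspond to FFFFFFFE and FFFFFFFF on two's complement hardware.
  integer(int32), parameter :: patterns(4) = [0_int32, 1_int32, -2_int32, -1_int32]
  integer :: ii

  do ii = 1, size(patterns)
    ! Low-bit rule: only bit 0 decides. Zero/nonzero rule: any set bit means true.
    print "(z8.8, 2(a, l1))", patterns(ii), &
        & "  low-bit rule: ", btest(patterns(ii), 0), &
        & "  zero/nonzero rule: ", patterns(ii) /= 0
  end do

end program logical_conventions
```

Only FFFFFFFE is classified differently by the two rules: its low bit is clear, so the low-bit rule treats it as false, while the zero/nonzero rule treats it as true.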

Balint_Aradi · ‎11-24-2022

Unfortunately, -standard-semantics seems to have a problem when combined with IntelMPI. For the code below, I get

> mpiifort -standard-semantics test.f90
ld: /tmp/ifortyS7UoN.o: in function `MAIN__':
test.f90:(.text+0x44): undefined reference to `mpi_f08_compile_constants_MP_mpi_comm_world_'
ld: test.f90:(.text+0x40d): undefined reference to `mpi_f08_compile_constants_MP_mpi_land_'
ld: test.f90:(.text+0x43b): undefined reference to `mpi_f08_compile_constants_MP_mpi_logical_'

Balint_Aradi · ‎11-24-2022

Good point! Before the mpi_allreduce(), the byte representation seems to be .false. = 00 00 00 00 and .true. = FF FF FF FF. After the mpi_allreduce() and the subsequent .not. operation, I obtain .false. = FE FF FF FF and .true. = FF FF FF FF. (At least, this is what I obtain if I transfer() each logical into an array of integers and print the hexadecimal representation of those values.) So, it seems, the internal representations are indeed different. But apparently, this does not cause any trouble, unless the dim=1 option is set...

Balint_Aradi · ‎11-24-2022

In case of interest, here is the version which also prints the byte patterns:

program testprog
  use mpi_f08, only : mpi_init, mpi_finalize, mpi_comm, mpi_comm_size, mpi_comm_rank,&
      & mpi_allreduce, MPI_COMM_WORLD, MPI_IN_PLACE, MPI_LOGICAL, MPI_LAND
  implicit none

  type(mpi_comm) :: comm
  logical, allocatable :: globalcond(:)
  integer :: commsize, rank

  comm = MPI_COMM_WORLD

  call mpi_init()
  call mpi_comm_size(comm, commsize)
  call mpi_comm_rank(comm, rank)

  allocate(globalcond(commsize), source=.true.)
  globalcond(2) = .false.

  if (rank == 0) then
    print "(a)", "Before mpi_allreduce()"
    print "(a, *(l10))", "values: ", globalcond
    print "(a, *(z10.8))", "bitrep: ", transfer(globalcond, [1])
  end if

  call mpi_allreduce(MPI_IN_PLACE, globalcond, size(globalcond), MPI_LOGICAL, MPI_LAND, comm)
  ! expect globalcond = [.true., .false., .true., .true., ...] on every process

  if (rank == 0) then
    print "(/, a)", "After mpi_allreduce()"
    print "(a, *(l10))", "values: ", globalcond
    print "(a, *(z10.8))", "bitrep: ", transfer(globalcond, [1])
  end if
  
  globalcond(:) = .not. globalcond
  ! expect globalcond = [.false., .true., .false., .false., ...] on every process
  if (rank == 0) then
    print "(/, a)", "After .not. operation"
    print "(a, *(l10))", "values: ", globalcond
    print "(a, *(z10.8))", "bitrep: ", transfer(globalcond, [1])

    print "(/, a, t30, i0)", "findloc(...):", findloc(globalcond, value=.true.)  ! -> 2
    print "(a, t30, i0)", "findloc(..., dim=1):", findloc(globalcond, value=.true., dim=1)  ! -> 1
    ! expect to obtain the same position (2) in both findloc() calls (once as 1D array with one
    ! element, once as integer). However, the second invocation returns 1 as result
  end if

  call mpi_finalize()

end program testprog

which, when compiled and run with

mpiifort test.f90
mpirun -n 2 ./a.out

produces the output

Before mpi_allreduce()
values:          T         F
bitrep:   FFFFFFFF  00000000

After mpi_allreduce()
values:          T         F
bitrep:   00000001  00000000

After .not. operation
values:          F         T
bitrep:   FFFFFFFE  FFFFFFFF

findloc(...):                2
findloc(..., dim=1):         1

jimdempseyatthecove · ‎11-24-2022

>>unless the dim=1 option is set...

Good. This indicates that findloc() with dim=1 uses an integer compare as opposed to a logical compare.

Per description of findloc:

...

If both array and value are of type logical, the comparison is performed with the .EQV. operator; otherwise, the comparison is performed with the == operator. If the value of the comparison is true, that element of array matches value.

...

Looks like this is not the case.
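For comparison, the rule quoted above can be written out by hand. The following is only a sketch of that rule for a rank-1 logical array, not Intel's implementation; the module name and the function findloc_eqv are hypothetical. It could serve as a cross-check against the intrinsic.

```fortran
module findloc_reference
  implicit none
  private
  public :: findloc_eqv

contains

  !> Hypothetical helper: position of the first element of array that matches
  !> value under the .eqv. operator, or 0 if no element matches.
  pure function findloc_eqv(array, value) result(pos)
    logical, intent(in) :: array(:)
    logical, intent(in) :: value
    integer :: pos

    integer :: ii

    pos = 0
    do ii = 1, size(array)
      if (array(ii) .eqv. value) then
        pos = ii
        return
      end if
    end do

  end function findloc_eqv

end module findloc_reference


program test_findloc_reference
  use findloc_reference, only : findloc_eqv
  implicit none

  logical :: cond(4)

  ! Logicals assigned locally, so they carry the compiler's own representation
  cond(:) = [.false., .true., .false., .true.]
  print "(a, i0)", "findloc_eqv:         ", findloc_eqv(cond, .true.)
  print "(a, i0)", "findloc(..., dim=1): ", findloc(cond, value=.true., dim=1)

end program test_findloc_reference
```

With locally assigned logicals as in this test, both lines should report position 2. Feeding the function an array that came back from mpi_allreduce() would show whether the .eqv. comparison itself is affected by the unusual bit patterns.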

Until a fix is made, you will need to write a workaround such as repairing the values in globalcond. For example:

program Console10
    implicit none
    type t
    union
        map
            logical :: globalcon(5)
        end map
        map
            integer :: globalcon_i(5)
        end map
    end union
    end type t
    type(t) :: g
    
    g%globalcon = .false.
    g%globalcon(2) = .true.
    ! reference case: untouched logicals
    print *,findloc(g%globalcon, value=.true.), findloc(g%globalcon, value=.true., dim=1)
    ! alter the integer image of element 2 (its lowest bit stays set)
    g%globalcon_i(2) = g%globalcon_i(2) + 2
    print *,findloc(g%globalcon, value=.true.), findloc(g%globalcon, value=.true., dim=1)
    ! repair: reduce every element to its lowest bit, giving 0 or -1
    g%globalcon_i = -iand(g%globalcon_i,1)
    print *,findloc(g%globalcon, value=.true.), findloc(g%globalcon, value=.true., dim=1)
end program Console10
---------
           2           2
           0           2
           2           2

Jim Dempsey

Balint_Aradi · ‎11-24-2022

Thanks. I have never seen union and map in Fortran so far. Are they Intel extensions?

My (hopefully standard-conforming) solution was to use merge to map the logicals to integers and the == operator to map them back (as in lines 20 and 22):

program testprog
  use mpi_f08, only : mpi_init, mpi_finalize, mpi_comm, mpi_comm_size, mpi_comm_rank,&
      & mpi_allreduce, MPI_COMM_WORLD, MPI_IN_PLACE, MPI_INTEGER, MPI_PROD
  implicit none

  type(mpi_comm) :: comm
  logical, allocatable :: globalcond(:)
  integer, allocatable :: globalcondint(:)
  integer :: commsize, rank

  comm = MPI_COMM_WORLD

  call mpi_init()
  call mpi_comm_size(comm, commsize)
  call mpi_comm_rank(comm, rank)

  allocate(globalcond(commsize), source=.true.)
  globalcond(2) = .false.

  globalcondint = merge(1, 0, globalcond)
  call mpi_allreduce(MPI_IN_PLACE, globalcondint, size(globalcondint), MPI_INTEGER, MPI_PROD, comm)
  globalcond(:) = globalcondint == 1

  globalcond(:) = .not. globalcond
  if (rank == 0) then
    print "(/, a, t30, i0)", "findloc(...):", findloc(globalcond, value=.true.)
    print "(a, t30, i0)", "findloc(..., dim=1):", findloc(globalcond, value=.true., dim=1)
  end if

  call mpi_finalize()
  
end program testprog

jimdempseyatthecove · ‎11-25-2022

What about using your (failing) original code, and following the mpi_allreduce with:

globalcond(:) = merge(.true., .false., globalcond) ! clean up the returned values

Jim Dempsey

Balint_Aradi · ‎11-25-2022

Thanks, your solution is indeed shorter. However, after having burned myself with the combination ifort + impi + logicals, I find it safer to avoid MPI calls with logical arguments completely; who knows what else would fail due to the mismatching internal representation. So I think I'll stick with the integer remapping as long as the bug is not fixed. Thanks again for your help!
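If the integer remapping is needed in several places, it could be wrapped once. The following is a sketch under assumptions: the module logical_reduce and the subroutine allreduce_land are hypothetical names, and the fixed array size of 4 in the test program is chosen only for illustration. The reduction itself is the same MPI_INTEGER / MPI_PROD combination as in the workaround above.

```fortran
module logical_reduce
  use mpi_f08, only : mpi_comm, mpi_allreduce, MPI_IN_PLACE, MPI_INTEGER, MPI_PROD
  implicit none
  private
  public :: allreduce_land

contains

  !> Hypothetical wrapper: element-wise logical AND of cond over all ranks of comm,
  !> done on integers so that no logical ever passes through MPI.
  subroutine allreduce_land(cond, comm)
    logical, intent(inout) :: cond(:)
    type(mpi_comm), intent(in) :: comm

    integer, allocatable :: condint(:)

    ! .true. -> 1, .false. -> 0; the product is 1 only if every rank holds .true.
    condint = merge(1, 0, cond)
    call mpi_allreduce(MPI_IN_PLACE, condint, size(condint), MPI_INTEGER, MPI_PROD, comm)
    ! The comparison is evaluated locally, so the compiler creates the logicals itself
    cond(:) = condint == 1

  end subroutine allreduce_land

end module logical_reduce


program test_logical_reduce
  use mpi_f08, only : mpi_init, mpi_finalize, mpi_comm_rank, MPI_COMM_WORLD
  use logical_reduce, only : allreduce_land
  implicit none

  logical :: globalcond(4)
  integer :: rank

  call mpi_init()
  call mpi_comm_rank(MPI_COMM_WORLD, rank)

  globalcond(:) = .true.
  globalcond(2) = .false.
  call allreduce_land(globalcond, MPI_COMM_WORLD)
  globalcond(:) = .not. globalcond

  if (rank == 0) then
    print "(a, t30, i0)", "findloc(...):", findloc(globalcond, value=.true.)
    print "(a, t30, i0)", "findloc(..., dim=1):", findloc(globalcond, value=.true., dim=1)
  end if

  call mpi_finalize()

end program test_logical_reduce
```

Compiled and run in the same way as the examples above (mpiifort, then mpirun -n 2), the two findloc() calls should agree, since every logical in globalcond is produced by a local comparison rather than by the MPI library.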