ifx 2024.1 -check all problem with MemorySanitizer

JFH · ‎05-28-2024

The two programs below differ only in that `call nan_inf` comes before the print
statement in initial5.f90 but after it in initial6.f90. Both compile with
ifx 2024.1; initial5.f90 runs happily and does what I had hoped for:
```
! Fortran 2003 free source form program initial5.f90 by J F Harper
implicit none
integer,parameter:: k1 = kind(1e0), maxk = 3
real(k1) :: h1 = huge(1.0_k1), ni1(2)
integer,parameter:: k2 = kind(1d0)
real(k2) :: h2 = huge(1.0_k2), ni2(2)
integer,parameter:: s3 = selected_real_kind(precision(h2)+1)
integer,parameter:: k3 = merge(k2,s3,s3<0)
real(k3) :: h3 = huge(1.0_k3), ni3(2)
character(24) :: cni = 'NAN INF NAN INF NAN INF '
call nan_inf
print "(A,3(1X,I0))",'Real kinds printed after nan_inf called:',k1,k2,k3
contains
subroutine nan_inf
integer :: i
logical :: nanOK(maxk,2)
read(cni,*) ni1, ni2, ni3
nanOK(:,1) = [ ni1(1)/=ni1(1),ni2(1)/=ni2(1),ni3(1)/=ni3(1) ]
nanOK(:,2) = [ ni1(2) > h1 ,ni2(2) > h2 ,ni3(2) > h3 ]
write(*,"(A,3L2)") ' NaN/=NaN? ',nanOK(:,1),' Inf>huge? ',nanOK(:,2)
end subroutine nan_inf
end program
```
Its result at compile time and run time:
```
(lf) john:~/Test$ /opt/intel/oneapi/2024.1/bin/ifx -standard-semantics -g -check all initial5.f90
ifx: remark #10440: Note that use of a debug option without any optimization-level option will turnoff most compiler optimizations similar to use of '-O0'
(lf) john:~/Test$ ./a.out
NaN/=NaN? T T T
Inf>huge? T T T
Real kinds printed after nan_inf called: 4 8 16
(lf) john:~/Test$
```
But initial6.f90 is:
```
! Fortran 2003 free source form program initial6.f90 by J F Harper
implicit none
integer,parameter:: k1 = kind(1e0), maxk = 3
real(k1) :: h1 = huge(1.0_k1), ni1(2)
integer,parameter:: k2 = kind(1d0)
real(k2) :: h2 = huge(1.0_k2), ni2(2)
integer,parameter:: s3 = selected_real_kind(precision(h2)+1)
integer,parameter:: k3 = merge(k2,s3,s3<0)
real(k3) :: h3 = huge(1.0_k3), ni3(2)
character(24) :: cni = 'NAN INF NAN INF NAN INF '
print "(A,3(1X,I0))",'Real kinds printed before nan_inf called:',k1,k2,k3
call nan_inf
contains
subroutine nan_inf
integer :: i
logical :: nanOK(maxk,2)
read(cni,*) ni1, ni2, ni3
nanOK(:,1) = [ ni1(1)/=ni1(1),ni2(1)/=ni2(1),ni3(1)/=ni3(1) ]
nanOK(:,2) = [ ni1(2) > h1 ,ni2(2) > h2 ,ni3(2) > h3 ]
write(*,"(A,3L2)") ' NaN/=NaN? ',nanOK(:,1),' Inf>huge? ',nanOK(:,2)
end subroutine nan_inf
end program
```

Its run-time result with the same compiler and options is
```
==7598==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x5eea0e in __gtq (/home/john/Test/a.out+0x5eea0e) (BuildId: 1808193a9623ce56a7b482e376a24d4329752213)
#1 0x40ba3a in __for_ieee_signaling_gt_k16_ (/home/john/Test/a.out+0x40ba3a) (BuildId: 1808193a9623ce56a7b482e376a24d4329752213)
#2 0x492638 in _unnamed_main$$_IP_nan_inf_ /home/john/Test/initial6.f90:19:57
#3 0x491483 in MAIN__ /home/john/Test/initial6.f90:12:8
#4 0x40d258 in main (/home/john/Test/a.out+0x40d258) (BuildId: 1808193a9623ce56a7b482e376a24d4329752213)
#5 0x7fcd88c29d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#6 0x7fcd88c29e3f in __libc_start_main csu/../csu/libc-start.c:392:3
#7 0x40d124 in _start (/home/john/Test/a.out+0x40d124) (BuildId: 1808193a9623ce56a7b482e376a24d4329752213)

Uninitialized value was created by an allocation of 'mech_info' in the stack frame
#0 0x4d00ff in for_write_seq_fmt_xmit (/home/john/Test/a.out+0x4d00ff) (BuildId: 1808193a9623ce56a7b482e376a24d4329752213)

SUMMARY: MemorySanitizer: use-of-uninitialized-value (/home/john/Test/a.out+0x5eea0e) (BuildId: 1808193a9623ce56a7b482e376a24d4329752213) in __gtq
Exiting
```
I suspect that ifx 2024.1 has a bug when -check all is used. Either initial5.f90 should have given the MemorySanitizer message or initial6.f90 should not, given that the subroutine called is identical in the two programs. I was using a Linux x86_64 system with Ubuntu 22.04.

Mark_Lewy · ‎05-31-2024

Looks like a bug in the memory sanitiser version of the Intel C/C++ __gtq to me:

.text.__gtq 0x00000000005ee600 0x431 /opt/intel/oneapi/compiler/2024.1/lib/libirc_msan.a(qcomp.c.o)
0x00000000005ee600 __gtq

Presumably, qcomp.c contains code to handle quad precision reals.
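
The two lines quoted above look like an excerpt from a GNU ld link map. A minimal sketch, assuming a map in that layout, of pulling out the archive member that supplies a symbol's section. The function name `member_for` is hypothetical, and a map is typically requested from the linker with `-Map=<file>`, for example through `-Wl,-Map=<file>` on drivers that accept `-Wl`:

```shell
# Print the object or archive member listed for section .text.<symbol>
member_for() {
  awk -v sec=".text.$1" '$1 == sec { print $NF }'
}

# Excerpt quoted in the post above, in the layout this sketch assumes
map_excerpt='.text.__gtq 0x00000000005ee600 0x431 /opt/intel/oneapi/compiler/2024.1/lib/libirc_msan.a(qcomp.c.o)
0x00000000005ee600 __gtq'

printf '%s\n' "$map_excerpt" | member_for __gtq
```

On a full map file the same function would be fed with `member_for __gtq < a.map`, where `a.map` is a placeholder name. Long section names may be wrapped onto a separate line in a real map, which this sketch does not handle.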

Ron_Green · ‎05-31-2024

It is a curious example for sure. There is a bug in the FRTL (the Fortran runtime library); I am just trying to isolate it in more detail. It's strange that initial5 does not show the issue, which is why I'm not so sure the bug is in the quad greater-than function.

JFH · ‎05-31-2024

Both test programs initial5.f90 and initial6.f90 contain a superfluous declaration in the subroutine,

```

integer :: i

```

because the variable i is not evaluated or used. After I removed that line, each program did what it did before: one ran with no error message, and the other gave a MemorySanitizer warning, though its warning began with a different 4-digit number:

```

==6230==WARNING: MemorySanitizer: use-of-uninitialized-value

```
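
The number between the `==` markers in sanitizer output is the process ID, so it differs from run to run. A minimal sketch of normalizing that prefix before comparing warnings from two runs (the function name `strip_pid` is hypothetical):

```shell
# Replace the leading ==<pid>== of each sanitizer line with a fixed tag
strip_pid() {
  sed -E 's/^==[0-9]+==/==PID==/'
}

first='==7598==WARNING: MemorySanitizer: use-of-uninitialized-value'
second='==6230==WARNING: MemorySanitizer: use-of-uninitialized-value'

# The two warnings quoted in this thread compare equal once the PID is masked
if [ "$(printf '%s\n' "$first" | strip_pid)" = "$(printf '%s\n' "$second" | strip_pid)" ]; then
  echo "same warning, different process ID"
fi
```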

Ron_Green · ‎06-03-2024

This has proven to be a most unusual case and a fascinating investigation! It's one of those odd, rare issues where a specific combination of language features and specific compiler options interplay. In this case, it takes the compiler into a corner of the Intel Fortran Runtime Library that is rarely used.

First, here is the minimal case of a reproducer. What is so odd, and something you noticed, is that if you comment out, recode, or modify just about anything in this example, you will NOT trigger the uninitialized data use. First the code, then the options needed to trigger it:

implicit none
real(8) :: h2 = huge(1.0_8)
real(8) :: ni2(2)

real(16) :: h3 = huge(1.0_16)
real(16) :: ni3(2)

character(16) :: cni = 'NAN INF NAN INF '

!...error needs this print statement, removing removes the error
print "(A,3(1X,I0))",'Real kinds printed before nan_inf called:',8,16
call nan_inf

contains

subroutine nan_inf
logical :: nanOK(3,2)
logical ::  nanOK_quad

! no error if this read is commented out
read(cni,*)  ni2, ni3
! read(cni,*) ni3 !...for example, just reading ni3 works.  weird

! error needs both array constructors
nanOK(:,1) =  [ .TRUE., .TRUE., .TRUE. ]
nanOK(:,2) = [ .TRUE.,.TRUE.,.TRUE.]

nanOK_quad = (ni3(2) > h3)
!nanOK_quad = .TRUE. !error needs the > comparison above

write(*,"(A,L2)")  'Inf>huge? ',nanOK_quad
end subroutine nan_inf
end program

Interestingly, if you comment out the writes, no error. If you change the ni3 from an array to a scalar, no error. If you read into JUST ni3, no error. If you change the logical array constructors to simple assignments, no error. It is very very odd.

Now, for the options needed along with this weirdo reproducer:

ifx -O0 -g -check uninit -o uninit trigger1.f90  -assume ieee_compares -assume recursion

You may note that you used '-standard-semantics', which is needed for the error. Without it, no error. Standard semantics sets a whole list of -assume keywords (19 in the list below). Peeling these back, it was not just one keyword! It requires EXACTLY the two assumes above, ieee_compares and recursion. ieee_compares makes total sense: the quad compare has to be done in software, as there is no processor support for quad. The IEEE aspect of this forces the compiler to use

for_ieee_signaling_gt_k16

which you see in the stack trace for the uninit. Otherwise it uses a non-signaling gt_k16, which has no issue. So far so good. But in addition to this option, you need RECURSION enabled. This changes allocation strategies, which seem to be at the root of the problem. Allocations change address references to the variables, amongst other things.

So IEEE_COMPARES alone is not a problem. But add RECURSION to it and voilà! That is along with all the other language details noted, which affect addressing and stack variable allocations.

You can test this yourself: change -assume ieee_compares to -assume noieee_compares and there is no error. This changes the FRTL call from for_ieee_signaling_gt to another gt routine. Similarly, you can set -assume norecursion: although we still call for_ieee_signaling_gt, the addressing has changed and the code takes a different path.
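
The four combinations of the two keywords can be tried in one pass. A minimal sketch, assuming the reproducer is saved as trigger1.f90 as in the command line above. The function name `build_cmd` and the output names `uninit_*` are hypothetical, and the script only prints the commands unless `ifx` is on the PATH and the source file is present:

```shell
# Compose the ifx command for one combination of the two -assume keywords
# $1 = ieee_compares | noieee_compares, $2 = recursion | norecursion
build_cmd() {
  echo "ifx -O0 -g -check uninit -o uninit_${1}_${2} trigger1.f90 -assume ${1} -assume ${2}"
}

for cmp in ieee_compares noieee_compares; do
  for rec in recursion norecursion; do
    cmd=$(build_cmd "$cmp" "$rec")
    echo "$cmd"
    # Build and run only when the compiler and the source file are available
    if command -v ifx >/dev/null 2>&1 && [ -f trigger1.f90 ]; then
      $cmd && "./uninit_${cmp}_${rec}"
    fi
  done
done
```

Going by the description in this post, only the build with both ieee_compares and recursion should report the MemorySanitizer warning.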

I have written a bug report on this. In the meantime you can simply not use -standard-semantics. Here is a list of all the sub-options it sets. I tested each in isolation: no error. Or you can use all of these but set ieee_compares and recursion to 'no'.

-assume byterecl \
-assume failed_images \
-assume fpe_summary \
-assume ieee_compares \
-assume minus0 \
-assume noold_e0g0_format \
-assume noold_inquire_recl \
-assume noold_ldout_format \
-assume noold_ldout_zero \
-assume noold_maxminloc \
-assume noold_unit_star \
-assume noold_xor \
-assume protect_parens \
-assume realloc_lhs \
-assume recursion \
-assume std_intent_in \
-assume std_minus0_rounding \
-assume std_mod_proc_name \
-assume std_value
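
The second workaround, keeping every sub-option but switching the two culprits to their 'no' forms, can be spelled out as follows. This is a minimal sketch assuming the original test file initial6.f90 and the original `-g -check all` options; the function name `semantics_flags` is hypothetical, and the script prints the command rather than running it:

```shell
# Sub-options from the list above, with ieee_compares and recursion negated
semantics_flags() {
  for kw in byterecl failed_images fpe_summary noieee_compares minus0 \
            noold_e0g0_format noold_inquire_recl noold_ldout_format \
            noold_ldout_zero noold_maxminloc noold_unit_star noold_xor \
            protect_parens realloc_lhs norecursion std_intent_in \
            std_minus0_rounding std_mod_proc_name std_value; do
    printf '%s %s ' -assume "$kw"
  done
}

# Print the resulting command line for inspection
echo "ifx -g -check all initial6.f90 $(semantics_flags)"
```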

Ron_Green · ‎06-03-2024

The bug ID is CMPLRLIBS-34978.

JFH · ‎06-10-2024

FWIW I hit the bug with a slightly shorter program: h2 and h3 were not needed, though `huge(1.0_16)` still was. Here is the shorter version:

```

implicit none
real(8) :: ni2(2)
real(16) :: ni3(2)
character(16) :: cni = 'NAN INF NAN INF '
!...error needs this print statement, removing removes the error
print "(A,3(1X,I0))",'Real kinds printed before nan_inf called:',8,16
call nan_inf

contains

subroutine nan_inf
logical :: nanOK(3,2)
logical :: nanOK_quad
! error needs both array constructors
nanOK(:,1) = [ .TRUE., .TRUE., .TRUE. ]
nanOK(:,2) = nanOK(:,1)
! no error if this read is commented out
read(cni,*) ni2, ni3
! read(cni,*) ni3 !...for example, just reading ni3 works. weird
nanOK_quad = (ni3(2) > huge(1.0_16))
! error needs the > comparison above
write(*,"(A,L2)") 'Inf>huge? ',nanOK_quad
end subroutine nan_inf
end program

```