Tracking a Fortran bug at the assembly level

Francois_F_ · ‎06-26-2019

Hi,

I've been chasing a bug for weeks that is very difficult to track down. Finally, I decided to go down to the assembly level. I am not allowed to share or post the code, but I am quite puzzled by the assembly. To simplify, the subroutine looks like this:

subroutine anonymized(this, k)
  implicit none
  class(my_type), intent(inout) :: this
  integer, intent(in) :: k

  real(8) :: aux
  integer :: i1, i2
  
  aux = this%something ...
  do i1 = 1, this%n
    do i2 = 1, this%m
      if (this%value(i1) < 1.0e-10) then
        ...

and the code crashes at the first comparison of this%value(i1). The crash is only observable with some flags, such as -O2 -heap-arrays 0. If I try to print the value of this%value(i1) just before it is used, the code runs fine to completion and the bug disappears. Sometimes, when I change the code that comes *after* this part, the bug disappears as well. It just drives me crazy.

So I had a look at the assembly code. The beginning of this code is given here.

Dump of assembler code for function __anonymized:
=> 0x0000000000522970 <+0>:	push   %rbp
   0x0000000000522971 <+1>:	mov    %rsp,%rbp
   0x0000000000522974 <+4>:	push   %r12
   0x0000000000522976 <+6>:	push   %r13
   0x0000000000522978 <+8>:	push   %r14
   0x000000000052297a <+10>:	push   %r15
   0x000000000052297c <+12>:	push   %rbx
   0x000000000052297d <+13>:	sub    $0x148,%rsp
   0x0000000000522984 <+20>:	mov    (%rdi),%rbx
   0x0000000000522987 <+23>:	mov    %rsi,-0x80(%rbp)
   0x000000000052298b <+27>:	mov    %rdi,-0x78(%rbp)
   0x000000000052298f <+31>:	mov    0x79c58(%rbx),%rdx
   0x0000000000522996 <+38>:	neg    %rdx
   0x0000000000522999 <+41>:	movslq 0x7a6a8(%rbx),%rcx
   0x00000000005229a0 <+48>:	add    %rcx,%rdx
   0x00000000005229a3 <+51>:	mov    0x79c18(%rbx),%rax
   0x00000000005229aa <+58>:	movsd  0x7a688(%rbx),%xmm0
   0x00000000005229b2 <+66>:	mov    0x79ba0(%rbx),%r8d
   0x00000000005229b9 <+73>:	mov    %rcx,-0x88(%rbp)
   0x00000000005229c0 <+80>:	mulsd  (%rax,%rdx,8),%xmm0
   0x00000000005229c5 <+85>:	mov    %r8d,-0x48(%rbp)
   0x00000000005229c9 <+89>:	mov    0x79bc0(%rbx),%ecx
   0x00000000005229cf <+95>:	test   %r8d,%r8d
   0x00000000005229d2 <+98>:	jle    0x527bcf <__anonymized+21087>
   0x00000000005229d8 <+104>:	mov    %ecx,%r13d
   0x00000000005229db <+107>:	xor    %r12d,%r12d
   0x00000000005229de <+110>:	and    $0xfffffff8,%r13d
   0x00000000005229e2 <+114>:	pxor   %xmm2,%xmm2
   0x00000000005229e6 <+118>:	movslq -0x48(%rbp),%rax
   0x00000000005229ea <+122>:	pxor   %xmm3,%xmm3
   0x00000000005229ee <+126>:	movslq %r13d,%r10
   0x00000000005229f1 <+129>:	movslq %ecx,%rdx
   0x00000000005229f4 <+132>:	movsd  0x13bb9c(%rip),%xmm1        # 0x65e598
   0x00000000005229fc <+140>:	mov    %rax,-0x40(%rbp)
   0x0000000000522a00 <+144>:	mov    %r10,-0x160(%rbp)
   0x0000000000522a07 <+151>:	mov    %r13d,-0x168(%rbp)
   0x0000000000522a0e <+158>:	mov    %ecx,-0x30(%rbp)
   0x0000000000522a11 <+161>:	cmpl   $0x0,-0x30(%rbp)
   0x0000000000522a15 <+165>:	jle    0x522d10 <__anonymized+928>
   0x0000000000522a1b <+171>:	neg    %r11
   0x0000000000522a1e <+174>:	add    %r12,%r11
   0x0000000000522a21 <+177>:	mov    0x79fd8(%rbx),%rdi
   0x0000000000522a28 <+184>:	mov    0x7a018(%rbx),%r8
   0x0000000000522a2f <+191>:	mov    0x7a260(%rbx),%rsi
   0x0000000000522a36 <+198>:	comisd 0x8(%rdi,%r11,8),%xmm1

The code crashes on the comisd instruction. It seems that neither jle is taken (I am a beginner at reading assembly). On the comisd line, 0x8(%rdi,%r11,8) is obviously trying to access the array at index r11. I have checked %rdi, which contains the right address. What is surprising is that r11 is set to 140737488332700 at the beginning of the function, and the first instruction that touches it is the neg at 0x0000000000522a1b. So it looks to me as if the register %r11 is never initialized.
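To make the arithmetic behind that operand concrete, here is a minimal sketch of how the offset in 0x8(%rdi,%r11,8) works out. It assumes %r12 is still zero when the comisd is reached for the first time (it is cleared by the xor at +107) and uses the %r11 value I observed on entry. The program and variable names are made up for illustration, and %rdi is left out because its actual value is not shown here.

```fortran
program effective_address
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none

  ! Value observed in %r11 on entry to the function.
  integer(int64), parameter :: r11_entry = 140737488332700_int64
  ! Assumption: first pass through the loop, %r12 zeroed at +107.
  integer(int64), parameter :: r12 = 0_int64
  integer(int64) :: r11, offset

  r11 = -r11_entry                   ! neg %r11
  r11 = r11 + r12                    ! add %r12,%r11
  offset = 8_int64 + 8_int64 * r11   ! displacement + index * scale

  print '(a, z16.16)', 'r11 on entry (hex): ', r11_entry
  print '(a, i0)', 'byte offset from %rdi: ', offset
end program effective_address
```

The entry value is 0x00007FFFFFFFA79C in hex, which looks more like a leftover stack address than an array index, and the resulting offset is roughly 2^50 bytes below %rdi, so a segfault on that access is not surprising.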

What do you think of that?

Best regards

Juergen_R_R · ‎06-26-2019

Did you check your code with all of the compiler's check options turned on, i.e. with -check=all?

If you open a support ticket with the Intel Support you can send them code confidentially.

Francois_F_ · ‎06-26-2019

Yes, I have already tried -check all, and it does not report anything. Unfortunately, I am not allowed to share any code.

jimdempseyatthecove · ‎06-26-2019

>>The crash is only observable with some flags such as -O2...
>>But what is surprising, is that r11 is ~~set to~~ (edit: has value of) 140737488332700 at the beginning of the function...
Correct. r11 is not initialized (set) within the function.

This looks like a bug in the compiler. What version of the compiler are you using?
I suspect r11 should have been ecx (rcx), the count remaining, to be subtracted (using neg, add) from total count in r12 to produce the index.
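For comparison, the same neg/add idiom appears near the top of the listing (+31 to +80): a value is loaded from 0x79c58(%rbx), negated, added to an index, and the result is scaled by 8 in (%rax,%rdx,8). Below is a minimal sketch of that arithmetic, assuming the loaded value is the array's lower bound. That is my reading of the listing, not something the disassembly confirms, and the names and values are hypothetical.

```fortran
program index_pattern
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none

  ! Hypothetical values, for illustration only.
  integer(int64), parameter :: lower_bound = 1_int64  ! assumed to come from the array descriptor
  integer(int64), parameter :: elem_size = 8_int64    ! size of a real(8) element
  integer(int64) :: idx, reg, byte_offset

  do idx = 1_int64, 3_int64
    reg = -lower_bound             ! neg
    reg = reg + idx                ! add
    byte_offset = elem_size * reg  ! the ",8" scale in the addressing mode
    print '(a, i0, a, i0)', 'index ', idx, ' -> byte offset ', byte_offset
  end do
end program index_pattern
```

With a properly loaded register the offsets stay small (0, 8, 16 for indices 1 to 3 here). In the crashing sequence the neg is applied to whatever r11 happened to hold on entry.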

Also, the assembly code does not look like O2 optimized?!? (or poorly optimized)

Jim Dempsey

Juergen_R_R · ‎06-26-2019

Did you check with a different compiler like gfortran? I think with Intel support you can have a non-disclosure agreement.

Francois_F_ · ‎06-26-2019

Hi, and thanks Jim for your hint.

I believe that I have found a bug in the compiler. Here is the code to reproduce it. First, main.f90:

program main
    use my_module
    implicit none

    type(my_type) :: obj
    integer :: i0, i1

    obj%n = 1
    allocate(obj%x1(obj%n))
    allocate(obj%x2(obj%n, obj%n))
    do i0 = 1, obj%n
        obj%x1(i0) = 1.0
        do i1 = 1, obj%n
            obj%x2(i0, i1) = 1.0
        end do
    end do

    call obj%f()
end program main

Now, module.f90:

module my_module
  implicit none
  type, public :: my_type
    integer, public :: n
    real(8), allocatable, public :: x1(:)
    real(8), allocatable :: x2(:,:)
    contains
    procedure :: f
  end type my_type
  contains
  subroutine f(this)
    implicit none
    class(my_type), intent(inout) :: this

    integer :: k0, k1

    do k0 = 1, this%n
      do k1 = 1, this%n
        if (this%x1(k0) < 1.0e-10) then
          this%x2(k0, k1) = 0.0
        else
          this%x2(k0, k1) = this%x2(k0, k1) / this%x1(k0)
        endif
      enddo
    enddo
    return
  end subroutine f

end module my_module

When the code is compiled with ifort 19.0.4.243 on Linux, with

ifort -g -O2 module.f90 main.f90 -o main

the program segfaults when running.

Can anyone reproduce the bug on their machine?

Steve_Lionel · ‎06-26-2019

This sure looks like a code generator/optimizer bug to me. I can reproduce it on Windows with just default (O2) optimization.

As best as I can tell, when optimized the compiler is getting confused as to which register to use for the passed-object THIS argument. I encourage you to report this to Intel via http://software.intel.com/sites/support/ (Click Priority Support).

Andriy · ‎07-05-2019

Looking at the assembly code, it feels like advanced vectorization instructions may be involved in the issue.

Try disabling vectorization with -no-vec, or specifying a different architecture with -arch or a similar compiler option.

The compiler is becoming more and more vectorization-aware, but at the same time it is losing reliability in very simple circumstances.