Vectorization and segmentation fault

Matthieu_B_ · ‎01-06-2014

I have a problem with old software that used to work with ifort 11 and no longer works with recent versions.

This software is written in Fortran 77 and uses an old trick to manage its memory. The trick indexes arrays outside their declared bounds, so don't be shocked!

The idea is to allocate a big array with a C malloc at the beginning of the execution and to calculate the distance between this array and a reference array (called refarr in the following). To access data of the right type in the allocated array, an equivalence statement is used.

In this software, one loop always produces a segmentation fault when it is vectorized, and I don't understand why. Here is a simplified version of the source code:

[fortran]

      subroutine mysub(datpos,n)
c     datpos = data position in the allocated array
      implicit none
      integer*8 DIST ! distance of the reference array to the allocated array
      integer datpos,n
      integer refarr
      integer*8 adress_arr(1)
      integer anint,i,j,jj(n)
      COMMON/MYCOM/DIST,refarr(2)
c     with this statement, refarr and adress_arr are at the same address
      equivalence (refarr(1),adress_arr(1))
c     some code here
      anint = 0
      do i=4,n !or anything else here
        jj(i)=refarr(adress_arr(DIST+datpos)+1+i)
        if(refarr(adress_arr(DIST+jj(i))+3).EQ.4)THEN
          anint = 1
        ENDIF
      enddo
c     more code here
      return
      end

[/fortran]

With ifort 11, this loop was not vectorized and it worked. The obvious fix is the NOVECTOR directive, but that does not explain the problem. I found another workaround: if I declare a jj array and change the loop as follows, the software works (I also needed the VECTOR ALWAYS directive, because the optimizer reports that vectorizing the modified loop seems inefficient):

[fortran]

      do i=4,n !or anything else here
        jj(i)=refarr(adress_arr(DIST+datpos)+1+i)
        if(refarr(adress_arr(DIST+jj(i))+3).EQ.4)THEN
          anint = 1
        ENDIF
      enddo

[/fortran]

Does anyone have an idea what the problem is?

TimP · ‎01-06-2014

I think you've omitted important information.

What are actually the differences between the source which fails under vectorization and the one which works?

I suspect an actual running reproducer may be needed.

What are the compilation options? I have a case which fails sometimes under ifort 14.0.1 when vectorized for SSE4 but is OK when vectorized for SSE2 or AVX.

"recent versions" includes some buggy ones. Please use current updates of 13.1 or 14.0.

I don't know why you call this Fortran 77. integer*8 and implicit none were extensions, not covered in the standard. For integer*8 it's still the case, although it has become more widely available. anint was already a standard intrinsic, which of course is removed from visibility by your local declaration (if the compiler doesn't get confused). By combining such questions, you may get into untested territory.

Matthieu_B_ · ‎01-07-2014

Sorry, the buggy loop in my first post is wrong. It is in fact:

[fortran]
      do i=4,n !or anything else here
        j=refarr(adress_arr(DIST+datpos)+1+i)
        if(refarr(adress_arr(DIST+j)+3).EQ.4)THEN
          anint = 1
        ENDIF
      enddo

[/fortran]

For the anint issue, the original loop uses a "LOOA" name for this integer... I just tried to make it easier to read and did not pay attention to the fact that anint is an intrinsic.

The compilation options are "-O3 -unroll -xSSE2". I use ifort 14.0.1.106

I call it Fortran 77 just to make clear that, sadly, I can't avoid using those fantastic equivalence statements.

Steven_L_Intel1 · ‎01-07-2014

EQUIVALENCE is in Fortran 2008.

jimdempseyatthecove · ‎01-07-2014

adress_arr has a dimension of 1, refarr has a dimension of 2, and adress_arr(1) and refarr(1) are equivalenced in COMMON/MYCOM/

Therefore, your only assurance is that there is sufficient memory for two integer*8's starting at the (same) address as the (1)'th index of each array.

ergo:

n must be .LE. 5 (not "anything else here")
Technically DIST+datpos (when n is 4 or 5) must == 1, although ==2 also assures a valid address (due to refarr having dimension of 2)
The contents of adress_arr(1:2) must be -4:-3 when n==4, or, -4 when n==5 (validity for j= line)
j must be -4:-3 when n==4, or, -4 when n==5
DIST must be 3 when n==4, or, 5 when n==5
The if( statement requires DIST+j to be 1 or 2, therefore n cannot be 4 (n must be 5)
With the requirement of n==5, i has values 4,5
adress_arr(1:2) must be [-4,-4]

The if statement becomes if(refarr(-4+3).EQ.4)

which indexes refarr at the invalid index -1.

GIGO

IOW Lack of crash .OR. desired vectorization is not a proof of valid code.

I agree with TimP that a proper example of code may help to clear things up.

Jim Dempsey

jimdempseyatthecove · ‎01-07-2014

Note, the segmentation fault is likely a result of the COMMON/MYCOM/ segment being located at the start of a virtual memory page, so that a subsequent reference to refarr(-1) addresses non-existent memory and causes a page fault. Had /MYCOM/ not been located at the start of a page, the invalidly indexed refarr(-1) might not have caused a page fault. IOW GIGO would not crash, though you would continue to run with GO (garbage out).

Jim Dempsey

Matthieu_B_ · ‎01-08-2014

Reading the answers, I realize that I should indeed provide a proper code example... I'll come back with it in a day or two!

Matthieu_B_ · ‎01-27-2014

Sorry for the delay but I finally managed to reproduce the problem with a simple program.

As I said in the previous posts, my program uses some tricks to add a kind of dynamic allocation to Fortran 77. The idea is to perform a big C malloc and to calculate the distance from the allocated array to a reference array.

In this software, a loop works correctly when it is not vectorized but produces a segmentation fault when it is, and I cannot understand why.

The main program I used to reproduce the bug is simple. It makes 3 calls: the first to the C memory allocation (init_mem), the second to a Fortran subroutine that initializes the content of the allocated array (setmem), and the last to the subroutine that contains the buggy loop (lxcall):

[fortran]

      program test
      implicit none

c The 3 following statements are to store the position of the allocated data
c distance of the reference array to the allocated array
      integer*8 DIST
c reference array
      integer   refarr
c      with this statement, refarr, adress_arr are at the same adress
      COMMON/MYCOM/DIST,refarr(2)
c number of integers allocated
      integer*8 size
c     Some integers used to reproduce the bug
      integer tvcal
c     Initialise the dynamic memory
      size=10000000
      call init_mem(size,refarr,DIST)
c     To be able to use more than 3 GB of memory, addresses are INTEGER*8 based
      DIST = DIST / 2
      tvcal = 1
      call setmem(tvcal)
      call lxcall(tvcal)
      end

[/fortran]

The C init_mem subroutine is here:

[cpp]
#include <stdio.h>
#include <stdlib.h>

/* Initialisation of the memory. Allocation of a size*sizeof(int) array
and calculation of the distance (in number of integers) from *ref to
the allocated array */
void init_mem_(long *size, int *ref, long *dist){
    int *allocated_array;
    allocated_array = (int*) malloc(*size * sizeof(int));
    /* Calculation of the distance between allocated array and ref */
    *dist = allocated_array - ref;
    printf("Calculated distance %ld \n", *dist);
}

[/cpp]

The 2 other subroutines trigger the spam filter so... I attach them to this post.

The setmem subroutine is used to initialize the test. The lxcall subroutine contains the buggy loop. I duplicated this loop and used the NOVECTOR directive on the first copy to inhibit its vectorization.

The makefile is also attached.

When I run the test, I get the following output:

"Calculated distance 11743860906594
I am here
I can come here
forrtl: severe (174): SIGSEGV, segmentation fault occurred"

This shows that the code works when the loop is not vectorized and fails when it is vectorized.

Martyn_C_Intel · ‎01-28-2014

I presume this is intended for IA-32 not Intel64, since your code doesn't look safe for 64 bit pointers? But it seemed to work for me on IA-32 with the 14.0.1.106 compiler, without obvious errors.

$ icc -c -O3 -unroll -xsse2 init_mem.c

$ ifort -O3 -unroll -xsse2 -X -static -vec-report2 main.f init_mem.o lxcall.f setmem.f
lxcall.f(25): (col. 7) remark: loop was not vectorized: #pragma novector used
lxcall.f(35): (col. 7) remark: LOOP WAS VECTORIZED
$ ./a.out
Calculated distance -347899472 -1256005624 135592264 (I added allocated_array and ref to the printf)
3947067824
I am here
I can come here
But here not
$

As I'm sure you're aware, there are much easier ways to do dynamic memory allocation in modern Fortran.

jimdempseyatthecove · ‎01-28-2014

If you are compiling on Intel64 as 64-bit application, then your C helper might require changing "long*" to "intptr_t*". This will assure that the sizeof the argument pointed to is the size of a pointer on the bitness of the compiled code.

Many C compilers use 32-bit long. Same issue with the "int*", this should be "intptr_t*".
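
A minimal sketch of that suggestion, assuming a C99 compiler that provides intptr_t in <stdint.h>. The helper name distance_in_ints is hypothetical and is not part of the original init_mem routine; it only shows the offset being computed at pointer width, independent of the size of long:

```c
#include <stdint.h>

/* Hypothetical helper (not from the original code): distance from ref to
   target, counted in int-sized elements. Going through intptr_t keeps the
   arithmetic at pointer width, whatever the size of long is.
   Assumes a flat address space and that the byte distance fits in intptr_t. */
int64_t distance_in_ints(const int *target, const int *ref)
{
    intptr_t diff = (intptr_t)target - (intptr_t)ref;  /* distance in bytes */
    return (int64_t)(diff / (intptr_t)sizeof(int));    /* distance in ints  */
}
```

On the Fortran side, the variable that receives this value would then need to be an integer*8.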

Jim Dempsey

Matthieu_B_ · ‎01-29-2014

I do realize that there are much better ways to allocate memory dynamically, but... I have something like 1 000 000 lines of legacy code relying on this memory trick, and people won't let me rewrite them!

Your posts gave me some ideas. I still do not have the solution but I think I'm closer to it !

I decided to get rid of the C part and to investigate how the behaviour depends on the memory address. My main program becomes:

[fortran]

      program test
      implicit none

c The 3 following statements are to store the position of the allocated data
c
c distance of the reference array to the allocated array
      integer*8 DIST , DIST2 , DIST3
      integer*8 size
c reference array
      integer   refarr

c      with this statement, refarr, adress_arr are at the same adress
      COMMON/MYCOM/DIST,refarr(2)
c
c     Arrays to use
c
c     4 300 000 000 > 2^32 and 4 200 000 000 < 2^32
      integer   alldata2(4 300 000 000)
      integer   alldata3( 100 000 000)

c number of integers allocated
c
c     Some integers used to reproduce the bug
      integer tvcal


c     Calculation of the data distances
      DIST2 = LOC(alldata2) - LOC(refarr)
      DIST3 = LOC(alldata3) - LOC(refarr)
c
c     To be able to use more than 3 GB of memory, addresses are INTEGER*8 based
      DIST2 = DIST2 / 8
      DIST3 = DIST3 / 8
      write(*,*)'Distance',DIST2,DIST3
c
c     Chose the array to use changing the DIST value
      DIST = DIST3
c
      tvcal = 1
      call setmem(tvcal)
      call lxcall(tvcal)

      end

[/fortran]

To make this work I have to use the -mcmodel=large option, but apart from that everything else is the same.

By changing the size of alldata2, I can change the position of alldata3 in memory. When the size of alldata2 is over 2^32, the code fails; when it is below 2^32, it works. That leads me to conclude that the vectorized code must assume that the indexes used in the loop of the lxcall subroutine are integer*4.

To check this, I should probably take a look at the assembly code (and hope to understand it!), but I have not found the options that generate it.

Matthieu_B_ · ‎01-29-2014

I found the options to generate the assembly code and I think I have found the buggy part. The whole assembly code is attached.

The problem does indeed come from the fact that the array indexes are calculated as 32-bit integers. In the attached assembly code I find:

movslq %r9d, %rbx #line 233

movq mycom_(,%rbx,8), %xmm4 # line 251

The problem is that, given the position of the data in memory, the %rbx value should exceed the signed 32-bit range (it should be larger than 2^31 - 1), but it cannot, because it was calculated in %r9d, which is a 32-bit register, and then sign-extended into %rbx by movslq.

I added a print of rbx content before the segfault and saw that the value was "-2144967028" which corresponds to the first (DIST+tvcal) - 2^32
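
The arithmetic can be checked with a minimal C sketch. It is only an illustration of the truncation, not the code generated by the compiler, and the function name through_32bit_register is hypothetical. It keeps the low 32 bits of a 64-bit index and sign-extends them again, which is the effect of computing a value in a 32-bit register and then applying movslq:

```c
#include <stdint.h>

/* Hypothetical illustration: keep only the low 32 bits of a 64-bit index,
   then sign-extend back to 64 bits (the effect of movslq on a value that
   was computed in a 32-bit register).
   Assumes a two's-complement target: converting an out-of-range value to
   int32_t is implementation-defined in C, and wraps on common compilers
   such as gcc and clang. */
int64_t through_32bit_register(int64_t index)
{
    uint32_t low = (uint32_t)index;   /* low 32 bits (well defined)      */
    return (int64_t)(int32_t)low;     /* sign extension, as movslq does  */
}
```

Under those assumptions, an index of 2150000268 (the printed value plus 2^32) comes back as -2144967028, while any index below 2^31 comes back unchanged.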

And now I'm wondering: is this a compiler bug, or should I use some additional ifort options to take into account that the index I use can exceed the 32-bit range?

Martyn_C_Intel · ‎01-29-2014

Thanks for the new test case and analysis.

This looks to me like a bug. I have been able to construct a small test case that reproduces a similar problem, without using any fancy addressing tricks or large, negative offsets or indices. The problem seems to be that when the index calculation involves a mixture of integer*4 and integer*8, the compiler doesn't realize that it needs to perform the calculation in 64 bits. If I declare all the integers in the index calculation to be integer*8, then both my small test case and your example seem to work. (Since 2**32 x 8 bytes = 32 GB, you need a system with substantially more memory than this).

So try integer*8 lcal,looa,ical,pv,next in lxcall

I was able to get away without declaring tvcal as integer*8, but you'd probably want to do that too, or copy it into an integer*8 local, to be safe. I'll submit this to the compiler developers, and if they agree that it's a bug, we'll get it fixed. But I hope that in the meantime, you can use the above workaround. It's probably not a bad general precaution to use integer*8 for any integer that might be used as part of an address calculation for a program that needs 64 bit addressing.

There's of course no reason to force vectorization of the loop in your example. But I suppose the real application has loops with this construct that also have plenty of additional work that makes vectorization worthwhile.

FWIW, long ago in a previous profession, I used to work with large applications that used exactly this style of memory allocation and management. The good side is that all data was always in scope. The bad side is that bounds checking is close to impossible, and it's very hard to debug when one piece of data overwrites another, especially since different data types can be all mixed together. I don't miss it.

jimdempseyatthecove · ‎01-30-2014

Martyn,

Can you undo your integer*8 edit (revert to buggy code) and then add "_8" to the array declarations

integer*8 adress_arr(1_8)
...
COMMON/MYCOM/DIST,refarr(2_8)

Does the index calculation misbehave?

FWIW

Assume a program were written the "modern way" and used an allocatable array.
If the index error were induced by mixing integer*8 and integer*4 in an expression, I would expect this error to appear here too. Example:

Array(BigIndex+1)

Where BigIndex is integer(8), and the 1 is by default integer(4).

Jim Dempsey

Martyn_C_Intel · ‎01-30-2014

Hi Jim,

I already tried adding _8 to all integer constants, and it didn't help. This isn't a general problem of combining 32 bit and 64 bit integers, or it would have been seen long ago. I also don't think there's a problem in adding literal constants to an integer*8 variable, even though, as you imply, literals default to integer*4 in Fortran. The issue also seems specific to the vectorizer. I think I would expect allocatable arrays to behave in the same way as static arrays, though I haven't tested, since I don't think you can equivalence allocatable arrays. The context isn't quite the same, though, since you don't need -mcmodel=medium if the only large arrays are dynamically allocated.

Feel free to test any combination that you suspect. I'm attaching my smaller test code, in case that is useful. It does, though, require >>32GB (=2**32 x 8 bytes) in order to execute.

Program test_loop64
integer*4, parameter :: N=1000
integer*8, parameter :: two32 = 2**32
integer*4 i, pv, looa    ! works if integer*8
integer*8 DIST
integer*4 refarr (two32+N)
integer*8 adress_arr(two32+N)

COMMON/MYCOM/DIST, address_arr
equivalence (refarr(1),adress_arr(1))

refarr    (      1:      N) = (/(i,i=1,N)/)
refarr    (two32+1:two32+N) = (/(i,i=1,N)/)
adress_arr(two32+1:two32+N) = (/(i,i=1,N)/)
DIST = two32
looa = 0

!dir$ vector always
do i = 1, N/2
    pv = refarr( adress_arr(DIST + i))
    if(refarr( adress_arr(DIST+pv) + DIST + 3 ).eq.4)looa = 1
enddo
write(*,*) pv, looa

end program test_loop64

$ ifort -O2 -mcmodel medium -traceback -vec-report2 test_loop64.f90; ./a.out
test_loop64.f90(12): (col. 3) remark: LOOP WAS VECTORIZED
test_loop64.f90(13): (col. 3) remark: LOOP WAS VECTORIZED
test_loop64.f90(14): (col. 3) remark: LOOP WAS VECTORIZED
test_loop64.f90(14): (col. 3) remark: loop was not vectorized: vectorization possible but seems inefficient
test_loop64.f90(19): (col. 3) remark: LOOP WAS VECTORIZED
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
libintlc.so.5      00002B9CCBA5A229 Unknown               Unknown Unknown
libintlc.so.5      00002B9CCBA58BA0 Unknown               Unknown Unknown
libifcore.so.5     00002B9CCA6ED33F Unknown               Unknown Unknown
libifcore.so.5     00002B9CCA654D7F Unknown               Unknown Unknown
libifcore.so.5     00002B9CCA665F83 Unknown               Unknown Unknown
libpthread.so.0    0000003F75A0F500 Unknown               Unknown Unknown
a.out              0000000000400C2C MAIN__                     21 test_loop64.f90
a.out              0000000000400846 Unknown               Unknown Unknown
libc.so.6          0000003F7521ECDD Unknown               Unknown Unknown
a.out              0000000000400739 Unknown               Unknown Unknown

Matthieu_B_ · ‎05-12-2014

Sorry to ask only now but finally has this been registered as a bug and if yes, could I get the number just to check when it will be fixed.

Thanks

jimdempseyatthecove · ‎05-12-2014

Now that I see your example,

integer*8, parameter :: two32 = 2**32

May be an issue as the result may be 0 as 2_4**32 exceeds the capacity of integer*4

Then refarr(two32 + N) becomes refarr(N), same with address_arr
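
A minimal C sketch of that concern, assuming the 32-bit evaluation wraps around. Integer overflow in Fortran is processor-dependent, so 0 is one possible outcome rather than a guaranteed one. The function names are hypothetical, and unsigned 32-bit arithmetic is used because its wraparound is well defined in C:

```c
#include <stdint.h>

/* 2**n evaluated in 32 bits: unsigned arithmetic wraps modulo 2^32. */
uint32_t pow2_32bit(unsigned n)
{
    uint32_t r = 1;
    for (unsigned i = 0; i < n; ++i)
        r *= 2u;
    return r;
}

/* 2**n evaluated in 64 bits: exact for n up to 62. */
int64_t pow2_64bit(unsigned n)
{
    int64_t r = 1;
    for (unsigned i = 0; i < n; ++i)
        r *= 2;
    return r;
}
```

Here pow2_32bit(32) gives 0 and pow2_64bit(32) gives 4294967296. In the Fortran source, giving the base a 64-bit kind, for example 2_8**32 with ifort's kind numbering, would force the 64-bit evaluation.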

Jim Dempsey

Martyn_C_Intel · ‎05-12-2014

Matthieu B. wrote:

Sorry to ask only now but finally has this been registered as a bug and if yes, could I get the number just to check when it will be fixed.

Yes, this was registered as a bug in January, internal ID dpd200252841. It has been worked on and the fix is targeted for the next major version of the compiler. Thanks for asking.

Martyn_C_Intel · ‎09-04-2014

Version 15.0 of the Intel Compiler, contained in Intel Parallel Studio XE 2015, has just been released and contains a fix for this issue.

Matthieu_B_ · ‎10-30-2014

I am surprised: I can reproduce the exact same issue with 15.0, at least with the version the IT guy installed. If I run ifort -V I get:

Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.0.090 Build 20140723

Martyn_C_Intel · ‎10-30-2014

You are right. The issue was reported internally as fixed in 15.0, and I thought I'd tested an earlier 15.0 compiler, but I can definitely still reproduce the problem now, using the compiler you quote. It is also not fixed in the compiler update which will be coming soon. I've already started following up. I'll post when there's something to report.

I'm sorry about this, but thanks for reporting.