Segmentation fault only when vectorization is enabled - Page 2

AThar2 · ‎04-07-2019

Part of my code has been vectorized using !$omp simd. Whenever I have vectorization enabled I get an error saying "array index out of bounds". The line it points to seems quite random: when I comment out that line, the error persists and refers to another line.

In my loop I have a conditional which, if true, calls function A and, if false, calls function B. (Both functions also contain a function call.) All of these functions have been inlined and declared simd. The point I want to make is that if I comment out one of these function calls (the branch which I KNOW the code won't execute at run time because of my flag settings), the segmentation fault is delayed. If I comment out the other function (B, the one that is being called), then the following two scenarios happen:

1) If I also comment out function A EVEN THOUGH it is not being called , my program runs!

2) If I DON'T comment out function A (EVEN THOUGH IT IS NOT BEING CALLED) my program complains about an "array index out of bounds"

I did have -traceback enabled. But that is completely useless.

I even wrote a check that skips the loop iteration (CYCLE) if the index gets larger than the array size. However, I am 100% sure that my array index does not go out of bounds, unless the vectorization is doing something I am not aware of.

I don't know if this is useful, but when running with Valgrind I get numerous, nearly identical error messages (only in the case where the program actually fails).

I first get this error :

==2883== Invalid read of size 8
==2883==    at 0x44C200: lpt_particles_mp_displu_ (in lpt.x)
==2883==    by 0x43AC6E: lpt_marching_mp_unsteady_spray_steady_flow_ (in lpt.x)
==2883==    by 0x41ED25: MAIN__ (in lpt.x)
==2883==    by 0x403761: main (in lpt.x)
==2883==  Address 0xc001077a90872154 is not stack'd, malloc'd or (recently) free'd

Then I get the following errors (which are probably due to the fact that memory had not been deallocated when the program crashed; correct me if I am wrong):

==31838== 262,144 bytes in 1 blocks are still reachable in loss record 137 of 137
==31838==    at 0x4C2C1E0: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31838==    by 0xDDF8C9A: ???
==31838==    by 0xDDF707B: ???
==31838==    by 0xDDEF425: ???
==31838==    by 0xDDEF797: ???
==31838==    by 0x5421EDB: fi_endpoint (fi_endpoint.h:156)
==31838==    by 0x5421EDB: ??? (ofi_init.h:1733)
==31838==    by 0x5429F08: MPIDI_NM_mpi_init_hook (ofi_init.h:1117)
==31838==    by 0x5429F08: MPID_Init (ch4_init.h:855)
==31838==    by 0x5429F08: MPIR_Init_thread (initthread.c:647)
==31838==    by 0x541DD1B: PMPI_Init (init.c:284)
==31838==    by 0xC611CFA: MPI_INIT (initf.c:275)
==31838==    by 0x4481EF: lpt_parallel_mp_parallel_init_ (in lpt.x)
==31838==    by 0x41ED0D: MAIN__ (in lpt.x)
==31838==    by 0x403761: main (in lpt.x)

Then I get several of these :

==31838== 28,517,032 bytes in 1 blocks are possibly lost in loss record 137 of 137
==31838==    at 0x4C2A0B0: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31838==    by 0x53465F: _mm_malloc (in lpt.x)
==31838==    by 0x4B9680: for_alloc_allocatable (in lpt.x)
==31838==    by 0x429CCF: lpt_geom_mp_tri_normals_ (in lpt.x)
==31838==    by 0x436FF6: lpt_init_mp_spray_init_ (in  lpt.x)
==31838==    by 0x466EFC: lpt_preprocessor_mp_preproc_ (in lpt.x)
==31838==    by 0x41ED17: MAIN__ (in lpt.x)
==31838==    by 0x403761: main (in pt.x)

I know that not showing the code makes this much more difficult, but it would be insane for me to post the entire code here, as it is very large. Trying to simplify the problem while still reproducing the bug has not been successful yet. It is very difficult to do so without knowing anything about where the error is.

A suggestion: could it be that I am exhausting my vector registers?

I have tried to compile with -xcore-AVX2 -align array32byte -qopt-zmm-usage=high and AVX512 -align array64byte -qopt-zmm-usage=high.

I would really appreciate it if somebody who has experienced a similar issue could indicate potential reasons for this error.

Please note again that I have run this in full debug mode, and fully optimised (-O3) but without vectorisation. In those cases neither the compiler nor Valgrind complained, and everything seemed fine.

AThar2 · ‎04-13-2019

What I tried to say in Quote #20 is that when I have the inline statement as shown in Quote #17, code line no. 07, I get a linking problem saying that functions A and B are undefined.

However, if I remove your line 07, the compiler manages to link my program. So my question is whether that is a typo on your side, or whether you believe it is correct and I am probably doing something wrong.

Just to say in other words,

I am copying your concept in quote 16 and when I do

! YourSIMDloop.inc
! simd loop
!$OMP SIMD 
do i = 1, N 
   (....) ! other calculations etc.
! *** proc_ptr is either routine_A or routine_B as substituted by FPP
!dir$ attributes forceinline :: proc_ptr   !----- THIS LINE GIVES ME LINKING PROBLEMS
   call proc_ptr(A,B,C)
end do
! end YourSIMDloop.inc

-----

! in your main code
...
if(apply_turb) then
! Compile with PreProcess file
! use FPP #define and #include
#define proc_ptr routine_A
#include "YourSIMDloop.inc"
#undef proc_ptr
else
#define proc_ptr routine_B
#include "YourSIMDloop.inc"
#undef proc_ptr
endif
...
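
The idea behind the #define/#include pattern above is to take the flag test out of the SIMD loop by generating one specialised loop per routine. A minimal sketch of the same idea in C, using a function-like macro instead of an include file, might look like this. The names routine_a, routine_b, DEFINE_SIMD_LOOP, loop_a, loop_b and displace are hypothetical stand-ins, not the code from this thread, and the routine bodies are placeholders.

```c
#include <stddef.h>

/* Hypothetical stand-ins for routine_A and routine_B from the thread. */
static inline double routine_a(double x) { return 2.0 * x; }
static inline double routine_b(double x) { return x + 1.0; }

/* Stamp out one specialised SIMD loop per routine, so that the loop body
   contains no run-time branch on the flag. */
#define DEFINE_SIMD_LOOP(name, proc)                          \
    static void name(const double *in, double *out, size_t n) \
    {                                                         \
        _Pragma("omp simd")                                   \
        for (size_t i = 0; i < n; ++i)                        \
            out[i] = proc(in[i]);                             \
    }

DEFINE_SIMD_LOOP(loop_a, routine_a)
DEFINE_SIMD_LOOP(loop_b, routine_b)

/* The flag is tested once, outside the vectorised loops. */
void displace(int apply_turb, const double *in, double *out, size_t n)
{
    if (apply_turb)
        loop_a(in, out, n);
    else
        loop_b(in, out, n);
}
```

Because the macro argument is substituted in ordinary code rather than inside a comment-style directive, this variant does not depend on whether the preprocessor expands macros in comment lines.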

jimdempseyatthecove · ‎04-13-2019

It may be that FPP did not perform the macro substitution in line 07 because it is a comment line. Therefore, to correct this:

add -macro to see if fpp expands macros within Fortran comments

or use:

!dir$ attributes forceinline :: routine_A
!dir$ attributes forceinline :: routine_B

or use

#if ("proc_ptr" == "routine_A")
!dir$ attributes forceinline :: routine_A
#elif  ("proc_ptr" == "routine_B")
!dir$ attributes forceinline :: routine_B
#else
#error "fix this"
#endif

Jim Dempsey

AThar2 · ‎04-13-2019

Thanks Jim,

I tried to debug the reason. So I started out by removing all the macros and explicitly made it use Routine_A

! YourSIMDloop.inc
! simd loop
!$OMP SIMD 
do i = 1, N 

!dir$ attributes forceinline :: ROUTINE_A   
   call ROUTINE_A(A,B,C)
end do
! end YourSIMDloop.inc

-----

The compiler throws an error saying

lpt_displ_loop.inc(29): error #7864: This symbol has multiply declared DEC$ ATTRIBUTES FORCEINLINE attribute.   [ROUTINE_A]
  !dir$ attributes forceinline :: ROUTINE_A
  --------------------------------^

Naturally, I went to check whether I had a FORCEINLINE where I define my ROUTINE_A, but I DON'T. So it could be that my problem is somewhere else and the error message is very confusing, if not wrong.

AThar2 · ‎04-13-2019

Jim, it makes sense that the macro may not be expanded in comments. But adopting your solution (the latter one) still gives the compiler error that I have multiple forceinline declarations of my routine, while I do not.

jimdempseyatthecove · ‎04-14-2019

See your new post for explanation.

Jim

AThar2 · ‎05-07-2019

Hello Jim,

With regard to this post: when I read through this website, I got a bit confused about what you said regarding not having any simd commands within a simd loop.

As you will see in this link, they have declared simd in a function which is being called within a simd loop.

I am a bit confused when I read this.

Also, I understand that if I INLINE my functions, having !DIR$ attributes vector should not be necessary, right? So, is it correctly understood that the !DIR$ attributes vector directive is necessary when a function is called within a simd loop and is not being inlined?

jimdempseyatthecove · ‎05-10-2019

Forest and trees.

The fact that simd appears on

#pragma omp simd

and

#pragma omp declare simd

does not make the two equivalent.

The declare format is used on the source code of the declared function to generate one or more "signature" functions out of the source code. Depending on the presence or absence of clauses, this may result in having a scalar version, plus a 2-wide version, plus a 4-wide version. These versions are then available for use (callable) from within a #pragma omp simd section of code (provided the declare simd function or prototype is visible).

Your original request was based on having a #pragma omp simd nested within a #pragma omp simd.
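
As a minimal sketch of that relationship, in the C spelling used above: the function below carries declare simd, and a single simd loop calls it. The names drag and apply_drag and the arithmetic are hypothetical placeholders. In Fortran the corresponding directive would be written !$omp declare simd(procedure-name) inside the procedure.

```c
#include <stddef.h>

/* Ask the compiler to emit vector variants of this function in addition
   to the scalar one. "drag" is a hypothetical name. */
#pragma omp declare simd
double drag(double u, double v)
{
    double rel = u - v;
    return rel * rel;
}

/* A single SIMD loop that calls the declared function. There is no
   simd loop nested inside another simd loop here. */
void apply_drag(const double *u, const double *v, double *f, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        f[i] = drag(u[i], v[i]);
}
```

Without OpenMP SIMD support enabled at compile time the pragmas are ignored and the code runs as ordinary scalar C, so the result is the same either way.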

Jim Dempsey

jimdempseyatthecove · ‎05-10-2019

>>So, is it correctly understood that the !DIR$ attributes vector directive is necessary when a function is called within a simd loop and is not being inlined?

Yes and no.

The compiler is capable of vectorizing loops that are not explicitly annotated with !$OMP SIMD, provided that a vector version of the called function/subroutine is available (separate procedures with signature differentiation, as done in the C++ example you referenced).

Jim Dempsey

AThar2 · ‎05-10-2019

@Jim, unfortunately I did not make myself clear in the beginning. I have never had simd(ized) loops within simd(ized) loops. I was referring to having one simd(ized) loop with a function call. That function contains a !$OMP DECLARE SIMD(FUNCTION_NAME). So I thought you were saying that this was not allowed.

Yes, I do get the point that a simd(ized) loop cannot contain another simd(ized) loop.

I only have one simd(ized) loop at a time. In the body of this loop I might have function calls. This is typically what I do (note that I have both declare simd and inline, in the hope that the compiler will choose one of them, as I am not sure which one is best and when each is appropriate):

!$omp simd
do i = 1, N 

  (....) 
  

  call proc1(....) 

enddo 



!DIR$ ATTRIBUTES INLINE :: proc1
subroutine proc1(....)
!$OMP DECLARE SIMD(proc1)

(..) ! vectorizable function. 
      ! it does not contain any simd loop!

end subroutine proc1

jimdempseyatthecove · ‎05-10-2019

The !$omp declare simd... directive is intended to generate one or more versions of the subroutine or function that are callable from within a vectorized loop. It is not intended to be inlined.

Functions and subroutines that are inlined are potentially vectorizable (as well as may inhibit vectorization) within the code being generated for the caller. Think of this more as copy and paste the source code in the context of the caller's code.

If (when) you intend to inline a subroutine, then do not annotate it with SIMD. The SIMD-ization will then be dependent upon the nature of the code making the inline call. If the caller is SIMD .AND. the call-ee does not have anything to inhibit SIMD-ization of the call-er then the compiler can proceed to SIMD-ize the call-er inclusive of the call-ee.
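
A minimal sketch of the inlined variant, again in C: the callee has no SIMD annotation at all, and only the calling loop is marked simd. The names advance and step and the update formula are hypothetical placeholders.

```c
#include <stddef.h>

/* No SIMD annotation here: the body is effectively pasted into the caller,
   so whether it vectorizes depends on the calling loop. */
static inline double advance(double x, double vel, double dt)
{
    return x + vel * dt;
}

/* The caller carries the only SIMD directive. */
void step(double *x, const double *vel, double dt, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        x[i] = advance(x[i], vel[i], dt);
}
```

Note that inline is only a request to the compiler. Whether the call is actually inlined, and whether the loop then vectorizes, is best checked in the compiler's optimization report.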

It is not clear to me whether there are conflicts (or not) if you mix several of

!$omp simd
and/or
!dir$ attributes inline
and/or
!$omp declare simd
and/or
!dir$ vector
and/or
... (any directives or such to force your intentions of simd and/or scalar generated code)

Pick one and stick with it.

Note, this does not preclude you from having:

subroutine proc1(...) ! scalar
subroutine proc1_inline(...) ! intended to be inlined (but may be called as scalar)
subroutine proc1_simd(...) ! ...
subroutine proc1_vector(...) ! inclusive of !dir$ vector clauses not available with simd

Jim Dempsey

AThar2 · ‎05-10-2019

Thanks for the tips, Jim. Is there any rule of thumb for whether to use !$omp declare simd or inlining/forceinline? Is it more that when subroutines become too lengthy you might want to use !$omp declare simd rather than inline, to avoid extremely large executables?

jimdempseyatthecove · ‎05-10-2019

There is no rule of thumb other than test both ways.

Keep in mind that inlining, at times, can be counter-productive.

In some cases, inlining can bloat the code loop such that it no longer fits within the L1 Instruction Cache.
In some other cases, inlining can increase or decrease register pressure (this usually depends on number of arguments).

Use VTune and perform tests.

Also, do not assume SIMD will always produce faster code than scalar. Sometimes it will not. The same applies to parallelization.

Jim Dempsey

AThar2 · ‎05-11-2019

Thanks for the help Jim!