Hi,
I get the following error message running ifort 11.1 with OpenMP:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libpthread.so.0 00007FDDBD065E2B Unknown Unknown Unknown
libiomp5.so 00007FDDBEB7FEFA Unknown Unknown Unknown
The problem disappears and the code runs perfectly when I comment out the !$OMP directives in the code segment below.
In addition, the segmentation fault disappears if I do:
[fortran]SPEC_peaks(:) = SPEC_peaks(:) !+ peak_ij(:)[/fortran]
[fortran]SPEC_func(:) = SPEC_func(:) + Transition_Strength(i,j) !* peak_ij(:)[/fortran]
Here is the code segment
[fortran]
allocate( peak_ij(npoints) )

forall( k=1:npoints ) SPEC%grid(k) = (k-1)*step

allocate( SPEC_peaks(npoints) , source = 0.d0 )
allocate( SPEC_func (npoints) , source = 0.d0 )

!$OMP parallel
!$OMP do reduction(+:SPEC_peaks,SPEC_func) private( osc_const , resonance , peak_ij )
do i = 1 , dim_bra
   do j = 1 , dim_ket

      resonance = QM%erg(trans_DP%bra_PTR(i)) - QM%erg(trans_DP%ket_PTR(j))

      Transition_Strength(i,j) = osc_const * resonance * Transition_Strength(i,j)

      peak_ij(:) = 0.d0
      where( dabs(SPEC%grid(:)-resonance) < step ) peak_ij = Transition_Strength(i,j)

      SPEC_peaks(:) = SPEC_peaks(:) + peak_ij(:)

      peak_ij(:) = 0.d0
      where( ((SPEC%grid(:)-resonance)**2/two_sigma2) < 25.d0 ) peak_ij = dexp( -(SPEC%grid(:)-resonance)**2 / two_sigma2 )

      SPEC_func(:) = SPEC_func(:) + Transition_Strength(i,j) * peak_ij(:)

   end do
end do
!$OMP end do
!$OMP end parallel

SPEC%peaks = SPEC_peaks
SPEC%func  = SPEC_func
[/fortran]
7 Replies
You might pay more attention to getting it working well in single threaded mode. Even in single threaded mode, it would be important to organize your inner loops for locality and vectorization.
It's difficult to see why you would want reduction operations.
You neglected to say whether you checked for stack overflow.
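To check for the stack-overflow possibility Steve raises, a sketch of the relevant controls (the 512M value is an arbitrary example, not a recommendation):

```shell
# The shell's stack limit applies only to the program's initial thread:
#   ulimit -s unlimited
# Each OpenMP worker thread gets its own stack, sized by OMP_STACKSIZE
# (the portable control) or KMP_STACKSIZE (the Intel-runtime spelling).
# The default worker stack is often too small for large private arrays.
export OMP_STACKSIZE=512M
export KMP_STACKSIZE=512M
echo "OpenMP worker stack size: $OMP_STACKSIZE"
```

If the segfault goes away with a larger worker stack, the private/firstprivate arrays were overflowing the per-thread stacks.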
I agree with Steve's comments.
Your code will vectorize better if you transpose your i and j loops. In Fortran the leftmost index addresses adjacent (vectorizable) data (when the stride is 1 element).
First, compile as-is without OpenMP and with full optimizations. Run it a few times to produce a baseline set of output data and run times for this section of your code.
Second, interchange your i and j loop order, verify that any other changes to your code required by this loop-order change are made, then compile without OpenMP and with full optimizations. Run it a few times to produce a revised set of output data and run times for this section of your code. Verify that the outputs are correct. Correct results need not be bit-for-bit equal, since the order of operations has changed; they should be within your margin of error.
Third, make SPEC_peaks and SPEC_func arrays local to each thread. Initialize SPEC%peaks and SPEC%func to 0.0, and after your !$OMP end do insert:
!$OMP CRITICAL(SPEC)
SPEC%peaks=SPEC%peaks + SPEC_peaks
SPEC%func = SPEC%func + SPEC_func
!$OMP END CRITICAL(SPEC)
!$OMP END PARALLEL
If npoints is quite large, then experiment with:
!$OMP CRITICAL(SPECpeaks)
SPEC%peaks=SPEC%peaks + SPEC_peaks
!$OMP END CRITICAL(SPECpeaks)
!$OMP CRITICAL(SPECfunc)
SPEC%func = SPEC%func + SPEC_func
!$OMP END CRITICAL(SPECfunc)
!$OMP END PARALLEL
The above changes will improve parallel performance but will diminish serial performance, so you might want to conditionalize the code.
Jim Dempsey
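To illustrate Jim's loop-order point with a schematic (A, n, and m are placeholders, not names from the post): Fortran stores arrays in column-major order, so the leftmost index should be the innermost loop for stride-1 access.

[fortran]
! Schematic only. A(1,j), A(2,j), ... are adjacent in memory,
! so making i the inner loop gives stride-1 (vectorizable) access.
do j = 1 , m
   do i = 1 , n
      A(i,j) = A(i,j) * 2.d0
   end do
end do
[/fortran]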
Use the -auto switch without -openmp to explicitly put all variables on the runtime stack. Note that variables with the SAVE (and possibly COMMON) attribute will explicitly not be put on the runtime stack. This may help in determining whether it is the stack that is being blown or whether other problems are causing the segfault.
Best!
-Udit
Setting -auto (which is included in -openmp) could expose bugs which are suppressed without that option but will surely hinder OpenMP. It won't come close to requiring as much stack as your example would require under -openmp.
Dear Jim,
thanks for reminding me to transpose the i and j loops. I know about it, but I had missed that one.
I have also generated a large amount of output data to verify the correctness of the parallelized output data.
I have made modifications to the code according to your suggestions (code follows below).
The variables SPEC%peaks and SPEC%func were initialized to 0.0 and npoints = 1500. I also tried ulimit -s unlimited, without success. I also tried using -auto instead of -openmp, but nothing changes.
As before, the revised code produces correct results without the !$OMP directives and an error with !$OMP:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
Stack trace terminated abnormally.
[fortran]
! . the optical spectrum : peaks and broadened lines ...

allocate( SPEC_peaks(npoints) , source = 0.d0 )
allocate( SPEC_func (npoints) , source = 0.d0 )

!$OMP parallel firstprivate( SPEC_peaks , SPEC_func )
!$OMP do private( osc_const , resonance , peak_ij )
do j = 1 , dim_ket
   do i = 1 , dim_bra

      resonance = QM%erg(trans_DP%bra_PTR(i)) - QM%erg(trans_DP%ket_PTR(j))

      Transition_Strength(i,j) = osc_const * resonance * Transition_Strength(i,j)

      peak_ij(:) = 0.d0
      where( dabs(SPEC%grid(:)-resonance) < step ) peak_ij(:) = Transition_Strength(i,j)

      SPEC_peaks(:) = SPEC_peaks(:) + peak_ij(:)

      peak_ij(:) = 0.d0
      where( ((SPEC%grid(:)-resonance)**2/two_sigma2) < 25.d0 ) peak_ij(:) = dexp( -(SPEC%grid(:)-resonance)**2 / two_sigma2 )

      SPEC_func(:) = SPEC_func(:) + Transition_Strength(i,j) * peak_ij(:)

   end do
end do
!$OMP end do

!$OMP CRITICAL
SPEC%peaks = SPEC%peaks + SPEC_peaks
SPEC%func  = SPEC%func  + SPEC_func
!$OMP END CRITICAL
!$OMP end parallel

deallocate( SPEC_peaks , SPEC_func )
[/fortran]
Jim,
I found a simpler construct that works correctly and efficiently in multi-threaded mode with no segmentation fault. The problem seemed to be directly related to the peak_ij array, which I eliminated from the code. Thanks for the suggestions.
[fortran]
! . the optical spectrum : peaks and broadened lines ...

allocate( SPEC_peaks(npoints) , source = 0.d0 )
allocate( SPEC_func (npoints) , source = 0.d0 )

!$OMP parallel do private( resonance ) reduction( + : SPEC_peaks , SPEC_func )
do j = 1 , dim_ket
   do i = 1 , dim_bra

      resonance = QM%erg(trans_DP%bra_PTR(i)) - QM%erg(trans_DP%ket_PTR(j))

      Transition_Strength(i,j) = osc_const * resonance * Transition_Strength(i,j)

      where( dabs(SPEC%grid(:)-resonance) < step ) SPEC_peaks(:) = SPEC_peaks(:) + Transition_Strength(i,j)

      where( ((SPEC%grid(:)-resonance)**2/two_sigma2) < 25.d0 ) &
         SPEC_func(:) = SPEC_func(:) + Transition_Strength(i,j) * dexp( -(SPEC%grid(:)-resonance)**2 / two_sigma2 )

   end do
end do
!$OMP end parallel do

SPEC%peaks = SPEC_peaks
SPEC%func  = SPEC_func

deallocate( SPEC_peaks , SPEC_func )
[/fortran]
Glad you found a bug/issue on your own. Must make you feel good.
I've seen problem reports before that were related to shared allocatable arrays. Your earlier code with peak_ij may have worked had you entered the parallel region with peak_ij not yet allocated (but private), then explicitly allocated the array inside the region, just as you do now for the SPEC_... arrays. Your newest code without the temporary array seems better.
Also, I recommend experimenting with replacing the two where statements with a single do loop; I do not know whether the compiler optimizations collapse the two where loops so that only one pass is made through the SPEC%grid(:) array. If SPEC%grid(:) plus the other working data is larger than L1 cache, you might see a significant improvement, and it might also save one computation of (SPEC%grid(:)-resonance). It might be worth trying both techniques.
Jim Dempsey
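The single do-loop form Jim describes could be sketched as follows (untested; delta and k are new local variables not in the poster's code, and delta would need to be private in the OpenMP loop):

[fortran]
! One pass over the grid: compute (SPEC%grid(k)-resonance) once and
! apply both threshold tests, instead of two separate where sweeps.
do k = 1 , npoints
   delta = SPEC%grid(k) - resonance
   if( dabs(delta) < step ) SPEC_peaks(k) = SPEC_peaks(k) + Transition_Strength(i,j)
   if( delta*delta/two_sigma2 < 25.d0 ) &
      SPEC_func(k) = SPEC_func(k) + Transition_Strength(i,j) * dexp( -delta*delta/two_sigma2 )
end do
[/fortran]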