Hi,
I get the following error message running ifort 11.1 with OpenMP:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libpthread.so.0 00007FDDBD065E2B Unknown Unknown Unknown
libiomp5.so 00007FDDBEB7FEFA Unknown Unknown Unknown
The problem disappears and the code runs perfectly when I comment out the !$OMP directives in the code segment below.
In addition, the segmentation fault disappears if I do:
[fortran]SPEC_peaks(:) = SPEC_peaks(:) !+ peak_ij(:)[/fortran]
[fortran]SPEC_func(:) = SPEC_func(:) + Transition_Strength(i,j) !* peak_ij(:)[/fortran]
Here is the code segment
[fortran]
allocate( peak_ij(npoints) )

forall( k=1:npoints ) SPEC%grid(k) = (k-1)*step

allocate( SPEC_peaks(npoints) , source = 0.d0 )
allocate( SPEC_func (npoints) , source = 0.d0 )

!$OMP parallel
!$OMP do reduction(+:SPEC_peaks,SPEC_func) private( osc_const , resonance , peak_ij )
do i = 1 , dim_bra
   do j = 1 , dim_ket

      resonance = QM%erg(trans_DP%bra_PTR(i)) - QM%erg(trans_DP%ket_PTR(j))

      Transition_Strength(i,j) = osc_const * resonance * Transition_Strength(i,j)

      peak_ij(:) = 0.d0
      where( dabs(SPEC%grid(:)-resonance) < step ) peak_ij = Transition_Strength(i,j)

      SPEC_peaks(:) = SPEC_peaks(:) + peak_ij(:)

      peak_ij(:) = 0.d0
      where( ((SPEC%grid(:)-resonance)**2/two_sigma2) < 25.d0 ) peak_ij = dexp( -(SPEC%grid(:)-resonance)**2 / two_sigma2 )

      SPEC_func(:) = SPEC_func(:) + Transition_Strength(i,j) * peak_ij(:)

   end do
end do
!$OMP end do
!$OMP end parallel

SPEC%peaks = SPEC_peaks
SPEC%func  = SPEC_func
[/fortran]
7 Replies
You might pay more attention to getting it working well in single threaded mode. Even in single threaded mode, it would be important to organize your inner loops for locality and vectorization.
It's difficult to see why you would want reduction operations.
You neglected to say whether you checked for stack overflow.
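To check for the stack-overflow possibility Steve raises, a sketch of the relevant controls (the 512M value is an arbitrary example, not a recommendation):

```shell
# The shell's stack limit applies only to the program's initial thread:
#   ulimit -s unlimited
# Each OpenMP worker thread gets its own stack, sized by OMP_STACKSIZE
# (the portable control) or KMP_STACKSIZE (the Intel-runtime spelling).
# The default worker stack is often too small for large private arrays.
export OMP_STACKSIZE=512M
export KMP_STACKSIZE=512M
echo "OpenMP worker stack size: $OMP_STACKSIZE"
```

If the segfault goes away with a larger worker stack, the private/firstprivate arrays were overflowing the per-thread stacks.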
I agree with Steve's comments.
Your code will vectorize better if you transpose your i and j loops. In Fortran the leftmost index addresses adjacent (vectorizable) data (when the stride is 1 element).
First, compile as-is without OpenMP and with full optimizations. Run it a few times to produce a baseline set of output data and run times for this section of your code.
Second, interchange your i and j loop order, verify that any other changes to your code required by this loop-order change are made, then compile without OpenMP and with full optimizations. Run it a few times to produce a revised set of output data and run times for this section of your code. Verify that the outputs are correct. Correct results need not be bit-for-bit equal, since the order of operations has changed; they should be within your margin of error.
Third, make SPEC_peaks and SPEC_func arrays local to each thread. Initialize SPEC%peaks and SPEC%func to 0.0, and after your !$OMP end do insert:
!$OMP CRITICAL(SPEC)
SPEC%peaks=SPEC%peaks + SPEC_peaks
SPEC%func = SPEC%func + SPEC_func
!$OMP END CRITICAL(SPEC)
!$OMP END PARALLEL
If npoints is quite large, then experiment with:
!$OMP CRITICAL(SPECpeaks)
SPEC%peaks=SPEC%peaks + SPEC_peaks
!$OMP END CRITICAL(SPECpeaks)
!$OMP CRITICAL(SPECfunc)
SPEC%func = SPEC%func + SPEC_func
!$OMP END CRITICAL(SPECfunc)
!$OMP END PARALLEL
The above changes will improve parallel performance but will diminish serial performance, so you might want to conditionalize the code.
Jim Dempsey
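To illustrate Jim's loop-order point with a schematic (A, n, and m are placeholders, not names from the post): Fortran stores arrays in column-major order, so the leftmost index should be the innermost loop for stride-1 access.

[fortran]
! Schematic only. A(1,j), A(2,j), ... are adjacent in memory,
! so making i the inner loop gives stride-1 (vectorizable) access.
do j = 1 , m
   do i = 1 , n
      A(i,j) = A(i,j) * 2.d0
   end do
end do
[/fortran]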
Use the -auto switch without -openmp to explicitly put all variables on the runtime stack. Note that variables with the SAVE (and possibly COMMON) attribute will explicitly not be put on the runtime stack. This may help in determining whether it is the stack that is being blown or whether other problems are causing the segfault.
Best!
-Udit
Setting -auto (which is included in -openmp) could expose bugs which are suppressed without that option but will surely hinder OpenMP. It won't come close to requiring as much stack as your example would require under -openmp.
Dear Jim,
thanks for reminding me to transpose the i and j loops. I know about it, but I had missed that one.
I have also generated a large amount of output data to verify the correctness of the parallelized output data.
I have made modifications to the code according to your suggestions (code follows below).
The variables SPEC%peaks and SPEC%func were initialized to 0.0 and npoints = 1500. I also tried ulimit -s unlimited, without success. I also tried using -auto instead of -openmp, but nothing changes.
As before, the revised code produces correct results without the !$OMP directives and an error with !$OMP:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
Stack trace terminated abnormally.
[fortran]
! . the optical spectrum : peaks and broadened lines ...

allocate( SPEC_peaks(npoints) , source = 0.d0 )
allocate( SPEC_func (npoints) , source = 0.d0 )

!$OMP parallel firstprivate( SPEC_peaks , SPEC_func )
!$OMP do private( osc_const , resonance , peak_ij )
do j = 1 , dim_ket
   do i = 1 , dim_bra

      resonance = QM%erg(trans_DP%bra_PTR(i)) - QM%erg(trans_DP%ket_PTR(j))

      Transition_Strength(i,j) = osc_const * resonance * Transition_Strength(i,j)

      peak_ij(:) = 0.d0
      where( dabs(SPEC%grid(:)-resonance) < step ) peak_ij(:) = Transition_Strength(i,j)

      SPEC_peaks(:) = SPEC_peaks(:) + peak_ij(:)

      peak_ij(:) = 0.d0
      where( ((SPEC%grid(:)-resonance)**2/two_sigma2) < 25.d0 ) peak_ij(:) = dexp( -(SPEC%grid(:)-resonance)**2 / two_sigma2 )

      SPEC_func(:) = SPEC_func(:) + Transition_Strength(i,j) * peak_ij(:)

   end do
end do
!$OMP end do

!$OMP CRITICAL
SPEC%peaks = SPEC%peaks + SPEC_peaks
SPEC%func  = SPEC%func  + SPEC_func
!$OMP END CRITICAL
!$OMP end parallel

deallocate( SPEC_peaks , SPEC_func )
[/fortran]
Jim,
I found a simpler construct that works correctly and efficiently in multi-threaded mode with no segmentation fault. The problem seemed to be directly related to the peak_ij array, which I eliminated from the code. Thanks for the suggestions.
[fortran]
! . the optical spectrum : peaks and broadened lines ...

allocate( SPEC_peaks(npoints) , source = 0.d0 )
allocate( SPEC_func (npoints) , source = 0.d0 )

!$OMP parallel do private( resonance ) reduction( + : SPEC_peaks , SPEC_func )
do j = 1 , dim_ket
   do i = 1 , dim_bra

      resonance = QM%erg(trans_DP%bra_PTR(i)) - QM%erg(trans_DP%ket_PTR(j))

      Transition_Strength(i,j) = osc_const * resonance * Transition_Strength(i,j)

      where( dabs(SPEC%grid(:)-resonance) < step ) SPEC_peaks(:) = SPEC_peaks(:) + Transition_Strength(i,j)

      where( ((SPEC%grid(:)-resonance)**2/two_sigma2) < 25.d0 ) &
         SPEC_func(:) = SPEC_func(:) + Transition_Strength(i,j) * dexp( -(SPEC%grid(:)-resonance)**2 / two_sigma2 )

   end do
end do
!$OMP end parallel do

SPEC%peaks = SPEC_peaks
SPEC%func  = SPEC_func

deallocate( SPEC_peaks , SPEC_func )
[/fortran]
Glad you found a bug/issue on your own. Must make you feel good.
I've seen problem reports before that were related to shared allocatable arrays. Your earlier code with peak_ij may have worked had you entered the parallel region with peak_ij not yet allocated (but private), then explicitly allocated the array inside the region, just as you do now for the SPEC_... arrays. Your newest code without the temporary array seems better.
Also, I recommend experimenting with replacing the two where statements with a single do loop; I do not know whether the compiler optimizations collapse the two where loops so that only one pass is made through the SPEC%grid(:) array. If SPEC%grid(:) plus the other working data is larger than L1 cache, you might see a significant improvement, and it might also save one computation of (SPEC%grid(:)-resonance). It might be worth trying both techniques.
Jim Dempsey
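The single do-loop form Jim describes could be sketched as follows (untested; delta and k are new local variables not in the poster's code, and delta would need to be private in the OpenMP loop):

[fortran]
! One pass over the grid: compute (SPEC%grid(k)-resonance) once and
! apply both threshold tests, instead of two separate where sweeps.
do k = 1 , npoints
   delta = SPEC%grid(k) - resonance
   if( dabs(delta) < step ) SPEC_peaks(k) = SPEC_peaks(k) + Transition_Strength(i,j)
   if( delta*delta/two_sigma2 < 25.d0 ) &
      SPEC_func(k) = SPEC_func(k) + Transition_Strength(i,j) * dexp( -delta*delta/two_sigma2 )
end do
[/fortran]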