Solved: Debug in OpenMP

Carlygao · ‎12-07-2022

Hi,

I want to know how every body does debug when you use OpenMP in Fortran.

A piece of my code using OMP is shown in the end. There are many subroutines called in fe_element_results. In one of the subroutine, it calls a subroutine for a material model. If I use 20 processors, it always gave me an error at the following place. The error message is "Unhandled exception at 0x00007FF6D807B9CF in bsam20_2022_09_13_59667d1.exe: 0xC000008E: Floating-point division by zero (parameters: 0x0000000000000000, 0x0000000000009964)."

However, I checked nxi, and it was never be zero (12.756). I want to know how to check the nxi value in all processors, so I can know how nxi can be zero.

If I change to use only 1 processor, no error. The running will be done in the end successfully. So this error only showed up when multiple processors were used. Thanks.

The location where the error showed up:

do i = 1,nstr
enormal(i) = xi(i)/nxi
end do ! i

A piece of my code using OMP:

!$OMP PARALLEL DO PRIVATE(elem)
! make the element results
do i=1,group%elem_count
elem => group%elements(i)
call fe_element_results(boundary_id,elem,constit_id,temp,group%disps,ksyst)
end do
!$OMP END PARALLEL DO

Carly

Carlygao · ‎01-12-2023

Hi Jim,

After getting stuck by this problem around 5 months, I finally found the reasons. The biggest problem is that I used "save" after clarifying the data types in the "mises" subroutine. This made "nxi" saved in each thread was totally messed up. After deleting "save", many cases using multithreads can be run successfully. Another issue is about how to calculate nxi. I have to explicitly compute nxi in the subroutine, rather than call a function (as the first commented line). If I call the function "dot_fl" to calculate nxi, the running will stuck in the calling place with the error message "float-point invalid operation". I still don't understand this point.

Anyway, the code can run without problems now. Thanks Jim for teaching me to access the information in each thread, which helped me a lot to narrow down the problem scope.

! nxi = sqrt(dot_fl(xi(1:6),xi(1:nstr),6) + dot_fl(xi(4:6),xi(4:nstr),3))
nxi = 0.0
do ii = 1,6
nxi = nxi + xi(ii)*xi(ii)
enddo
do ii=4,6
nxi = nxi + xi(ii)*xi(ii)
enddo

nxi = dsqrt(nxi)

Carly

View solution in original post

jimdempseyatthecove · ‎12-08-2022

Most debuggers permit you to change the context from thread-to-thread.

In MS VS there is a threads tab, click on that to view the threads, then double-click on a/each thread.

gdb, eclipse, etc... have similar feature.

Use this to locate the offending thread. Note, you may need to look at each thread's dissassembly code to locate one performing a (one of) division instruction. Then with this thread in context, examine the value of the variables.

Note, nxi may differ amoungst threads.

What happens when using OMP_NUM_THREADS=1? ...=2?

Jim Dempsey

Carlygao · ‎12-08-2022

Hi, Jim,

Thanks for your suggestions.

If OMP_NUM_THREADS=1, the running can be done successfully. However, if OMP_NUM_THREADS is larger than 1, I have the above errors. I changed the codes in the following way to make sure nxi is larger than zero. The running still stopped at the red line with the save error message (divided by zero), but nxi is not zero. I will learn how to the nxi values in each thread. Thanks.

if (nxi.gt.1e-10) then

do i = 1,nstr
enormal(i) = xi(i)/nxi
end do ! i

endif

Carly

jimdempseyatthecove · ‎12-11-2022

Is nxi a shared variable or a private variable?

If shared, is it modified within a parallel region (or by other process non-OpenMP thread)?

Have you run Debug build with full runtime checks?

Jim Dempsey

JohnNichols · ‎12-11-2022

Circa 1968, print out the thread number, i and nxi before the equation and then trap the zero.

Carlygao · ‎12-11-2022

Jim,

Only the subroutine "fe_element_results" is called when OMP is activated, and only the data structure "elem" is set to be private. Inside of the subroutine "fe_element_results", there are many subroutines are called at different levels.

The subroutine "mises" is called very late. Inside of the subroutine "mises", "nxi" is a local variable, which is a norm of the vector. I set a condition to run the problematic code only when "nxi" is larger than zero, but I still have the "divided by zero" mistake. I took John's suggestion, and printed out the thread ID and "nxi" values as follows.

A screen shot of the error message is included in this message. Only thread 0 is used, even though I set the max thread number is 2. And there is no zero values for "nxi". The information from the screen is also shown below. I didn't put all the print-out of "nxi", but I pretty sure there all "nxi" are positive. When the running stopped, "i" is equal to 4, and the values xi(i) are normal. However, the values for enormal(1:3) are all zeros. I don't think the code runs for i=1~3.

!$OMP PARALLEL DO PRIVATE(elem)
! make the element results
do i=1,group%elem_count
elem => group%elements(i)
call fe_element_results(boundary_id,elem,constit_id,temp,group%disps,ksyst)
end do
!$OMP END PARALLEL DO

The location where the running stopped:

do i = 1,nstr
write(*,*) "Hello from process: ", OMP_GET_THREAD_NUM(),nxi
enormal(i) = xi(i)/nxi
end do ! i

The information on the screen:

:

Hello from process: 0 3.93085348357440
Hello from process: 0 3.37072760756458
Hello from process: 0 3.25511677539116
Hello from process: 0 3.24071910804205
Hello from process: 0 4.59724023178906
Hello from process: 0 4.59370775225122
Hello from process: 0 4.56246536667898
Hello from process: 0 4.54824517293683
Hello from process: 0 4.55925767089883
Hello from process: 0 4.46978041023899
Hello from process: 0 4.52182459800371
Hello from process: 0 4.46638424757439
Hello from process: 0 4.46644457574595
Hello from process: 0 4.38515075169356
Hello from process: 0 4.32231049267378
forrtl: error (73): floating divide by zero
Image PC Routine Line Source
bsam20_2022_10_21 00007FF63BCDFACA Unknown Unknown Unknown
bsam20_2022_10_21 00007FF63BCC8DC0 Unknown Unknown Unknown
bsam20_2022_10_21 00007FF63BC9CE28 Unknown Unknown Unknown
bsam20_2022_10_21 00007FF63BAF6504 Unknown Unknown Unknown
bsam20_2022_10_21 00007FF63BAE01A8 Unknown Unknown Unknown
bsam20_2022_10_21 00007FF63BA3D250 Unknown Unknown Unknown
bsam20_2022_10_21 00007FF63BA01F9A Unknown Unknown Unknown
libiomp5md.dll 00007FFB02BC0653 Unknown Unknown Unknown
libiomp5md.dll 00007FFB02B20A97 Unknown Unknown Unknown
libiomp5md.dll 00007FFB02B21E4E Unknown Unknown Unknown
libiomp5md.dll 00007FFB02AD8D21 Unknown Unknown Unknown
bsam20_2022_10_21 00007FF63BA01BDD Unknown Unknown Unknown
bsam20_2022_10_21 00007FF63BC02DBA Unknown Unknown Unknown
bsam20_2022_10_21 00007FF63BA64797 Unknown Unknown Unknown
bsam20_2022_10_21 00007FF63B8B973E Unknown Unknown Unknown
bsam20_2022_10_21 00007FF63B74DB6B IBN_INI 735 ibn_ini.f
bsam20_2022_10_21 00007FF63BCE9A8E Unknown Unknown Unknown
bsam20_2022_10_21 00007FF63C1EC0D9 Unknown Unknown Unknown
bsam20_2022_10_21 00007FF63C1EBF7E Unknown Unknown Unknown
bsam20_2022_10_21 00007FF63C1EBE3E Unknown Unknown Unknown
bsam20_2022_10_21 00007FF63C1EC16E Unknown Unknown Unknown
KERNEL32.DLL 00007FFB2A9B4ED0 Unknown Unknown Unknown
ntdll.dll 00007FFB2BBBE40B Unknown Unknown Unknown

jimdempseyatthecove · ‎12-12-2022

The only time I've seen a case similar to this (~15 years ago) was where a piece of my errant code overwrote the instructions at or near the place where the error trap occurs. Your case is a little different in that, usually when this happens, the insertion of diagnostic code causes the error to go away or move.

Have you run your code with all runtime diagnostic code enabled? This will catch most index out of bounds and most/many uninitialized variable usages.

Also, an additional measure I would use is at point of error, I'd open a Disassembly window (and registers window(s)) and examine the instruction set. nxi will have loaded into a register for use in division (or generation of reciprocal and then multiplication). The residual values within the registers may shed some light on this (at least for someone familiar with reading assembly).

Jim Dempsey

Carlygao · ‎12-20-2022

Hi,

I am still trying to learn how to check register. However, I found the code will stop for different reasons if I run multiple times. If the error is "Floating-point division by zero (parameters: 0x0000000000000000, 0x0000000000009964).", nxi is really zero in some thread, which should be impossible for the code. I caught this after I wrote everything into a file, not on screen.

I put the main frame of the code as follows. I still don't understand why all the errors are related to nxi, a local variable. The errors always happened in the following 3 places which are marked in red. If it happened in the first place, the error is "invalid floating operation. For the 2nd places, the error is "Floating-point division by zero". For the 3rd place, either "Floating-point division by zero" or "0xC0000090: Floating-point invalid operation (parameters: 0x0000000000000000, 0x0000000000009961).".

A screenshot of the error for the current run is attached in the message. At this time, the error showed up at the 2nd place. However, there is no nxi values written into the file from the 2nd place. Instead, all the nxi written into the file are from the 3rd place. nxi is zero only once. The running didn't stop when nxi is zero immediately. It stopped in the next iteration loop.

the frame of the code:

compute nxi=xi(1)*xi(1)+xi(2)*xi(i)+....xi(6)*xi(6)

nxi = dsqrt(nxi)

if (nxi.gt.1e-10.and.....) then !I don't understand how nxi can be zero because of this condition.

.

do i = 1,nstr
if(nxi.lt.1e-5) write(iunit,*) "enormal Hello from process: ", OMP_GET_THREAD_NUM(),nxi
enormal(i) = xi(i)/nxi
end do ! i

if(lambda.gt.0.0d0) then
write(iunit,*) "Hello from process: ", OMP_GET_THREAD_NUM(),nxi
ct2 = twog*lambda/nxi
ct1 = twog*(1.d0 - ct2)
ct2 = twog*(ct2 - twog/(lam1 + lam2*expbl))

else !
ct1 = twog
ct2 = 0.0d0
endif

.

endif

nxi value written into a file:

Hello from process: 0 27.5011288096126
Hello from process: 0 23.4530657556719
Hello from process: 0 16.8064387718508
Hello from process: 0 0.000000000000000E+000
Hello from process: 0 19.3280241275060

jimdempseyatthecove · ‎12-21-2022

For the error trapping source file, set the Floating Point Model to Precise, and Floating Point Speculation to Off

You also might want to experiment with

/Qprec improve floating-point precision
/Qprec-sqrt improve square root precision
/Qprec-div improve precision of FP divides (some speed impact)

Set the above in Command Line, Additional Options.

Note, in MS VS you can vary the options on a file by file basis. In Solution Explorer window, Right-Click on source file and select Properties and navigate from there. After options are selected, the file icon in Solution Explorer will have a noticeable tag as a reminder that the file has different settings than the project.

Notes regarding your Hello from...

A formatted write can print 0.0 for very small non-zero numbers. Use Z edit descriptor to print the Hex value.

The sqrt/dsqrt functions can be set to various precision values (fast==lesser precision)

Division, fast mode, is often performed by produce reciprocal (at loss of precision) then multiply.

And are you certain that variable nxi is local to the procedure .and. not shared.

Jim Dempsey

Carlygao · ‎12-22-2022

Jim,

Thanks for your guidance. For your last question, I want to confirm when to define the private and share variables for OpenMP. Based on my understanding, the variables need to be claimed either as shared or private only when the OpenMP is activated. In my code, the OpenMP is activated in the following place, so I claimed a big data structure "elem" as a private variable. Other parameters passed into fe_element_results are supposed to be shared by default. Then many subroutines are in fe_element_results subroutine. And one of them eventually called the subroutine mises. "nxi" is a local variable in the subroutine mises. I searched "nxi" in the whole package. It is only used in the mises subroutine.

For the parameters passed into the subroutines called directly or indirectly by fe_element_results, do we need to claim their variables to be private or shared? I thought they should be private by default, because they are already in different treads when they are used.

!$OMP PARALLEL DO PRIVATE(elem)
! make the element results
do i=1,group%elem_count
elem => group%elements(i)
call fe_element_results(boundary_id,elem,constit_id,temp,group%disps,ksyst)
end do
!$OMP END PARALLEL DO

Subroutine fe_element_results(boundary_id,elem,constit_id,temp,group%disps,ksyst)

call subroutine A(para a1, para a2, ....)

call subroutine AA

end subroutine fe_element_results

Subroutine A(para a1, para a2, ....)

call mises(para m1, para m2,....)

end subroutine A

Carly

jimdempseyatthecove · ‎12-22-2022

>>call fe_element_results(boundary_id,elem,constit_id,temp,group%disps,ksyst)

This will be safe (for the sketch supplied) provided: boundary_id, constit_id, temp, group%disps,ksyst

remain unchanged during the call. I have suspicions about temp though you verify all arguments, as well as all module and common variables remain unchanged through the call chain. (otherwise, they need to be private)

Jim Dempsey

Carlygao · ‎12-22-2022

Jim,

I am not sure how to deal with global variables in OpenMP. I summarized the following items to make sure I got your ideas.

1. if I drop "elem" from the parameter list for fe_element_results subroutine list, should I add "i" in the parameter list and claim it private? "i" is the index for the elements. "elem" is a sub-data structure for group for element "i".

!$OMP PARALLEL DO PRIVATE(elem)
! make the element results
do i=1,group%elem_count
elem => group%elements(i)
call fe_element_results(boundary_id,elem,constit_id,temp,group%disps,ksyst)
end do
!$OMP END PARALLEL DO

2. "temp" is a double precision scalar data for an incremented temperature.

3. You mentioned for all the global variables (saved in modules), they need to be claimed as private parameters if they are changed in any place in fe_element_results and all other subroutines called directly or indirectly by fe_element_results. I will check this carefully. I think the place to claim these global variables if needed should be still in !$OMP PARALLEL DO PRIVATE(elem,....), right?

4. I used "write(iunit,'(A,d20.14)') "Hello from process: ", OMP_GET_THREAD_NUM(),nxi" to write the nxi values from different threads into one single file in the "mises" subroutine. "iunit" is a global IO number. I think it should be a shared variable. When one thread is writing information into the iunit file, will other threads be influenced if they want to write the data into the iunit file at the same time?

Thanks.

Carly

jimdempseyatthecove · ‎12-23-2022

1) elem must be private (each thread must have unique pointer). On parallel do, the loop control variable is implicitly private. You can declare i private if you want for clarity purposes, but it is not necessary.

2) temp(erature) is ok shared.

3) yes. For global variables (module and common) that store intermediary results, these need to be private.

*** however *** declaring these private on the parallel do will not make them private in the called procedures. Private applies only to the immediate scope of the parallel region (not to any called procedure).

For temporary work variables that are global, use threadprivate at the declaration of the variable (not on the !$omp parallel...).

Note the example in the reference guide shows for named common, but you can also use threadprivate on module variables as well.

*** additionally *** should you have an accumulation variable in a parallel region you would use the REDUCTION clause.

4. shared is the default property of parallel regions.

Additional notes:

WRITE to a file, when (potentially) executed concurrently, should be protected by a critical section. This is to prevent co-mingling of the text within/amoungst the records of all conflicting threads. Further it is recommended that you name each critical section with a unique name. For example:

if(nxi.lt.1e-5) then
   !$omp critical(critical_enormal) ! pre-pend critical_ to assure no name conflict with variable "enormal" if any
   write(iunit,*) "enormal Hello from process: ", OMP_GET_THREAD_NUM(),nxi
   !$omp end critical(critical_enormal)
endif

Also note that you can insert a debugger break point there to help catch errors (assuming nxi.lt.1e-5 should never occur).

Jim Dempsey

Carlygao · ‎01-12-2023

Hi Jim,

After getting stuck by this problem around 5 months, I finally found the reasons. The biggest problem is that I used "save" after clarifying the data types in the "mises" subroutine. This made "nxi" saved in each thread was totally messed up. After deleting "save", many cases using multithreads can be run successfully. Another issue is about how to calculate nxi. I have to explicitly compute nxi in the subroutine, rather than call a function (as the first commented line). If I call the function "dot_fl" to calculate nxi, the running will stuck in the calling place with the error message "float-point invalid operation". I still don't understand this point.

Anyway, the code can run without problems now. Thanks Jim for teaching me to access the information in each thread, which helped me a lot to narrow down the problem scope.

! nxi = sqrt(dot_fl(xi(1:6),xi(1:nstr),6) + dot_fl(xi(4:6),xi(4:nstr),3))
nxi = 0.0
do ii = 1,6
nxi = nxi + xi(ii)*xi(ii)
enddo
do ii=4,6
nxi = nxi + xi(ii)*xi(ii)
enddo

nxi = dsqrt(nxi)

Carly

jimdempseyatthecove · ‎01-13-2023

It would help if you show the function dot_fi .AND. the value of nstr.

Your replacement code would indicate nstr must be 6.

Also note:

real :: dot_fi = 0.0 ! initialization of local/return variable is implicitly SAVE

Thus if your function dot_fi initialized the result as part of the declaration, you would have a threading problem.

Initialize your local variables as statements within the procedure.

And Fortran has the function DOT_PRODUCT, consider:

nxi = sqrt(dot_product(xi(1:6),xi(1:6) + dot_product(xi(4:6),xi(4:6))

assuming there isn't a problem with nstr not being 6

Jim Dempsey

Carlygao · ‎01-13-2023

Hi Jim,

Yes, nstr=6. I set its value in the beginning of the mises subroutine. My dot_fl function is defined below. When I copied it here, I found the problem, I used "save" there. After I deleted it "save", put it into an external module and include it in mises. It has no problem now. This is the way I want. Using the Fortran function "dot_product" also works.

I checked some comments on line about "save" in Fortran. Somebody said it is a disaster for multithead coding. Now I fully agree with it:) Thanks.

function dot_fl(a,b,n)
!=======================================================================
!> @brief dot (scalar) product of two vectors
!> @details
!> Inputs:
!> a(*) - Vector 1
!> b(*) - Vector 2
!> nn - length of vectors
!> Outputs:
!> dot_fl- Scalar product
!=======================================================================

implicit none

integer ii,n
real(8) dot_fl, a(*),b(*)

save

dot_fl = 0.0d0
do ii = 1,n
dot_fl = dot_fl + a(ii)*b(ii)
end do ! i

end function dot_fl

Carly

jimdempseyatthecove · ‎01-13-2023

As you highlighted, the save statement attributes all local variables (ii in this case) with SAVE thus making the procedure unsafe for threaded programming.

Note, this code could potentially run in a threaded program for a long time without experiencing an error. Expect no error if a thread can get in and out of the function prior to another thread getting info the function .OR. if compiler optimization happens to registerize ii in this case. IOW you were lucky that your testing exhibited an error.

Jim Dempsey