- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
the following reduced code works with previous versions of ifort and ifx, but crashes with a division by 0 in the OpenMP workshare construct at the indicated assignment when run with more than 1 OpenMP thread:
! Crashes during runtime with ifx-2025.0 when compiled with:
! ifx ifx-omp-workshare.f90 -fopenmp -g -traceback -fpe0 -O -fpe0
program test
implicit none
integer, parameter :: wp = kind (1.d0)
real(wp), allocatable :: x(:,:), f(:,:), d2(:,:)
integer :: np, n, k, m
np = 4225
n = 60
allocate (x(np,n), f(np,n), d2(np,n))
call random_number (f)
x = f * 0.5_wp
do k = 2, n
x(:,k) = x(:,k) + k
end do
do m = 1, 2
print *, "Outer iteration", m
call spline_init_vec (x, f, d2, np, n)
end do
contains
subroutine spline_init_vec (x, y, y2, np, n)
integer ,intent(in) :: np
integer ,intent(in) :: n
real(wp) ,intent(in) :: x (np,n)
real(wp) ,intent(in) :: y (np,n)
real(wp) ,intent(out) :: y2(np,n)
!-----------------------------------------
integer :: i, j, k
real(wp) :: p(np), sig(np)
real(wp), allocatable :: u(:,:)
allocate (u(np,n-1))
do j = 1, np
y2(j,1) = 0._wp
u (j,1) = 0._wp
end do
!$omp parallel private(sig,p)
do i = 2,n-1
!$omp master
write(0,*) "### i=", i
!$omp end master
!$omp workshare
sig(:) = (x(:,i)-x(:,i-1))/(x(:,i+1)-x(:,i-1))
p(:) = sig(:)*y2(:,i-1)+2._wp
y2(:,i)=(sig(:)-1._wp)/p(:)
u (:,i)=(6._wp*((y(:,i+1)-y(:,i))/(x(:,i+1)-x(:,i)) &
-(y(:,i)-y(:,i-1))/(x(:,i)-x(:,i-1))) / &
(x(:,i+1)-x(:,i-1)) &
- sig(:)*u(:,i-1))/p(:) ! <- bogus division by 0 here
!$omp end workshare
end do
!$omp end parallel
end subroutine
end
I get (using the indicated compile options above):
% OMP_NUM_THREADS=2 ./a.out
Outer iteration 1
### i= 2
forrtl: error (73): floating divide by zero
Image PC Routine Line Source
libpthread-2.31.s 0000152A15F01910 Unknown Unknown Unknown
a.out 000000000040582B test_IP_spline_in 52 ifx-omp-workshare.f90
libiomp5.so 0000152A15B45603 __kmp_invoke_micr Unknown Unknown
libiomp5.so 0000152A15ACA633 Unknown Unknown Unknown
libiomp5.so 0000152A15AC3B28 __kmp_fork_call Unknown Unknown
libiomp5.so 0000152A15A84B88 __kmpc_fork_call Unknown Unknown
a.out 00000000004050A1 spline_init_vec 39 ifx-omp-workshare.f90
a.out 0000000000404DF9 test 18 ifx-omp-workshare.f90
a.out 000000000040476D Unknown Unknown Unknown
libc-2.31.so 0000152A1583E24D __libc_start_main Unknown Unknown
a.out 000000000040469A Unknown Unknown Unknown
Aborted (core dumped)
This happens on OpenSUSE 15.5.
Is there anything I might miss?
Thanks,
Harald
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
No, the workshare construct was implemented long ago. It may just be luck that it did work before.
see https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf page 246:
8 How units of work are assigned to the threads that execute a workshare region is unspecified.
9 If an array expression in the block references the value, association status, or allocation status of
10 private variables, the value of the expression is undefined, unless the same value would be
11 computed by every thread.
Since you privatize the arrays sig and p only parts of those arrays contain useful numbers for each thread. Now if the runtime decides to distribute the next access to sig and p in a different way among threads, you are likely getting uninitialized data.
If you do not privatize the arrays, those arrays are filled completely and it will work. But again, for performance reasons I would rewrite it with the loop and use sig and p as scalars and let the compiler vectorize the inner loop (!$omp do simd)
Best
Tobias
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Are you saying this is wrong?
sig and p are just work arrays with their effective lifetime being the omp parallel region.
Writing out that region by plain loops would need corresponding private temporaries.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
!$omp parallel
do i = 2,n-1
!$omp master
write(0,*) "### i=", i
!$omp end master
!$omp do
do j=1,np
sig(j) = (x(j,i)-x(j,i-1))/(x(j,i+1)-x(j,i-1))
p(j) = sig(j)*y2(j,i-1)+2._wp
y2(j,i)=(sig(j)-1._wp)/p(j)
u (j,i)=(6._wp*((y(j,i+1)-y(j,i))/(x(j,i+1)-x(j,i)) &
-(y(j,i)-y(j,i-1))/(x(j,i)-x(j,i-1))) / &
(x(j,i+1)-x(j,i-1)) &
- sig(j)*u(j,i-1))/p(j) ! <- bogus division by 0 here
end do
!$omp end parallel
I would rewrite the loop without a workshare construct. I also don't see why sig and p need to be private, actually, I don't see why sig and p need to be arrays in the first place. Scalars would need to be private, though.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You evaded my previous question: is the source code wrong, or is the problem elsewhere?
My understanding of workshare is that one can keep concise array expressions and does not need to do the compiler's work.
Playing some more, I found that replacing the declaration of u (explicit allocation on the heap) by an automatic array (!) makes the code work here, i.e.
replacing:
real(wp), allocatable :: u(:,:)
allocate (u(np,n-1))
by:
real(wp) :: u(np,n-1)
Of course this is not wanted for the full application, as the array may be too large for the stack.
This seems to suggest that sig and p are actually not the cause of the problem.
So maybe the reason that this code did not have a problem with previous versions might be Intel simply did not implement workshare before 2025.0 (just guessing)?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
No, the workshare construct was implemented long ago. It may just be luck that it did work before.
see https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf page 246:
8 How units of work are assigned to the threads that execute a workshare region is unspecified.
9 If an array expression in the block references the value, association status, or allocation status of
10 private variables, the value of the expression is undefined, unless the same value would be
11 computed by every thread.
Since you privatize the arrays sig and p only parts of those arrays contain useful numbers for each thread. Now if the runtime decides to distribute the next access to sig and p in a different way among threads, you are likely getting uninitialized data.
If you do not privatize the arrays, those arrays are filled completely and it will work. But again, for performance reasons I would rewrite it with the loop and use sig and p as scalars and let the compiler vectorize the inner loop (!$omp do simd)
Best
Tobias
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Ah, I see. I was under the impression that when using schedule(static) this would also apply to workshare regions, with same chunks for same loop lengths.
(I know one cannot set the schedule in the directive here, but omp_get_schedule confirms that it is set to static.)
So if the compiler "sees" the declarations of the first index of all arrays, it just works accidentally with "private" (maybe due to loop fusion), otherwise it will treat the assignment to u separately and somehow use a different choice.
I am going to change the code suitably.
Thanks for the insight!
Harald
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page