Solved: Loop variable optimized away

Jonathan_B_ · 11-27-2013

I'm running into an odd problem and would welcome any advice on how to diagnose or fix it. I'm implementing a sparse matrix solver, and I'm dividing a matrix-vector product among a team of OpenMP threads using an OpenMP DO loop with static scheduling over balanced chunks of my matrix.
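
For readers reconstructing the setup, here is a minimal sketch of one way to split a sparse matrix-vector product with a statically scheduled OpenMP DO loop, one balanced chunk per iteration. It assumes CSR storage (row_ptr, col_idx, val) and a hypothetical chunk_start array of row boundaries; none of these names come from Jonathan's code, and it uses a combined PARALLEL DO for self-containment where he used a DO inside an existing parallel region.

[fortran]
module csr_matvec_mod
  ! Sketch only: CSR storage and the chunk_start layout are assumptions.
  implicit none
contains
  subroutine csr_matvec_chunked(nchunks, chunk_start, row_ptr, col_idx, val, x, y)
    integer, intent(in) :: nchunks
    ! Hypothetical: chunk c covers rows chunk_start(c) .. chunk_start(c+1)-1
    integer, intent(in) :: chunk_start(nchunks + 1)
    integer, intent(in) :: row_ptr(:), col_idx(:)
    double precision, intent(in)  :: val(:), x(:)
    double precision, intent(out) :: y(:)
    integer :: c, row, k
    double precision :: s

    ! The DO variable c is private automatically; row, k, s are made private.
    !$omp parallel do schedule(static) private(row, k, s)
    do c = 1, nchunks
       do row = chunk_start(c), chunk_start(c + 1) - 1
          s = 0.0d0
          do k = row_ptr(row), row_ptr(row + 1) - 1
             s = s + val(k) * x(col_idx(k))
          end do
          y(row) = s
       end do
    end do
    !$omp end parallel do
  end subroutine csr_matvec_chunked
end module csr_matvec_mod
[/fortran]

Called with nchunks equal to the team size, static scheduling gives each thread exactly one chunk, which is the one-iteration-per-thread behavior the question expects.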

The problem is that the loop variable of the OpenMP DO loop is optimized away when optimization is enabled (-O1, -O2, -O3), and the loop runs more times than intended.

In my debugging environment I can only use one thread ($OMP_NUM_THREADS=1, set by the admins), so this loop should behave like serial code and run only once. However, my debug messages show the loop variable going beyond 1, and when I'm inside the loop idbc reports:

(idb) print i
Info: symbol i is defined but not allocated (optimized away)
Error: no value for symbol i
Cannot evaluate 'i'.

How should I go about finding out what ifort has done in this optimization? Superficially this looks like a compiler bug, but I'm uncomfortable making that claim without seeing exactly what the optimizer has done.

Thanks,
Jonathan

Jonathan_B_ · 12-07-2013

Hi Jim,

Thanks for the suggestion; however, the sysadmins set $OMP_NUM_THREADS=1 on non-compute nodes, so the bounds on variable i match the number of threads. Still, it was worth a try, so I inserted

[fortran] if (i > size(chunk)) cycle [/fortran]

since this is an OpenMP-parallelized DO loop and EXIT statements are prohibited there. However, that had no effect; the program output was identical. This is why I would like to track down the compiler's internal iteration-count variable and the number of iterations it calculated, but since those won't be in the debugging symbols, I need a recommendation on how to find them.
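
One way to check the trip count without relying on debug symbols is to count the iterations each thread actually executes. The helper below is hypothetical (not from the thread): it counts iterations of a statically scheduled loop over lower..upper, and comparing its result with upper - lower + 1 shows whether extra iterations really run. It measures observed behavior only and does not expose the compiler's internal counter; note too that instrumenting the loop may itself change what the optimizer does.

[fortran]
module tripcount_check_mod
  ! Hypothetical diagnostic helper; omp_lib calls are guarded by !$ sentinels
  ! so the file also compiles (as a single thread) without OpenMP.
  !$ use omp_lib
  implicit none
contains
  integer function count_iterations(lower, upper) result(total)
    integer, intent(in) :: lower, upper
    integer, allocatable :: per_thread(:)
    integer :: i, tid, nthreads

    nthreads = 1
    !$ nthreads = omp_get_max_threads()   ! upper bound on the team size
    allocate(per_thread(0:nthreads - 1))
    per_thread = 0

    !$omp parallel private(tid)
    tid = 0
    !$ tid = omp_get_thread_num()
    !$omp do schedule(static)
    do i = lower, upper
       per_thread(tid) = per_thread(tid) + 1   ! each thread updates only its own slot
    end do
    !$omp end do
    !$omp end parallel

    total = sum(per_thread)   ! iterations actually executed by the whole team
  end function count_iterations
end module tripcount_check_mod
[/fortran]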

In the production environment, this will be running with 16+ threads, but it should work with however many threads are available.

Jonathan

jimdempseyatthecove · 12-07-2013

Consider (inside parallel region)

(I is private)

[fortran]
do I = omp_get_thread_num() + 1, yourUpper, omp_get_num_threads()
...
end do
[/fortran]
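
A self-contained version of the round-robin loop above, for anyone who wants to try it outside the solver. The module and subroutine names are made up for illustration; the loop itself is the pattern from this post, with the omp_lib calls guarded by !$ sentinels so it also compiles without OpenMP (thread 0 of 1).

[fortran]
module cyclic_loop_mod
  ! Illustrative names only; the DO loop is the round-robin pattern above.
  !$ use omp_lib
  implicit none
contains
  subroutine mark_visits(upper, visits)
    integer, intent(in)  :: upper
    integer, intent(out) :: visits(upper)   ! visits(I) = times iteration I ran
    integer :: I, tid, nthreads

    visits = 0
    !$omp parallel private(I, tid, nthreads)
    tid = 0
    nthreads = 1
    !$ tid = omp_get_thread_num()
    !$ nthreads = omp_get_num_threads()
    ! Thread tid handles I = tid+1, tid+1+nthreads, ... (no !$omp do needed)
    do I = tid + 1, upper, nthreads
       visits(I) = visits(I) + 1   ! no two threads ever get the same I
    end do
    !$omp end parallel
  end subroutine mark_visits
end module cyclic_loop_mod
[/fortran]

Because thread t handles exactly the values of I congruent to t+1 modulo the team size, every I from 1 to upper is visited exactly once, whatever the number of threads.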

Jim Dempsey

Jonathan_B_ · 12-07-2013

Hi Jim,

That is a beautiful modification, and it worked. My best guess is that making all of the loop bounds depend on runtime values (the thread number and team size) prevented ifort from making the assumption that caused the faulty optimization.

Thanks much!

Jonathan

jimdempseyatthecove · 12-08-2013

The above suggestion will work best when the amount of work for each I is approximately equal.
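
When the work per I varies a lot, a standard OpenMP alternative (not something suggested in the thread) is to let the runtime hand out iterations on demand with SCHEDULE(DYNAMIC, chunk). In this sketch the names are hypothetical and the inner loop only simulates work that grows with I.

[fortran]
module dynamic_loop_mod
  ! Hypothetical example of uneven work balanced with dynamic scheduling.
  implicit none
contains
  subroutine weighted_sums(upper, results)
    integer, intent(in) :: upper
    double precision, intent(out) :: results(upper)
    integer :: I, k
    double precision :: s

    ! Each idle thread grabs the next single iteration (chunk size 1).
    !$omp parallel do schedule(dynamic, 1) private(k, s)
    do I = 1, upper
       s = 0.0d0
       do k = 1, I * 1000        ! simulated work: cost grows with I
          s = s + 1.0d0 / k
       end do
       results(I) = s
    end do
    !$omp end parallel do
  end subroutine weighted_sums
end module dynamic_loop_mod
[/fortran]

Dynamic scheduling trades some scheduling overhead for better balance; with roughly equal work per I, the static round-robin loop avoids that overhead.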

Also note: if the output needs to be in order of I, consider something like this:

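[fortran]
integer, volatile :: NextOutput ! shared; VOLATILE so the wait loop re-reads it

! (in parallel region, NextOutput shared, I private)
! SleepQQ is an Intel portability routine: add "use ifport"
NextOutput = 1 ! all threads reset
!$OMP BARRIER ! assure all threads past above
do I = omp_get_thread_num() + 1, YourUpper, omp_get_num_threads()
    ... ! parallel computational work here
    do while(NextOutput .NE. I) ! wait until it is iteration I's turn to print
        call SleepQQ(0)         ! give up the time slice while waiting
        !$OMP FLUSH(NextOutput)
    end do
    write(*,*) YourOutput
    NextOutput = NextOutput + 1 ! hand the turn to iteration I+1
    !$OMP FLUSH(NextOutput)
end do
[/fortran]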
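
OpenMP also has a built-in construct for this pattern: an ORDERED region inside a DO loop that carries the ORDERED clause executes in iteration order, so no spin-wait or explicit flushes are needed. This is a standard alternative rather than the approach in the post, and the names below are illustrative.

[fortran]
module ordered_output_mod
  ! Illustrative names; ORDERED is standard OpenMP.
  implicit none
contains
  subroutine squares_in_order(upper, seen)
    integer, intent(in)  :: upper
    integer, intent(out) :: seen(upper)   ! seen(k) = the I printed k-th
    integer :: I, pos, val

    pos = 0
    !$omp parallel do ordered schedule(static, 1) private(val)
    do I = 1, upper
       val = I * I                        ! stands in for the parallel work
       !$omp ordered
       pos = pos + 1                      ! ordered regions run one at a time, in order of I
       seen(pos) = I
       write(*, *) I, val
       !$omp end ordered
    end do
    !$omp end parallel do
  end subroutine squares_in_order
end module ordered_output_mod
[/fortran]

The work before the ORDERED block still runs in parallel; only the block itself is serialized, so this pays off when the output step is cheap compared with the computation.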

Jim Dempsey

Jonathan_B_ · 12-09-2013

I just checked the optimization report: the loop was not unrolled. Any guesses about what I should look for to identify the source of the error? I'd like to pin down the behavior that caused this and submit it as an optimizer bug.

Thanks,

Jonathan