Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Loop variable optimized away

Jonathan_B_
Beginner

I'm experiencing a rather odd circumstance and I'm looking for any advice on how to diagnose it or fix it. I'm implementing a sparse matrix solver, and I'm dividing up a matrix-vector product over a team of OpenMP threads using a do loop with static scheduling and balanced chunks of my matrix.

The problem is, my loop variable for the OpenMP do loop is getting optimized away when optimizations are turned on (-O1, -O2, -O3) and the loop is being run more times than intended.

In my debugging environment I can only work with one thread (the admins set $OMP_NUM_THREADS=1), so this "loop" should behave like serial code. However, my debug messages indicate that my loop variable goes beyond 1, and this is what idbc reports when I'm inside the loop:

(idb) print i
Info: symbol i is defined but not allocated (optimized away)
Error: no value for symbol i
Cannot evaluate 'i'.

How should I go about figuring out what ifort has done in this optimization? Superficially, this acts like a bug, but I'm uncomfortable making that assertion without seeing exactly what the optimizations have done.

Thanks,
Jonathan

25 Replies
Jonathan_B_
Beginner

Hi Jim,

Thanks for the suggestion, however the sys admins set $OMP_NUM_THREADS=1 on non-compute nodes, so the bounds on variable i match with the number of threads. Still, it was worth a try, so I inserted

[fortran] if (i > size(chunk)) cycle [/fortran]

since this is an OpenMP-parallelized do loop and EXIT statements are not allowed inside it. However, that had no effect: the program output was identical. This is why I would like to track down the internal loop iteration counter and the number of iterations the compiler calculated, but since they're not going to be in the debugging symbols I need a recommendation on how to find them.

In the production environment, this will be running with 16+ threads, but it should work with however many threads are available.
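
One way to check the trip count without a debugger is to have the loop report it through reduction variables, which the optimizer has to keep because they are printed afterwards. The following is only a minimal sketch: `nchunks` is a hypothetical stand-in for `size(chunk)`, the loop body is a placeholder, and there is no guarantee that this small program reproduces the behavior described above.

```fortran
program trip_count_check
  use omp_lib
  implicit none
  integer, parameter :: nchunks = 4   ! hypothetical stand-in for size(chunk)
  integer :: i, trips, max_i

  trips = 0
  max_i = 0
  ! Each thread counts its own iterations and tracks the largest index it saw;
  ! the reductions combine those per-thread values at the end of the loop.
  !$omp parallel do schedule(static) reduction(+:trips) reduction(max:max_i)
  do i = 1, nchunks
     ! ... the real per-chunk work would go here ...
     trips = trips + 1
     max_i = max(max_i, i)
  end do
  !$omp end parallel do

  print *, 'threads available :', omp_get_max_threads()
  print *, 'iterations run    :', trips, ' expected ', nchunks
  print *, 'largest index seen:', max_i, ' expected ', nchunks
end program trip_count_check
```

If the counts differ from `nchunks` only when optimization is enabled, the same two counters can be dropped into the real loop to narrow down where the extra iterations come from.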

Jonathan

jimdempseyatthecove
Honored Contributor III

Consider (inside parallel region)

(I is private)

[fortran]
do I =  omp_get_thread_num() + 1, yourUpper, omp_get_num_threads()
  ...
end do
[/fortran]
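
For readers who want to try the pattern in isolation, here is a minimal self-contained sketch of the same manual cyclic distribution. The program name, the bound `n` (standing in for `yourUpper`) and the `hits` array are assumptions made for illustration; they are not from the original code.

```fortran
program cyclic_demo
  use omp_lib
  implicit none
  integer, parameter :: n = 10      ! hypothetical upper bound (yourUpper)
  integer :: i, hits(n)

  hits = 0
  !$omp parallel shared(hits) private(i)
  ! Thread t (0-based) handles i = t+1, t+1+nthreads, t+1+2*nthreads, ...
  ! so the index sets of different threads never overlap.
  do i = omp_get_thread_num() + 1, n, omp_get_num_threads()
     hits(i) = hits(i) + 1          ! each i is touched by exactly one thread
  end do
  !$omp end parallel

  if (all(hits == 1)) then
     print *, 'every index visited exactly once'
  else
     print *, 'coverage error: ', hits
  end if
end program cyclic_demo
```

With one thread the loop degenerates to `do i = 1, n, 1`, so the same code covers both the single-thread debugging setup and a multi-thread production run.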

Jim Dempsey

Jonathan_B_
Beginner

Hi Jim,

That is a beautiful modification, and it worked. My best guess is that making all bounds of the loop depend on the runtime environment prevented ifort from making the assumption that caused the error in optimization.

Thanks much!

Jonathan

jimdempseyatthecove
Honored Contributor III

The above suggestion will work best when the amount of work for each I is approximately equal.

Also note, if the output needs to be in order of I then consider something like this:

[fortran]
integer, volatile :: NextOutput ! volatile forces a fresh read on every pass of the wait loop

! (in parallel region, NextOutput shared, I private)

NextOutput = 1 ! all threads reset
!$OMP BARRIER ! assure all threads past above
do I = omp_get_thread_num() + 1, YourUpper, omp_get_num_threads()
  ... ! parallel computational work here
  do while(NextOutput .NE. I)
     call SleepQQ(0) ! yield while waiting for our turn (SleepQQ is in module IFPORT)
  end do
  write(*,*) YourOutput
  NextOutput = NextOutput + 1
end do
[/fortran]
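
As a point of comparison, the same in-order output can also be sketched with OpenMP's standard `ordered` construct instead of the volatile spin-wait shown above. This is a different technique from the one in the post, and the names here (`n` standing in for `YourUpper`, the `results` array, the `sqrt` placeholder) are assumptions for illustration only.

```fortran
program ordered_output
  use omp_lib
  implicit none
  integer, parameter :: n = 8        ! hypothetical stand-in for YourUpper
  integer :: i
  real :: results(n)

  ! schedule(static, 1) hands out iterations round-robin, which mirrors the
  ! thread_num + 1, upper, num_threads stride of the hand-written loop.
  !$omp parallel do ordered schedule(static, 1) private(i) shared(results)
  do i = 1, n
     results(i) = sqrt(real(i))      ! placeholder for the parallel computational work
     !$omp ordered
     write(*,*) i, results(i)        ! runs in increasing i, one thread at a time
     !$omp end ordered
  end do
  !$omp end parallel do
end program ordered_output
```

The computation outside the `ordered` block still runs in parallel; only the `write` is serialized, so the same caveat applies about roughly equal work per iteration.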

Jim Dempsey
 

Jonathan_B_
Beginner

I just checked the optimization report - the loop was not unrolled. Any guess for what I should look for to identify the source of the error? I'd like to find the behavior that caused this and submit that as a bug for optimizations.

Thanks,

Jonathan
