Intel® Fortran Compiler

optimization report on stencil - loads not optimized

putzu__roberto

Hello,

I'm using ifort to compile a scientific code with OpenMP parallelization. The relevant section of the code is:

!$omp parallel do schedule(static,1)
  do j = 2, n - 1
    do i = 2, m - 1
      w(i,j) = 0.25 * (u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1))
    end do
  end do
!$omp end parallel do

where n and m are very large.
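
For anyone who wants to reproduce this, a minimal self-contained sketch of the loop might look like the following. It assumes default (single-precision) reals, which is consistent with the F32 types in the report; the array sizes, the initial values, the timing code, and the program name are hypothetical and not taken from the original code.

```fortran
! Minimal sketch of the stencil loop from the post.
! Sizes, initial values and timing are assumptions for illustration only.
program stencil_sketch
  use omp_lib, only: omp_get_wtime
  implicit none
  integer, parameter :: m = 4096, n = 4096   ! hypothetical sizes
  real, allocatable :: u(:,:), w(:,:)        ! default real, assumed single precision
  integer :: i, j
  double precision :: t0, t1

  allocate(u(m,n), w(m,n))
  u = 1.0
  w = 0.0

  t0 = omp_get_wtime()
  !$omp parallel do schedule(static,1)
  do j = 2, n - 1
    do i = 2, m - 1
      w(i,j) = 0.25 * (u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1))
    end do
  end do
  !$omp end parallel do
  t1 = omp_get_wtime()

  ! With u = 1 everywhere, every interior point of w must equal 1.
  if (any(w(2:m-1, 2:n-1) /= 1.0)) error stop 'unexpected stencil result'
  print *, 'elapsed seconds: ', t1 - t0
end program stencil_sketch
```

Saved as stencil.f90, this should build with the same command line given in the post. The sanity check only confirms that the interior points were computed; it says nothing about how well the loop vectorizes.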

I compile with: ifort -o stencil -qopenmp -Ofast -fno-alias -qopt-report stencil.f90

The interesting part of the report is this:

stencil.f90(#linewithstencil,23):remark #34055: adjacent dense (unit-strided stencil) loads are not optimized. Details: stride { 4 }, step { 8 }, types { F32-V128, F32-V128 }, number of elements { 4 }, select mask { 0x000000003 }.
stencil.f90(#linewithstencil,23):remark #34055: adjacent dense (unit-strided stencil) loads are not optimized. Details: stride { 4 }, step { 8 }, types { F32-V128, F32-V128 }, number of elements { 4 }, select mask { 0x000000003 }.
stencil.f90(#linewithstencil,23):remark #34055: adjacent dense (unit-strided stencil) loads are not optimized. Details: stride { 4 }, step { 8 }, types { F32-V128, F32-V128 }, number of elements { 4 }, select mask { 0x000000003 }.
stencil.f90(#linewithstencil,23):remark #34055: adjacent dense (unit-strided stencil) loads are not optimized. Details: stride { 4 }, step { 8 }, types { F32-V128, F32-V128 }, number of elements { 4 }, select mask { 0x000000003 }.

Is the compiler suggesting some improvement for performance? I must say the speedup is pretty bad, at least on my laptop. Thank you.
 
