Intel® Fortran Compiler

optimization report on stencil - loads not optimized

putzu__roberto

Hello,

I'm using ifort to compile a scientific code with OpenMP parallelization. The relevant section of the code is:

!$omp parallel do schedule(static,1)
  do j = 2, n - 1
    do i = 2, m - 1
      w(i,j) = 0.25 * (u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1))
    end do
  end do
!$omp end parallel do

where n and m are very large.
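
For anyone who wants to reproduce the report, here is a minimal self-contained sketch of the loop above. The array sizes, the initial values, and the use of default (single-precision) `real` are assumptions; single precision is only inferred from the `F32` types shown in the report, and the real code may declare and fill `u` and `w` differently.

```fortran
! Minimal sketch of a reproducer for the stencil loop.
! Sizes, initial values, and the real kind are assumptions, not the original code.
program stencil
  implicit none
  integer, parameter :: m = 4096, n = 4096   ! assumed sizes ("very large" in the post)
  real, allocatable :: u(:,:), w(:,:)        ! default real, assumed from F32 in the report
  integer :: i, j

  allocate(u(m,n), w(m,n))
  u = 1.0
  w = 0.0

  ! Same loop nest and schedule as in the post: each interior point of w
  ! is the average of the four neighbours of u.
  !$omp parallel do schedule(static,1)
  do j = 2, n - 1
    do i = 2, m - 1
      w(i,j) = 0.25 * (u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1))
    end do
  end do
  !$omp end parallel do

  ! Print one interior value so the result is actually used.
  print *, w(m/2, n/2)
end program stencil
```

Saved as stencil.f90, this should build with the same command line as the full code.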

I compile with: ifort -o stencil -qopenmp -Ofast -fno-alias -qopt-report stencil.f90

The interesting part of the report is this:

stencil.f90(#linewithstencil,23):remark #34055: adjacent dense (unit-strided stencil) loads are not optimized. Details: stride { 4 }, step { 8 }, types { F32-V128, F32-V128 }, number of elements { 4 }, select mask { 0x000000003 }.
stencil.f90(#linewithstencil,23):remark #34055: adjacent dense (unit-strided stencil) loads are not optimized. Details: stride { 4 }, step { 8 }, types { F32-V128, F32-V128 }, number of elements { 4 }, select mask { 0x000000003 }.
stencil.f90(#linewithstencil,23):remark #34055: adjacent dense (unit-strided stencil) loads are not optimized. Details: stride { 4 }, step { 8 }, types { F32-V128, F32-V128 }, number of elements { 4 }, select mask { 0x000000003 }.
stencil.f90(#linewithstencil,23):remark #34055: adjacent dense (unit-strided stencil) loads are not optimized. Details: stride { 4 }, step { 8 }, types { F32-V128, F32-V128 }, number of elements { 4 }, select mask { 0x000000003 }.

Is the compiler suggesting some improvement for performance? I must say the speedup is pretty bad, at least on my laptop. Thank you.
 
