Optimization report on stencil: loads not optimized

putzu__roberto · 08-30-2019

Hello,

I'm using ifort to compile a scientific code parallelized with OpenMP. The relevant section of the code is:

!$omp parallel do schedule(static,1)
do j = 2, n - 1
   do i = 2, m - 1
      w(i,j) = 0.25 * (u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1))
   end do
end do
!$omp end parallel do

where n and m are very big.

I compile with:

ifort -o stencil -qopenmp -Ofast -fno-alias -qopt-report stencil.f90

The interesting part of the report is this:

stencil.f90(#linewithstencil,23):remark #34055: adjacent dense (unit-strided stencil) loads are not optimized. Details: stride { 4 }, step { 8 }, types { F32-V128, F32-V128 }, number of elements { 4 }, select mask { 0x000000003 }.
stencil.f90(#linewithstencil,23):remark #34055: adjacent dense (unit-strided stencil) loads are not optimized. Details: stride { 4 }, step { 8 }, types { F32-V128, F32-V128 }, number of elements { 4 }, select mask { 0x000000003 }.
stencil.f90(#linewithstencil,23):remark #34055: adjacent dense (unit-strided stencil) loads are not optimized. Details: stride { 4 }, step { 8 }, types { F32-V128, F32-V128 }, number of elements { 4 }, select mask { 0x000000003 }.
stencil.f90(#linewithstencil,23):remark #34055: adjacent dense (unit-strided stencil) loads are not optimized. Details: stride { 4 }, step { 8 }, types { F32-V128, F32-V128 }, number of elements { 4 }, select mask { 0x000000003 }.

Is the compiler suggesting a possible performance improvement here? I must say the parallel speedup is pretty poor, at least on my laptop. Thank you.
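A rough roofline-style estimate may help explain the poor speedup. This is my own back-of-the-envelope reasoning, not something the optimization report says. It assumes single-precision arrays (the F32 in the remarks). It also assumes that three neighbouring columns of u stay in cache, so each u element is read from memory only once, and that each store to w first triggers a write-allocate read of the cache line:

```latex
\text{flops per point} = \underbrace{3}_{\text{adds}} + \underbrace{1}_{\text{multiply}} = 4,
\qquad
\text{bytes per point} \approx \underbrace{4}_{\text{read } u}
  + \underbrace{4}_{\text{write-allocate } w}
  + \underbrace{4}_{\text{write } w} = 12,

I = \frac{4\ \text{flop}}{12\ \text{B}} \approx 0.33\ \text{flop/B},
\qquad
P \le \min\left(P_{\text{peak}},\; I \cdot b\right)
```

Here b is the sustained memory bandwidth. With an intensity this low, the loop is likely limited by b rather than by arithmetic, and adding threads stops helping once they saturate the memory bandwidth. On a laptop that may happen with only a few cores. Separately, schedule(static,1) hands adjacent columns j to different threads, which probably reduces reuse of u columns in each thread's cache. Comparing it against plain schedule(static) seems worth a try.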