Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Optimiser weirdness

Nick_M_3

On one of my tests, some OpenMP-parallelised loops are running at twice the speed of the near-identical serial code, even on one CPU! That rather implies that there is an optimisation that belongs in the generic (non-OpenMP) part of the optimiser.

I have attached the matrix generation program (just provide the size of the matrix as an argument; I used 1000) and the test code, which just takes the matrix file name as an argument.
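
A minimal sketch of how a comparison like this might be set up, assuming a dense matrix-vector product as a stand-in for the attached loops (the attached code is not reproduced here, and the names matvec_serial, matvec_omp1 and time_ms are hypothetical). Both kernels share the same loop body; the second runs it inside an OpenMP parallel loop limited to one thread, so a timing gap would point at code generation rather than at extra threads.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in kernel (not the attached code): y = A * x for a
// dense n-by-n row-major matrix, using a local scalar accumulator.
void matvec_serial(const std::vector<double>& a, const std::vector<double>& x,
                   std::vector<double>& y, int n) {
    assert(a.size() == static_cast<std::size_t>(n) * n);
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int j = 0; j < n; ++j)
            sum += a[static_cast<std::size_t>(i) * n + j] * x[j];
        y[i] = sum;
    }
}

// Same loop body, but inside an OpenMP parallel loop limited to one thread.
// With OpenMP enabled the compiler takes its OpenMP code path even though
// only one thread runs; without OpenMP the pragma is simply ignored.
void matvec_omp1(const std::vector<double>& a, const std::vector<double>& x,
                 std::vector<double>& y, int n) {
    assert(a.size() == static_cast<std::size_t>(n) * n);
#pragma omp parallel for num_threads(1)
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int j = 0; j < n; ++j)
            sum += a[static_cast<std::size_t>(i) * n + j] * x[j];
        y[i] = sum;
    }
}

// Average wall-clock time of `reps` calls to f, in milliseconds.
template <class F>
double time_ms(F&& f, int reps) {
    const auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count() / reps;
}
```

To use it, build with OpenMP enabled (-openmp for the 13.x Intel compiler discussed here, -fopenmp for GCC or Clang) and call time_ms on each kernel with n = 1000. Without OpenMP the pragma is ignored and the two kernels become identical source.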

This is clearly a low-priority item :-)

Nick_M_3

Sorry - that's with build 13.1.2.183.

TimP

I didn't keep that version of the compiler.

Checking with 13.1.192, I find that (apparently unneeded) auto-inlining can be more aggressive when -openmp is not set.

Even with -fno-inline-functions, the opt-report-file results look complicated. With -O3, OpenMP prevents some apparently counter-productive loop transformations at source line 171, and it then helps the compiler recognize the dot-product optimization at line 181. That pattern is recognized more easily in the style you used at line 163, where no aliasing analysis is needed to optimize it and the numerical properties are possibly better.
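
To illustrate the aliasing point, here is a hypothetical pair of row-times-vector loops (not the code at lines 163 or 181, which is only in the attachment; the function names are made up). Style A accumulates through the output pointer, so the compiler has to prove y does not overlap a or x before it can keep the sum in a register; style B uses a local scalar and needs no such proof. Compilers can sometimes version loop A with a runtime overlap check, so this is about how much analysis is required, not a guaranteed speed difference.

```cpp
#include <cassert>
#include <cstddef>

// Style A: accumulate straight into the output array. y may legally alias
// a or x, so unless the compiler can prove otherwise it must treat every
// store to y[i] as possibly changing later loads of a[] or x[].
void rowdot_into_memory(const double* a, const double* x, double* y,
                        std::size_t n) {
    assert(n == 0 || (a != nullptr && x != nullptr && y != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = 0.0;
        for (std::size_t j = 0; j < n; ++j)
            y[i] += a[i * n + j] * x[j];
    }
}

// Style B: accumulate in a local scalar and store once per row. The local
// `sum` never has its address taken, so it cannot alias anything; the
// compiler can keep it in a register and treat the inner loop as a plain
// dot product.
void rowdot_scalar(const double* a, const double* x, double* y,
                   std::size_t n) {
    assert(n == 0 || (a != nullptr && x != nullptr && y != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < n; ++j)
            sum += a[i * n + j] * x[j];
        y[i] = sum;
    }
}
```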

Sometimes std::inner_product() notation helps, but here it seems sufficient to define a scalar accumulator. An explicit scalar accumulator would also be needed to apply an OpenMP sum reduction, and the compiler performs that reduction without the pragma when -fp-model fast is set. I'll leave it to you to make recommendations.
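
A sketch of the two alternatives mentioned above, assuming a plain double dot product; the function names are hypothetical. Both the reduction clause and -fp-model fast relax the strict left-to-right summation order, which is what lets the compiler split the sum into partial sums, and which can also change the last bits of the result compared with a strictly serial sum.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>

// Variant 1: std::inner_product. It is specified as a left-to-right sum, so
// under strict floating-point semantics the compiler may not reorder it into
// SIMD partial sums.
double dot_inner_product(const double* x, const double* y, std::size_t n) {
    assert(n == 0 || (x != nullptr && y != nullptr));
    return std::inner_product(x, x + n, y, 0.0);
}

// Variant 2: explicit scalar accumulator with an OpenMP sum reduction.
// reduction(+:sum) permits private partial sums that are combined at the
// end, i.e. a different rounding order than the serial loop. Without
// OpenMP enabled the pragma is ignored and this is an ordinary serial loop.
double dot_omp_reduction(const double* x, const double* y, std::size_t n) {
    assert(n == 0 || (x != nullptr && y != nullptr));
    const long long len = static_cast<long long>(n);
    double sum = 0.0;
#pragma omp parallel for reduction(+:sum)
    for (long long i = 0; i < len; ++i)
        sum += x[i] * y[i];
    return sum;
}
```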

Nick_M_3

Thanks.  Ugh. That's definitely messy.

The code is actually just the standard LAPACK logic, converted first to modern Fortran and from there to C and so on. I use it as a way of trying out and teaching coding paradigms and SIMD parallelism.
