## Analyzing missing vectorization speedup

Hello,

I am using the icc 16 compiler to vectorize this part of my program:

```cpp
#pragma simd assert
for (int i = 0; i < nParticlesInUse; i++) {
    if (particles[i].id == INVALID)
        continue;
    particles[i].ax = GRAVITY_X;
    particles[i].ay = GRAVITY_Y;

    for (int j = 0; j < nParticlesInUse; j++) {
        if (particles[j].id != INVALID) {
            double dx = particles[j].x - particles[i].x;
            double dy = particles[j].y - particles[i].y;
            double r2 = dx * dx + dy * dy;
            // Only neighbours inside the cutoff radius contribute.
            if (r2 <= cutoff * cutoff) {
                r2 = fmax(r2, min_r * min_r);
                double r = sqrt(r2);
                double coef = (1 - cutoff / r) / r2 / mass;
                particles[i].ax += coef * dx;
                particles[i].ay += coef * dy;

                if (particles[i].ax * particles[i].ax + particles[i].ay * particles[i].ay > 10000000) {
                    particles[i].ax = 0;
                    particles[i].ay = 0;
                }
            }
        }
    }
}
```
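To experiment with the loop outside the full program, it can be pulled into a standalone function. This is only a sketch: it assumes a hypothetical `Particle` struct of six doubles plus an `int id` (at 56 bytes that would be consistent with the "strided by 7" doubles and the "strided by 14" `id` access in the report below), hypothetical values for `GRAVITY_X`, `GRAVITY_Y`, `INVALID`, `cutoff`, `min_r` and `mass`, and `dx = x[j] - x[i]`, since the post does not show these details. `computeForcesAoS` is a made-up name.

```cpp
#include <cmath>

// Hypothetical constants: the post does not show the real values.
constexpr double GRAVITY_X = 0.0;
constexpr double GRAVITY_Y = -9.81;
constexpr int INVALID = -1;
constexpr double cutoff = 0.01;
constexpr double min_r = cutoff / 100;
constexpr double mass = 0.01;

// Assumed array-of-structures element: six doubles plus an int id.
// On x86-64 this pads to 56 bytes, i.e. 7 doubles or 14 ints per element.
struct Particle {
    double x, y;    // position
    double vx, vy;  // velocity (unused by this kernel)
    double ax, ay;  // acceleration (output)
    int id;         // INVALID marks an unused slot
};

// Standalone copy of the posted loop. Put `#pragma simd assert` on the
// outer loop to reproduce the icc 16 build; it is left out here so the
// sketch also compiles cleanly with other compilers.
void computeForcesAoS(Particle* particles, int nParticlesInUse) {
    for (int i = 0; i < nParticlesInUse; i++) {
        if (particles[i].id == INVALID)
            continue;
        particles[i].ax = GRAVITY_X;
        particles[i].ay = GRAVITY_Y;

        for (int j = 0; j < nParticlesInUse; j++) {
            if (particles[j].id == INVALID)
                continue;
            double dx = particles[j].x - particles[i].x;
            double dy = particles[j].y - particles[i].y;
            double r2 = dx * dx + dy * dy;
            if (r2 > cutoff * cutoff)
                continue;  // outside the interaction radius

            r2 = std::fmax(r2, min_r * min_r);
            double r = std::sqrt(r2);
            double coef = (1 - cutoff / r) / r2 / mass;
            particles[i].ax += coef * dx;
            particles[i].ay += coef * dy;

            // Same clamp as the posted code, applied after every neighbour.
            if (particles[i].ax * particles[i].ax +
                    particles[i].ay * particles[i].ay > 10000000) {
                particles[i].ax = 0;
                particles[i].ay = 0;
            }
        }
    }
}
```

Calling it on a fixed, reproducible particle set in two builds, one with `#pragma simd assert` on the outer loop and one without, separates the loop's own behavior from the rest of the program.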

gprof shows that this is by far the most time-consuming function.
The vectorization report estimates a potential speedup of 2.29x:

LOOP BEGIN at noc8x8.cpp(1813,6)
remark #15328: vectorization support: gather was emulated for the variable this:  strided by 14   [ noc8x8.cpp(1815,7) ]
remark #15329: vectorization support: scatter was emulated for the variable this: masked, strided by 7   [ noc8x8.cpp(1817,3) ]
remark #15329: vectorization support: scatter was emulated for the variable this: masked, strided by 7   [ noc8x8.cpp(1818,3) ]
remark #15328: vectorization support: gather was emulated for the variable this: masked, strided by 7   [ noc8x8.cpp(1827,33) ]
remark #15328: vectorization support: gather was emulated for the variable this: masked, strided by 7   [ noc8x8.cpp(1828,33) ]
remark #15329: vectorization support: scatter was emulated for the variable this: masked, strided by 7   [ noc8x8.cpp(1840,5) ]
remark #15328: vectorization support: gather was emulated for the variable this: masked, strided by 7   [ noc8x8.cpp(1840,5) ]
remark #15329: vectorization support: scatter was emulated for the variable this: masked, strided by 7   [ noc8x8.cpp(1841,5) ]
remark #15328: vectorization support: gather was emulated for the variable this: masked, strided by 7   [ noc8x8.cpp(1841,5) ]
remark #15328: vectorization support: gather was emulated for the variable this: masked, strided by 7   [ noc8x8.cpp(1843,9) ]
remark #15328: vectorization support: gather was emulated for the variable this: masked, strided by 7   [ noc8x8.cpp(1843,25) ]
remark #15328: vectorization support: gather was emulated for the variable this: masked, strided by 7   [ noc8x8.cpp(1843,43) ]
remark #15328: vectorization support: gather was emulated for the variable this: masked, strided by 7   [ noc8x8.cpp(1843,59) ]
remark #15329: vectorization support: scatter was emulated for the variable this: masked, strided by 7   [ noc8x8.cpp(1845,6) ]
remark #15329: vectorization support: scatter was emulated for the variable this: masked, strided by 7   [ noc8x8.cpp(1846,6) ]
remark #15305: vectorization support: vector length 4
remark #15309: vectorization support: normalized vectorization overhead 0.019
remark #15301: SIMD LOOP WAS VECTORIZED
remark #15453: unmasked strided stores: 6
remark #15475: --- begin vector loop cost summary ---
remark #15476: scalar loop cost: 679
remark #15477: vector loop cost: 296.000
remark #15478: estimated potential speedup: 2.290
remark #15482: vectorized math library calls: 1
remark #15488: --- end vector loop cost summary ---
remark #25015: Estimate of max trip count of loop=15000

LOOP BEGIN at noc8x8.cpp(1819,3)
remark #25460: No loop optimizations reported
remark #25015: Estimate of max trip count of loop=60000
LOOP END
LOOP END
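In the report, every memory access in the vectorized outer loop is an emulated gather or scatter with stride 7 (14 for the `id` access), and the inner loop at noc8x8.cpp(1819,3) reports no optimizations. One commonly tried restructuring, offered here as an untested assumption rather than a known fix, is a structure-of-arrays layout with the inner `j` loop vectorized as a reduction, so that `x[j]`, `y[j]` and `id[j]` become contiguous loads. The sketch uses the OpenMP 4.0 `#pragma omp simd reduction` directive (ignored unless OpenMP SIMD support is enabled, e.g. `-fopenmp-simd` in GCC or Clang); `ParticlesSoA`, `computeForcesSoA` and all constant values are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical constants: the post does not show the real values.
constexpr double GRAVITY_X = 0.0;
constexpr double GRAVITY_Y = -9.81;
constexpr int INVALID = -1;
constexpr double cutoff = 0.01;
constexpr double min_r = cutoff / 100;
constexpr double mass = 0.01;

// Structure-of-arrays layout: each field is contiguous, so consecutive
// j iterations read consecutive addresses instead of stride-7 gathers.
// Assumes x, y and id all have the same length.
struct ParticlesSoA {
    std::vector<double> x, y, ax, ay;
    std::vector<int> id;
};

void computeForcesSoA(ParticlesSoA& p) {
    const std::size_t n = p.x.size();
    p.ax.resize(n);
    p.ay.resize(n);
    const double* x = p.x.data();
    const double* y = p.y.data();
    const int* id = p.id.data();
    const double cutoff2 = cutoff * cutoff;
    const double min_r2 = min_r * min_r;

    for (std::size_t i = 0; i < n; i++) {
        if (id[i] == INVALID)
            continue;
        const double xi = x[i];
        const double yi = y[i];
        double ax = GRAVITY_X;
        double ay = GRAVITY_Y;

        // Inner loop written as a plain sum with no early exits, so the
        // compiler can vectorize it over j with unit-stride loads.
#pragma omp simd reduction(+ : ax, ay)
        for (std::size_t j = 0; j < n; j++) {
            const double dx = x[j] - xi;
            const double dy = y[j] - yi;
            const double d2 = dx * dx + dy * dy;
            // Branches replaced by a mask: inactive pairs contribute zero.
            const bool active = (id[j] != INVALID) && (d2 <= cutoff2);
            const double r2 = std::fmax(d2, min_r2);
            const double r = std::sqrt(r2);
            const double coef = active ? (1 - cutoff / r) / r2 / mass : 0.0;
            ax += coef * dx;
            ay += coef * dy;
        }

        // NOT equivalent to the posted code: the clamp is applied once to
        // the final sum instead of after every neighbour.
        if (ax * ax + ay * ay > 10000000) {
            ax = 0;
            ay = 0;
        }
        p.ax[i] = ax;
        p.ay[i] = ay;
    }
}
```

The clamp inside the posted `j` loop means the accumulation is no longer a plain sum, which rules out a straightforward reduction. The sketch moves the clamp after the loop, so its results differ whenever the threshold is crossed partway through the sum, and a vectorized reduction may also change the summation order in the last bits.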

Unfortunately, the vectorized version runs slower: 22.26 seconds without `#pragma simd` versus 24.28 seconds with it.
Can somebody give me some pointers on best practices for debugging this behavior?
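The 22.26 s and 24.28 s figures are presumably end-to-end run times. When comparing the two builds it can help to time only the hot loop and repeat it, since a single whole-program run also includes other code and run-to-run noise. A minimal sketch using `std::chrono::steady_clock`; `bestOfRuns` is a hypothetical helper, not part of any library.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Runs `kernel` `runs` times and returns the fastest wall-clock time in
// seconds. Keeping the minimum filters out one-off noise such as a cold
// first run or interference from other processes.
// Returns +infinity if runs <= 0.
template <typename Kernel>
double bestOfRuns(Kernel&& kernel, int runs) {
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < runs; ++r) {
        const auto start = std::chrono::steady_clock::now();
        kernel();
        const auto stop = std::chrono::steady_clock::now();
        const std::chrono::duration<double> elapsed = stop - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

For example, `bestOfRuns([&] { computeForces(); }, 10)`, where `computeForces` stands for whatever function contains the loop, gives one number per build that excludes the rest of the program.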

Best regards,

Tim  