
Hello,

I am using the icc v.16 compiler to vectorize this part of my program:

    #pragma simd assert
    for (int i = 0; i < nParticlesInUse; i++) {
        if (particles[i].id == INVALID) continue;
        particles[i].ax = GRAVITY_X;
        particles[i].ay = GRAVITY_Y;
        for (int j = 0; j < nParticlesInUse; j++) {
            if (particles[j].id != INVALID) {
                double dx = particles[j].x - particles[i].x;
                double dy = particles[j].y - particles[i].y;
                double r2 = dx * dx + dy * dy;
                if (r2 > cutoff * cutoff) {
                } else {
                    r2 = fmax(r2, min_r * min_r);
                    double r = sqrt(r2);
                    double coef = (1 - cutoff / r) / r2 / mass;
                    particles[i].ax += coef * dx;
                    particles[i].ay += coef * dy;
                    if (particles[i].ax * particles[i].ax + particles[i].ay * particles[i].ay > 10000000) {
                        particles[i].ax = 0;
                        particles[i].ay = 0;
                    }
                }
            }
        }
    }

gprof indicated that this is the most used function by far.

The vectorization report estimates a potential speedup of 2.29x:

LOOP BEGIN at noc8x8.cpp(1813,6)

remark #15328: vectorization support: gather was emulated for the variable this: strided by 14 [ noc8x8.cpp(1815,7) ]

remark #15329: vectorization support: scatter was emulated for the variable this: masked, strided by 7 [ noc8x8.cpp(1817,3) ]

remark #15329: vectorization support: scatter was emulated for the variable this: masked, strided by 7 [ noc8x8.cpp(1818,3) ]

remark #15328: vectorization support: gather was emulated for the variable this: masked, strided by 7 [ noc8x8.cpp(1827,33) ]

remark #15328: vectorization support: gather was emulated for the variable this: masked, strided by 7 [ noc8x8.cpp(1828,33) ]

remark #15329: vectorization support: scatter was emulated for the variable this: masked, strided by 7 [ noc8x8.cpp(1840,5) ]

remark #15328: vectorization support: gather was emulated for the variable this: masked, strided by 7 [ noc8x8.cpp(1840,5) ]

remark #15329: vectorization support: scatter was emulated for the variable this: masked, strided by 7 [ noc8x8.cpp(1841,5) ]

remark #15328: vectorization support: gather was emulated for the variable this: masked, strided by 7 [ noc8x8.cpp(1841,5) ]

remark #15328: vectorization support: gather was emulated for the variable this: masked, strided by 7 [ noc8x8.cpp(1843,9) ]

remark #15328: vectorization support: gather was emulated for the variable this: masked, strided by 7 [ noc8x8.cpp(1843,25) ]

remark #15328: vectorization support: gather was emulated for the variable this: masked, strided by 7 [ noc8x8.cpp(1843,43) ]

remark #15328: vectorization support: gather was emulated for the variable this: masked, strided by 7 [ noc8x8.cpp(1843,59) ]

remark #15329: vectorization support: scatter was emulated for the variable this: masked, strided by 7 [ noc8x8.cpp(1845,6) ]

remark #15329: vectorization support: scatter was emulated for the variable this: masked, strided by 7 [ noc8x8.cpp(1846,6) ]

remark #15305: vectorization support: vector length 4

remark #15309: vectorization support: normalized vectorization overhead 0.019

remark #15301: SIMD LOOP WAS VECTORIZED

remark #15452: unmasked strided loads: 8

remark #15453: unmasked strided stores: 6

remark #15460: masked strided loads: 1

remark #15475: --- begin vector loop cost summary ---

remark #15476: scalar loop cost: 679

remark #15477: vector loop cost: 296.000

remark #15478: estimated potential speedup: 2.290

remark #15482: vectorized math library calls: 1

remark #15488: --- end vector loop cost summary ---

remark #25015: Estimate of max trip count of loop=15000

LOOP BEGIN at noc8x8.cpp(1819,3)

remark #25460: No loop optimizations reported

remark #25015: Estimate of max trip count of loop=60000

LOOP END

LOOP END

Unfortunately, the vectorization slows down the execution: without simd: 22.26 seconds; with simd: 24.28 seconds.

Can somebody give me some pointers on best practices for debugging this behavior?

Best regards,

Tim


Beyond testing a change from array of structures to structure of arrays?
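
For reference, a minimal sketch of what such a change might look like. The "strided by 7" gather/scatter remarks in the report are consistent with an array-of-structures layout; with one contiguous array per field, accesses indexed by i become unit-stride. The field names follow the posted loop, but `ParticlesSoA`, `computeForces`, and the constant values are hypothetical, since the post does not show them. The sketch also accumulates into local variables and stores once per particle, which is an extra change on top of the layout switch. Whether any of this helps would have to be measured.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical values; the original post does not show these constants.
constexpr int    INVALID   = -1;
constexpr double GRAVITY_X = 0.0;
constexpr double GRAVITY_Y = 0.0;

// Structure of arrays: one contiguous array per field instead of one
// struct per particle.
struct ParticlesSoA {
    std::vector<int>    id;
    std::vector<double> x, y, ax, ay;

    explicit ParticlesSoA(std::size_t n)
        : id(n, INVALID), x(n, 0.0), y(n, 0.0), ax(n, 0.0), ay(n, 0.0) {}
};

// Same arithmetic as the posted loop, applied to the first n particles.
void computeForces(ParticlesSoA& p, std::size_t n,
                   double cutoff, double min_r, double mass) {
    assert(n <= p.id.size());

    const int*    id = p.id.data();
    const double* x  = p.x.data();
    const double* y  = p.y.data();
    double*       ax = p.ax.data();
    double*       ay = p.ay.data();

    for (std::size_t i = 0; i < n; ++i) {
        if (id[i] == INVALID) continue;

        // Accumulate in locals and write back once per particle.
        double axi = GRAVITY_X;
        double ayi = GRAVITY_Y;
        const double xi = x[i];
        const double yi = y[i];

        for (std::size_t j = 0; j < n; ++j) {
            if (id[j] == INVALID) continue;

            const double dx = x[j] - xi;
            const double dy = y[j] - yi;
            double r2 = dx * dx + dy * dy;
            if (r2 > cutoff * cutoff) continue;

            r2 = std::fmax(r2, min_r * min_r);
            const double r    = std::sqrt(r2);
            const double coef = (1 - cutoff / r) / r2 / mass;
            axi += coef * dx;
            ayi += coef * dy;

            // Same clamp as in the posted code.
            if (axi * axi + ayi * ayi > 10000000) {
                axi = 0;
                ayi = 0;
            }
        }
        ax[i] = axi;
        ay[i] = ayi;
    }
}
```

Comparing the vectorization report and the wall-clock time for both layouts would show whether the emulated gathers and scatters were actually the bottleneck.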
