Intel® Fortran Compiler

Prep for migrating to OpenMP offload

jimdempseyatthecove

I am planning to migrate my simulation code to OpenMP offload. Doing so may require a change in data layout.

 

Currently the candidate arrays are real(8) and allocated like:

allocate(Position(3,nPoints), Velocity(3,nPoints))

IOW, an Array of Structures (AoS) layout.

In prior analysis, using CPUs with a relatively small number of cores (8-16) and two memory channels, this layout performed as well as Structure of Arrays (SoA): ...Position(nPoints,3)...
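To make the two layouts concrete, here is a minimal sketch of the same position update written against both. The problem size, the time step `dt`, the names `PositionSoA`/`VelocitySoA`, and the choice of `target teams distribute parallel do` are all assumptions for illustration, not taken from the actual simulation code.

```fortran
program layout_sketch
  implicit none
  integer, parameter :: nPoints = 1000   ! hypothetical problem size
  real(8), parameter :: dt = 1.0d-3      ! hypothetical time step
  real(8), allocatable :: Position(:,:), Velocity(:,:)         ! AoS: (3,nPoints)
  real(8), allocatable :: PositionSoA(:,:), VelocitySoA(:,:)   ! SoA: (nPoints,3)
  integer :: i, k

  allocate(Position(3,nPoints), Velocity(3,nPoints))
  allocate(PositionSoA(nPoints,3), VelocitySoA(nPoints,3))
  Position = 0.0d0
  Velocity = 1.0d0
  PositionSoA = 0.0d0
  VelocitySoA = 1.0d0

  ! AoS: x,y,z of one point are adjacent in memory;
  ! consecutive values of i are 3 elements apart.
  !$omp target teams distribute parallel do map(tofrom: Position) map(to: Velocity)
  do i = 1, nPoints
    Position(:,i) = Position(:,i) + dt * Velocity(:,i)
  end do

  ! SoA: for a fixed component k, consecutive values of i are adjacent in memory.
  !$omp target teams distribute parallel do collapse(2) &
  !$omp&  map(tofrom: PositionSoA) map(to: VelocitySoA)
  do k = 1, 3
    do i = 1, nPoints
      PositionSoA(i,k) = PositionSoA(i,k) + dt * VelocitySoA(i,k)
    end do
  end do

  ! Both layouts should hold the same values; this difference should be zero.
  print *, 'max |AoS - SoA| =', maxval(abs(Position - transpose(PositionSoA)))
end program layout_sketch
```

Without OpenMP enabled the directives are plain comments and the program runs serially, so the sketch can be checked on the host first. With ifx, offload is normally enabled with something like `-fiopenmp -fopenmp-targets=spir64`.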

SoA may perform better with CPUs with more memory channels (but this is untested by me).

 

I've read on other GPU forums (e.g. NVIDIA) that, for REAL(4) AoS, the hardware vectorizes better when each element spans a multiple of 4 REAL(4) values. IOW, allocate as Position(4,nPoints) and waste the space.
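For reference, the padded AoS variant would look roughly like the sketch below. The name `PositionPad` and the problem size are hypothetical, and whether the padding helps on any particular GPU is exactly the open question, so this only shows the layout and its storage cost.

```fortran
program padded_sketch
  implicit none
  integer, parameter :: nPoints = 1000      ! hypothetical problem size
  real(8), allocatable :: PositionPad(:,:)  ! padded AoS: 4 slots per point, 3 used
  integer :: i

  allocate(PositionPad(4,nPoints))
  PositionPad = 0.0d0

  ! Only components 1:3 are touched; slot 4 is pure padding.
  !$omp target teams distribute parallel do map(tofrom: PositionPad)
  do i = 1, nPoints
    PositionPad(1:3,i) = PositionPad(1:3,i) + 1.0d0
  end do

  ! storage_size returns bits, so divide by 8 for bytes.
  print *, 'bytes per point, unpadded REAL(8):', 3 * storage_size(1.0d0) / 8
  print *, 'bytes per point, padded REAL(8):  ', 4 * storage_size(1.0d0) / 8
  print *, 'bytes per point, padded REAL(4):  ', 4 * storage_size(1.0e0) / 8
end program padded_sketch
```

With REAL(8) the padded point is twice the size of a padded REAL(4) point, so any alignment rule of thumb stated for 4 x REAL(4) need not carry over unchanged.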

So the question is: do Intel Arc GPUs behave the same way, and do they behave the same way with REAL(8) variables?

 

The second question: while I don't mind changing from AoS to SoA if it is warranted, I'd rather not change if it is not. By warranted I mean a performance boost of more than 10%.
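One way to check the 10% threshold on a given device is a small timing harness along these lines. This is only a sketch: the kernel, the sizes `nPoints` and `nSteps`, and all array names are placeholders, and the timings include data transfer and any first-launch overhead, so a warm-up pass and a kernel closer to the real code would be needed for a meaningful number.

```fortran
program threshold_sketch
  implicit none
  integer, parameter :: nPoints = 1000000, nSteps = 20   ! hypothetical sizes
  real(8), parameter :: dt = 1.0d-3                      ! hypothetical time step
  real(8), allocatable :: pAoS(:,:), vAoS(:,:), pSoA(:,:), vSoA(:,:)
  integer(8) :: t0, t1, rate
  real(8) :: tAoS, tSoA
  integer :: i, k, step

  allocate(pAoS(3,nPoints), vAoS(3,nPoints), pSoA(nPoints,3), vSoA(nPoints,3))
  pAoS = 0.0d0
  vAoS = 1.0d0
  pSoA = 0.0d0
  vSoA = 1.0d0

  ! Time the AoS update; arrays stay on the device across steps.
  call system_clock(t0, rate)
  !$omp target data map(tofrom: pAoS) map(to: vAoS)
  do step = 1, nSteps
    !$omp target teams distribute parallel do
    do i = 1, nPoints
      pAoS(:,i) = pAoS(:,i) + dt * vAoS(:,i)
    end do
  end do
  !$omp end target data
  call system_clock(t1)
  tAoS = real(t1 - t0, 8) / real(rate, 8)

  ! Time the SoA update with the same work.
  call system_clock(t0)
  !$omp target data map(tofrom: pSoA) map(to: vSoA)
  do step = 1, nSteps
    !$omp target teams distribute parallel do collapse(2)
    do k = 1, 3
      do i = 1, nPoints
        pSoA(i,k) = pSoA(i,k) + dt * vSoA(i,k)
      end do
    end do
  end do
  !$omp end target data
  call system_clock(t1)
  tSoA = real(t1 - t0, 8) / real(rate, 8)

  print *, 'AoS seconds:', tAoS, ' SoA seconds:', tSoA
  if (tAoS > 1.10d0 * tSoA) then
    print *, 'SoA is more than 10% faster for this kernel'
  else
    print *, 'SoA does not clear the 10% threshold for this kernel'
  end if
end program threshold_sketch
```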

 

I know that the answer to these questions is "it depends on the code in the application", but I am looking for general guidance from the past experience of other users.

 

BTW, I am waiting for the next-gen Battlemage GPU to become available before migrating the code.

 

Jim Dempsey
