Solved: Trying to find out if a GPU can speed up FP calculations

Anders_S_1 · 03-24-2023

Hi,

I hope I have found the right forum!

I am a GPU novice, so I hope I can ask relevant questions!

I use MPI to speed up floating-point (FP) calculations. 40 threads seem to give the maximum speed, probably because of MPI overhead. In each setup of the system of equations for the Newton-Raphson iterations, 4000 integrals have to be evaluated, so each thread evaluates 100 integrals.
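
For readers new to MPI: splitting 4000 integrals into 100 per thread is a block decomposition of the work. The sketch below, in Python for brevity, shows one common way to compute each rank's contiguous share. `block_range` is a hypothetical helper, not part of the poster's code or of any MPI library, and the post's "threads" are treated as MPI ranks.

```python
from typing import List, Tuple

def block_range(rank: int, n_ranks: int, n_items: int) -> Tuple[int, int]:
    """Half-open range [start, stop) of item indices owned by `rank`.

    Items are split into contiguous blocks; when n_items is not a multiple
    of n_ranks, the first (n_items % n_ranks) ranks get one extra item.
    """
    if not 0 <= rank < n_ranks:
        raise ValueError("rank out of range")
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

# 4000 integrals over 40 MPI ranks: every rank gets 100 of them.
ranges: List[Tuple[int, int]] = [block_range(r, 40, 4000) for r in range(40)]
print(ranges[0], ranges[-1])
```

In an actual MPI program, `rank` and `n_ranks` would come from the communicator (for example `MPI_Comm_rank` and `MPI_Comm_size`). The remainder handling only matters when the number of integrals is not a multiple of the number of ranks.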

My question is whether it would be possible to offload the work of each thread to an advanced GPU (maybe a MAX 1100) so that these 100 evaluations are performed in parallel. The amount of data to be transferred is probably rather small, and the number of calls to mathematical functions during the evaluation of the integrals is also rather small.

I understand that a high CPU-to-GPU data transfer rate is necessary. A speedup of a factor of 10 would be a good result and would more or less decrease the total computation time by a factor of nearly 10.
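
The expectation that a 10x faster integral evaluation cuts the total time by nearly 10x holds only if the integrals dominate the run time. Amdahl's law makes that explicit; the sketch below adds an optional overhead term for host-to-device transfers. The function name and the overhead term are illustrative assumptions, not measurements from the poster's code.

```python
def overall_speedup(offload_fraction: float, kernel_speedup: float,
                    overhead_fraction: float = 0.0) -> float:
    """Amdahl's law with an optional fixed overhead.

    offload_fraction:  share of the original run time spent in the part
                       that is offloaded (here: the integral evaluations).
    kernel_speedup:    how much faster that part runs on the accelerator.
    overhead_fraction: extra time, as a share of the original run time,
                       for data transfers and kernel launches (assumed).
    """
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    new_time = ((1.0 - offload_fraction)
                + offload_fraction / kernel_speedup
                + overhead_fraction)
    return 1.0 / new_time

for f in (1.0, 0.99, 0.95, 0.9):
    print(f"offloaded fraction {f:.2f}: overall speedup {overall_speedup(f, 10.0):.2f}")
```

For example, if 90% of the run time is spent in the integrals, a 10x faster kernel gives only about a 5.3x overall speedup, and any transfer overhead lowers that further.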

I use double precision (DP), and since second-order derivatives are calculated, it seems necessary to have that option on the GPU, even if it may turn out that single precision (SP) does the job "good enough".
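
The post does not say how the second-order derivatives are obtained. If they are computed by finite differences (an assumption), the loss of accuracy in single precision is easy to demonstrate. The sketch emulates IEEE single precision in Python by rounding every intermediate result through `struct`; the test function `exp`, the point x = 1 and the step h = 1e-3 are arbitrary choices.

```python
import math
import struct
from typing import Callable

def to_f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]

def second_diff(f: Callable[[float], float], x: float, h: float,
                rnd: Callable[[float], float] = lambda v: v) -> float:
    """Central difference (f(x+h) - 2 f(x) + f(x-h)) / h**2.

    Every input and intermediate result is passed through `rnd`, so
    rnd=to_f32 emulates doing the whole computation in single precision.
    """
    x, h = rnd(x), rnd(h)
    fp = rnd(f(rnd(x + h)))
    f0 = rnd(f(x))
    fm = rnd(f(rnd(x - h)))
    num = rnd(rnd(fp - rnd(2.0 * f0)) + fm)
    return rnd(num / rnd(h * h))

exact = math.e  # second derivative of exp(x) at x = 1
dp = second_diff(math.exp, 1.0, 1e-3)
sp = second_diff(math.exp, 1.0, 1e-3, to_f32)
print(f"double: error {abs(dp - exact):.1e}, single: error {abs(sp - exact):.1e}")
```

The three function values agree in their leading digits, so the subtraction cancels most of the significant bits; with these parameters the single-precision error comes out several orders of magnitude larger than the double-precision one. Analytically computed second derivatives are usually less exposed to this particular cancellation, so the sketch only shows where SP becomes risky, not that it is always insufficient.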

Best regards

Anders S

Dunni_A_Intel · 03-28-2023

Hi,

Yes, you may use MPI with SYCL or other low-level libraries that support Intel GPUs (e.g., oneMKL, OpenCL).
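
When MPI is combined with SYCL or another GPU runtime, each rank has to decide which GPU it submits work to. A common convention is to map the node-local rank onto the available devices round-robin, as sketched below in Python. `device_for_rank` is a hypothetical helper; in a real C++ program the device list could come from `sycl::device::get_devices(sycl::info::device_type::gpu)` and the node-local rank from `MPI_Comm_split_type` with `MPI_COMM_TYPE_SHARED`. The two-GPU node is an assumption for illustration.

```python
from collections import Counter
from typing import List

def device_for_rank(local_rank: int, n_devices: int) -> int:
    """Round-robin assignment of node-local MPI ranks to GPU devices."""
    if n_devices < 1:
        raise ValueError("need at least one device")
    return local_rank % n_devices

# Hypothetical node: 40 ranks sharing 2 GPUs (the device count is assumed).
assignment: List[int] = [device_for_rank(r, 2) for r in range(40)]
print(Counter(assignment))
```

With a single GPU every rank maps to device 0, so all 40 ranks would share it, and the GPU then has to absorb the work of every rank at once.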

GPUs are massively parallel compared to CPUs. Applications with regions that can be split into a sufficiently large number of tasks to keep the compute resources busy, and that have suitable data dependencies and access patterns, are likely to run faster on GPUs. However, whether you get any speedup, and how much, will depend on your specific algorithm, the baseline CPU, and the specific GPU you are using. Our Intel® Data Center GPU Max series supports FP64.
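
The point about a sufficiently large number of tasks matters here: 100 integrals per rank is a small number of independent tasks for a GPU, which typically needs many thousands of work-items in flight to stay busy. One way to expose more parallelism is to treat every (integral, quadrature node) pair as its own work-item and reduce afterwards. The sketch below shows that index mapping in plain Python, with the composite midpoint rule standing in for whatever quadrature the real code uses (the post does not say); in SYCL the loop body would roughly correspond to the body of a `parallel_for` over a one-dimensional range. `midpoint_flat` and the test integrands are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# integrand(p, x): value of the integrand with parameter p at point x.
Integrand = Callable[[float, float], float]

def midpoint_flat(integrand: Integrand,
                  params: Sequence[float],
                  limits: Sequence[Tuple[float, float]],
                  n_nodes: int) -> List[float]:
    """Composite midpoint rule applied to many integrals at once.

    Integral i is the integral of integrand(params[i], x) over limits[i].
    Every (integral, node) pair is an independent work-item with a flat
    index g, the way a GPU kernel launched over len(params) * n_nodes
    work-items would see the problem. Summing each integral's samples
    afterwards is the reduction step.
    """
    n_int = len(params)
    samples = [0.0] * (n_int * n_nodes)
    for g in range(n_int * n_nodes):   # "kernel" body: no dependence between g values
        i, j = divmod(g, n_nodes)      # integral index, node index
        a, b = limits[i]
        h = (b - a) / n_nodes
        samples[g] = integrand(params[i], a + (j + 0.5) * h) * h
    # Reduction: add up the n_nodes weighted samples of each integral.
    return [math.fsum(samples[i * n_nodes:(i + 1) * n_nodes]) for i in range(n_int)]

# Hypothetical test problem: 100 integrals of exp(p x) over [0, 1].
ps = [0.01 * (k + 1) for k in range(100)]
results = midpoint_flat(lambda p, x: math.exp(p * x), ps, [(0.0, 1.0)] * len(ps), 1000)
print(results[-1], math.e - 1.0)
```

Because each sample depends only on its own index g, the loop can be spread over any number of work-items; only the final per-integral sums need coordination.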

Dunni


Anders_S_1 · 03-24-2023

Hi again,

I forgot to mention that, on the CPU, the 100 evaluations per thread typically take 5-10 seconds.
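
These timings fix the budget for the 10x target mentioned in the first post: 5-10 s for 100 integrals is 50-100 ms per integral, so the offloaded evaluation, including host-device transfers, would have to finish in 0.5-1 s per rank and per setup of the equation system. The helpers below are hypothetical and only do this arithmetic.

```python
def per_item_seconds(total_seconds: float, n_items: int) -> float:
    """Average CPU time per integral."""
    return total_seconds / n_items

def offload_budget_seconds(cpu_seconds: float, target_speedup: float) -> float:
    """Wall time the offloaded part may take, including host-device
    transfers, to be `target_speedup` times faster than the CPU baseline."""
    return cpu_seconds / target_speedup

for cpu_s in (5.0, 10.0):
    print(f"{cpu_s:4.1f} s per 100 integrals -> "
          f"{1000 * per_item_seconds(cpu_s, 100):.0f} ms each; "
          f"10x budget: {offload_budget_seconds(cpu_s, 10.0):.2f} s")
```

If all 40 ranks offload to the same GPU at the same time, that GPU must process all 4000 integrals within the same 0.5-1 s window.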

Best regards

Anders S

Anders_S_1 · 03-28-2023

Hi Dunni,

Thanks for your reply. Is there any document describing how to offload a task to a GPU?

- Does the task have to be recoded in a GPU language?

- Is there any documentation describing the compute hardware layout of the MAX 1100?

Best regards

Anders S