GPU Compute Software
Ask questions about Intel® Graphics Compute software technologies, such as OpenCL* GPU driver and oneAPI Level Zero

Trying to find out if a GPU can speed up FP calculations

Anders_S_1
New Contributor III

Hi,

I hope I have found the right forum!

I am a GPU novice, so I hope I can ask relevant questions!

I use MPI to speed up floating-point (FP) calculations. 40 threads seem to give the maximum speed, probably because of MPI overhead. Each setup of the system of equations for the Newton-Raphson iterations requires 4000 integrals to be evaluated, which means that each thread evaluates 100 integrals.

My question is whether it would be possible to offload the work from each thread to an advanced GPU (perhaps the MAX 1100) so that these 100 evaluations run in parallel. The amount of data to be transferred is probably rather small, and the number of mathematical function calls made during the evaluation of the integrals is also rather small.

I understand that high CPU->GPU data transfer rates are necessary. A speedup by a factor of 10 would be a good result and should reduce the total computational time by a factor of nearly 10.

I use double precision (DP), and since second-order derivatives are calculated, it seems necessary to have that option on the GPU, even if it may turn out that single precision (SP) does the job "good enough".

Best regards

Anders S

1 Solution
Dunni_A_Intel
Moderator

Hi,


Yes, you may use MPI with SYCL or other low-level libraries that support Intel GPUs (e.g., oneMKL, OpenCL).


GPUs are massively parallel in comparison to CPUs. Applications with regions that can be split into a sufficiently large number of tasks to keep the compute resources busy, and that have the right data dependence and access patterns, are likely to run faster on GPUs. However, whether you get any speedup, and how much, will depend on your specific algorithm, your baseline CPU, and the specific GPU you are using. The Intel® Data Center GPU Max Series supports FP64.
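
As a minimal sketch of what "split into independent tasks" can look like for a batch of integrals, the plain C++ below evaluates one integral per parameter in FP64. Everything in it is an assumption for illustration: the integrand exp(-a x), the interval [0, 1], the use of composite Simpson's rule, and the names `integrand`, `simpson` and `evaluate_batch`. It runs on the CPU; with SYCL, the loop over `k` is the part that would typically become a `parallel_for` over the index range.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical integrand f(x; a) = exp(-a * x); a stand-in for the real one.
inline double integrand(double x, double a) { return std::exp(-a * x); }

// Composite Simpson's rule on [lo, hi] with n subintervals (n must be even).
double simpson(double a, double lo, double hi, int n) {
    assert(n > 0 && n % 2 == 0);
    const double h = (hi - lo) / n;
    double sum = integrand(lo, a) + integrand(hi, a);
    for (int i = 1; i < n; ++i) {
        // Interior points alternate between weights 4 (odd i) and 2 (even i).
        sum += (i % 2 == 1 ? 4.0 : 2.0) * integrand(lo + i * h, a);
    }
    return sum * h / 3.0;
}

// Evaluate one integral per parameter. Each iteration reads only params[k]
// and writes only results[k], so the iterations are independent of each
// other; that independence is what allows them to run in parallel.
std::vector<double> evaluate_batch(const std::vector<double>& params) {
    std::vector<double> results(params.size());
    for (std::size_t k = 0; k < params.size(); ++k) {
        results[k] = simpson(params[k], 0.0, 1.0, 1000);
    }
    return results;
}
```

Whether 100 such tasks per thread are enough to keep a GPU busy depends on how much work each integral involves, so a measurement on the target hardware is the only reliable answer.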


Dunni


3 Replies
Anders_S_1
New Contributor III

Hi again,

I forgot to mention that, on the CPU, the 100 evaluations per thread typically take 5-10 seconds.

Best regards

Anders S

Anders_S_1
New Contributor III

Hi Dunni,

Thanks for your reply. Is there any document outlining how to offload a task to a GPU?

- Does the task have to be recoded in a GPU language?

- Is there any documentation on the MAX 1100 compute hardware layout, etc.?

Best regards

Anders S

 
