Hi,
I hope I have found the right forum!
I am a GPU novice, so I hope I can ask relevant questions!
I use MPI to speed up floating-point calculations. About 40 threads seem to give the maximum speed, probably because of MPI overhead. Each setup of the system of equations for the Newton-Raphson iterations requires evaluating 4000 integrals, so each thread evaluates 100 integrals.
My question is whether the work in each thread could be offloaded to an advanced GPU (maybe the MAX 1100) so that these 100 evaluations run in parallel. The amount of data to be transferred is probably rather small, and the number of mathematical function calls during the evaluation of the integrals is also rather small.
I understand that high CPU->GPU data transfer rates are necessary. A speedup by a factor of 10 would be a good result, and it should reduce the total computation time by a factor of nearly 10.
I use double precision (DP), and since second-order derivatives are calculated, it seems necessary to have that option on the GPU, even if single precision (SP) may turn out to do the job "good enough".
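One way to judge whether SP is "good enough" is to run the same derivative formula in both precisions on a function with a known answer. The sketch below assumes the second derivatives are formed by central differences, which the post does not state; `second_derivative_sin` and `abs_error` are hypothetical helper names, and `sin` is only a test function. With finite differences, rounding error is amplified by 1/h², so the single-precision result is usually far less accurate than the double-precision one for the same step size.

```cpp
#include <cmath>

// Central-difference estimate of the second derivative of sin at x,
// carried out entirely in precision T (float or double).
template <typename T>
T second_derivative_sin(T x, T h) {
    return (std::sin(x + h) - T(2) * std::sin(x) + std::sin(x - h)) / (h * h);
}

// Absolute error against the exact value d^2/dx^2 sin(x) = -sin(x),
// with the comparison itself done in double precision.
template <typename T>
double abs_error(T x, T h) {
    const double exact = -std::sin(static_cast<double>(x));
    return std::fabs(static_cast<double>(second_derivative_sin<T>(x, h)) - exact);
}
```

Comparing `abs_error<float>(1.0f, 1e-3f)` with `abs_error<double>(1.0, 1e-3)` gives a quick feel for the gap; the same comparison on the real integrands would be the meaningful test.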
Best regards
Anders S
Hi,
Yes, you may use MPI with SYCL or other low-level libraries that support Intel GPUs (e.g., oneMKL, OpenCL).
GPUs are massively parallel compared with CPUs. Applications with regions that can be split into a sufficiently large number of tasks to keep the compute resources busy, and that have suitable data dependence and access patterns, are likely to run faster on GPUs. However, whether you see a speedup, and how large it is, depends on your specific algorithm, your baseline CPU, and the specific GPU you are using. The Intel® Data Center GPU Max Series supports FP64.
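To make "the right data dependence" concrete, here is a minimal sketch in plain C++ with no GPU API, showing a batch of integrals written as independent per-index tasks. Everything in it is an assumption for illustration: `evaluate_integral` is a hypothetical stand-in (composite Simpson's rule for the integral of exp(-k*x) over [0, 1]), because the actual integrands are not given in the thread. The point is the shape of the loop: iteration `i` reads only `params[i]` and writes only `results[i]`, which is the kind of loop body that can be moved into a GPU kernel, for example a SYCL `parallel_for`.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one integral: composite Simpson's rule for
// the integral of exp(-k*x) over [0, 1]. `intervals` must be even.
double evaluate_integral(double k, int intervals = 1000) {
    const double h = 1.0 / intervals;
    double sum = 1.0 + std::exp(-k);  // endpoint values f(0) and f(1)
    for (int i = 1; i < intervals; ++i) {
        sum += (i % 2 ? 4.0 : 2.0) * std::exp(-k * i * h);
    }
    return sum * h / 3.0;
}

// Evaluates one batch of integrals. No iteration depends on another,
// so the iterations could run in any order or all at once.
std::vector<double> evaluate_batch(const std::vector<double>& params) {
    std::vector<double> results(params.size());
    for (std::size_t i = 0; i < params.size(); ++i) {
        results[i] = evaluate_integral(params[i]);
    }
    return results;
}
```

Whether 100 such tasks per MPI rank are enough to keep a GPU busy depends on how much additional parallelism exists inside each integral; submitting larger batches per offload is one option to consider.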
Dunni
Hi again,
I forgot to mention that, on the CPU, the 100 evaluations per thread typically take 5-10 seconds.
Best regards
Anders S
Hi Dunni,
Thanks for your reply. Is there any document outlining how to offload a task to a GPU?
- Does the task have to be recoded in a GPU language?
- Is there any documentation on the compute hardware layout of the MAX 1100, etc.?
Best regards
Anders S
