The page "C/C++ OpenMP* and DPC++ Composability" contains an example of using OpenMP to "offload" to the GPU. However, when I tried it, I found that the example actually took more CPU time when compiled with the OpenMP pragmas than when the pragmas are removed.
I created two files, openMP.cpp (exactly the example from the page) and noOpenMP.cpp (the same, but with the pragmas commented out). I ran them with the following results:
ian@i3:~/openmp$ icpx -o withOpenMP -fsycl -fiopenmp -fopenmp-targets=spir64 openMP.cpp
ian@i3:~/openmp$ icpx -o withoutOpenMP -fsycl -fiopenmp -fopenmp-targets=spir64 noOpenMP.cpp
ian@i3:~/openmp$ time ./withOpenMP
Vec[512] = 512
Pi = 3.14159
real 0m0.931s
user 0m1.561s
sys 0m0.125s
ian@i3:~/openmp$ time ./withoutOpenMP
Vec[512] = 512
Pi = 3.14159
real 0m0.180s
user 0m0.155s
sys 0m0.024s
With the pragmas, the program takes about 5 times as long (real) and uses nearly 10 times as much CPU (user + sys) as when the pragmas are omitted. As an attempt to offload the CPU, this is a spectacular failure.
Are there any examples where OpenMP can actually be used to offload the CPU?
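The `time` figures above cover the whole process, including program startup. A minimal sketch for timing a single region from inside the program, so that wall-clock and CPU time can be compared for the offloaded part alone, might look like the following. `Timing` and `measure` are hypothetical names that are not part of the original example, and the sketch assumes Linux, where `std::clock` reports CPU time for the whole process.

```cpp
#include <chrono>
#include <ctime>
#include <functional>

// Hypothetical helper type: elapsed wall-clock time and process CPU time, in seconds.
struct Timing {
    double wall_s;
    double cpu_s;
};

// Runs `work` once and reports how long it took.
// Assumption: on Linux, std::clock() counts CPU time of all threads in the process,
// so cpu_s can exceed wall_s when several threads are busy.
Timing measure(const std::function<void()>& work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    work();

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Timing t;
    t.wall_s = std::chrono::duration<double>(wall_end - wall_start).count();
    t.cpu_s = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return t;
}
```

Wrapping only the offloaded loop in `measure` would show whether the extra CPU time is spent in that region or elsewhere in the process.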
Hi,
Thanks for reaching out to us.
By default, OpenMP code runs on the CPU.
To offload it to a GPU target, you must explicitly pass the -fiopenmp -fopenmp-targets=spir64 options (Linux*).
Please see the scenarios below:
1. To enable both OpenMP offload and DPC++/SYCL constructs, use the command below:
icpx -fsycl -fiopenmp -fopenmp-targets=spir64 offloadOmp_dpcpp.cpp
The -fsycl option enables DPC++.
The -fiopenmp -fopenmp-targets=spir64 options enable OpenMP* offload to the GPU.
Note: if no target is specified, OpenMP regions run on the host CPU by default.
2. If the code contains ordinary OpenMP code but no OpenMP offload, use the command below:
icpx -fsycl -fiopenmp omp_dpcpp.cpp
3. If there is no OpenMP code, use the command below:
icpx -fsycl noOpenMP.cpp
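To check which of these scenarios a given binary actually ends up in, one option is to ask the OpenMP runtime from inside a target region. The following is a minimal sketch: `omp_get_num_devices` and `omp_is_initial_device` are standard OpenMP routines, while the two wrapper function names and the non-OpenMP fallbacks are hypothetical additions for illustration.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Fallbacks (assumption: only used when the compiler has OpenMP disabled),
// so the sketch still builds and reports "host only".
static int omp_get_num_devices() { return 0; }
static int omp_is_initial_device() { return 1; }
#endif

// Number of offload devices the OpenMP runtime can see (0 means host only).
int visible_offload_devices() {
    return omp_get_num_devices();
}

// Returns true only if a target region really executed on a non-host device.
bool target_region_ran_on_device() {
    int on_host = 1;
    #pragma omp target map(from: on_host)
    {
        // Evaluated wherever the region runs: 1 on the host, 0 on a device.
        on_host = omp_is_initial_device();
    }
    return on_host == 0;
}
```

Built with the flags from scenario 1, `target_region_ran_on_device()` returning false would suggest that the region fell back to the host. Setting the standard environment variable OMP_TARGET_OFFLOAD=MANDATORY asks the runtime to fail instead of falling back silently.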
We compiled and ran the OpenMP and noOpenMP codes as shown below, and observed only a minimal difference in elapsed time and CPU time between the two cases.
u67125@s001-n066:~/openmp$ icpx -fsycl -fiopenmp -fopenmp-targets=spir64 openMP.cpp -o withOpenMP
u67125@s001-n066:~/openmp$ time ./withOpenMP
Vec[512] = 512
Pi = 3.14159
real 0m0.415s
user 0m4.042s
sys 0m0.198s
u67125@s001-n066:~/openmp$ icpx -fsycl noOpenMP.cpp -o withoutOpenMP
u67125@s001-n066:~/openmp$ time ./withoutOpenMP
Vec[512] = 512
Pi = 3.14159
real 0m0.301s
user 0m1.950s
sys 0m0.115s
Thanks & Regards,
Santosh
Thank you for your reply. I was aware of the need to use "-fiopenmp -fopenmp-targets=spir64" to offload to the GPU. Indeed, as you will see from my original question, I used those options.
Although you see much less difference (probably due to the different hardware), you also had more CPU used in the "offload" case than in the pure CPU case.
My original question, "Are there any examples where OpenMP can actually be used to offload the CPU?", remains. Can you provide an example of offloading significantly reducing CPU usage?
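A minimal sketch of the kind of test this question calls for is an offloaded loop whose work per element can be scaled up, so that any fixed startup cost of the offload runtime is spread over more computation. The function name `offload_heavy` and its parameters are hypothetical and are not taken from the Intel example. Whether it lowers CPU usage on any particular hardware is not something this thread establishes, and the premise that fixed startup costs dominate a small kernel is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical workload: each of n elements gets `iters` multiply-adds.
// The update x = x * 0.5f + 0.5f leaves 1.0f unchanged, so the result is easy to check.
double offload_heavy(std::size_t n, int iters) {
    std::vector<float> values(n, 1.0f);
    float* data = values.data();

    // Offloaded when built with OpenMP target support; otherwise runs on the host.
    #pragma omp target teams distribute parallel for map(tofrom: data[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        float x = data[i];
        for (int k = 0; k < iters; ++k) {
            x = x * 0.5f + 0.5f;
        }
        data[i] = x;
    }

    // Sum on the host so the device code needs no reduction or double precision.
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}
```

Running this under `time` with increasing `iters`, once with the pragma and once without, would show whether the CPU cost of the offloaded build stays roughly flat while the pure CPU build grows.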
What is this "screenshot" that you refer to? Do you mean the terminal dialogue on host "u67125@s001-n066"?
In that case the "withOpenMP" case takes 4.042s + 0.198s = 4.240s CPU
whereas the "withoutOpenMP" case takes 1.950s + 0.115s = 2.065s CPU.
The withOpenMP case takes over twice as much CPU as the withoutOpenMP case. Far from offloading the CPU, it is doubling the CPU load.
If you are not referring to that terminal dialogue, what are you referring to?
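The arithmetic above can be written down as a small helper, which makes it easy to repeat for other runs. This is only a sketch: `TimeReport` and the function names are hypothetical, and the numbers fed in would be the real, user and sys values printed by `time`.

```cpp
// Hypothetical record of one `time` run, all values in seconds.
struct TimeReport {
    double real_s;
    double user_s;
    double sys_s;
};

// Total CPU time consumed by the process: user plus sys.
double cpu_seconds(const TimeReport& t) {
    return t.user_s + t.sys_s;
}

// How many times more CPU run `a` used than run `b`.
double cpu_ratio(const TimeReport& a, const TimeReport& b) {
    return cpu_seconds(a) / cpu_seconds(b);
}

// Average number of cores kept busy; a value above 1 means several threads were running.
double busy_cores(const TimeReport& t) {
    return cpu_seconds(t) / t.real_s;
}
```

For the runs on "u67125@s001-n066", `cpu_ratio({0.415, 4.042, 0.198}, {0.301, 1.950, 0.115})` comes to about 2.05, which matches the "over twice as much CPU" figure.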
Hi,
>>"If you are not referring to that terminal dialogue, what are you referring to?"
I was referring to the screenshot provided in my previous post. (The screenshot might take a few seconds to load at your end.)
We can see from the screenshot below that, with OpenMP, CPU usage is reduced to about half of what it is without OpenMP.
Thanks & Regards,
Santosh
What is the hardware for this speed improvement? What CPU and what GPU?
I am using a Celeron 3965U CPU, and a Kaby Lake HD 610 (device 5906) GPU.
Even if you are achieving some CPU reduction, halving the CPU when you are offloading the complete task is pretty lame. In my previous experience of offloading (about ten years ago, using CUDA), I took a task that would have saturated the CPU a few times over, and it used about 2% of the CPU when offloaded to the GPU. I wasn't necessarily expecting the offload to be that good, but I was definitely hoping for a factor of 10 reduction in CPU.
Hi,
My CPU & GPU details are given below:
CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
GPU: Intel(R) UHD Graphics 630 [0x3e98]
Could you please let us know the details of the CPU that you used 10 years ago?
Thanks & Regards,
Santosh
Hi Santosh,
I regret that I do not still have the hardware or remember precisely what it was. I have checked the project archives. It includes all of the code, but does not record the precise hardware.
The PC was a reasonably typical desktop machine of the time. The NVIDIA GPU was a mid-range one, one of the cheaper ones capable of GPGPU processing.
Sorry I cannot be more precise.
Ian
Hi,
We are working on your issue internally and will get back to you soon.
Thanks & Regards,
Santosh
Hi Ian,
I've searched internally but couldn't find any examples which match what you are looking for.
Sorry for the inconvenience.
Hi Ian,
Can we close this thread?
Thanks,
I don't mind you closing. I cannot see this going anywhere.