OpenMP example

Ian_Miller · ‎12-06-2021

The page C/C++ OpenMP* and DPC++ Composability contains an example of using OpenMP to "offload" to the GPU. However, when I tried it, I found that the example actually took more CPU time when compiled with the OpenMP pragmas than with the pragmas removed.

I created two files openMP.cpp (exactly the example from the page) and noOpenMP.cpp (the same but with the pragmas commented out). I ran them with the following results:

ian@i3:~/openmp$ icpx -o withOpenMP -fsycl -fiopenmp -fopenmp-targets=spir64 openMP.cpp 
ian@i3:~/openmp$ icpx -o withoutOpenMP -fsycl -fiopenmp -fopenmp-targets=spir64 noOpenMP.cpp 
ian@i3:~/openmp$ time ./withOpenMP 
Vec[512] = 512
Pi = 3.14159

real	0m0.931s
user	0m1.561s
sys	0m0.125s
ian@i3:~/openmp$ time ./withoutOpenMP 
Vec[512] = 512
Pi = 3.14159

real	0m0.180s
user	0m0.155s
sys	0m0.024s

With the pragmas, the program takes 5 times as long and uses nearly 10 times as much CPU as it does with the pragmas omitted. As an attempt to offload work from the CPU, this is a spectacular failure.

Are there any examples where OpenMP can actually be used to offload the CPU?

SantoshY_Intel · ‎12-07-2021

Hi,

Thanks for reaching out to us.

By default, OpenMP runs on the CPU.

To offload to a specific GPU target, we have to enable it explicitly with the -fiopenmp -fopenmp-targets=spir64 options (Linux*).

Please see the below scenarios:

1. To enable OpenMP and DPC++/SYCL constructs, use the below command:

icpx -fsycl -fiopenmp -fopenmp-targets=spir64 offloadOmp_dpcpp.cpp

-fsycl option enables DPC++

-fiopenmp -fopenmp-targets=spir64 option enables OpenMP* offload for GPU
** If no target is specified, the OpenMP code runs on the host CPU by default.

2. If the code does not contain OpenMP offload, but only normal OpenMP code, use the below command.

icpx -fsycl -fiopenmp omp_dpcpp.cpp

3. If there is no OpenMP code, then use the below command.

icpx -fsycl noOpenMP.cpp
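To make the difference between scenarios 1 and 2 concrete, here is a minimal sketch. It is not the example from the Intel page, and the function names fill_on_host, fill_on_device and target_region_runs_on_host are made up for illustration. The first function uses an ordinary host-side "parallel for", the second uses a "target" construct that the compiler may offload when offload targets are enabled, and the third uses the standard OpenMP routine omp_is_initial_device() to report whether a target region actually ran on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Scenario 2 style: host-only OpenMP. The loop is shared among CPU threads,
// so wall-clock time may drop while total CPU time stays the same or rises.
void fill_on_host(std::vector<int>& v) {
    assert(v.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    int* data = v.data();
    const int n = static_cast<int>(v.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        data[i] = i;
    }
}

// Scenario 1 style: a target region. With offload targets enabled at compile
// time the loop may run on a device; otherwise it falls back to the host.
void fill_on_device(std::vector<int>& v) {
    assert(v.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    int* data = v.data();
    const int n = static_cast<int>(v.size());
    // map(from: ...) copies the results back to host memory after the region.
    #pragma omp target teams distribute parallel for map(from: data[0:n])
    for (int i = 0; i < n; ++i) {
        data[i] = i;
    }
}

// Returns true when a target region executes on the host, i.e. when no
// offload took place. Without OpenMP support it always returns true.
bool target_region_runs_on_host() {
    int on_host = 1;
#ifdef _OPENMP
    #pragma omp target map(tofrom: on_host)
    {
        on_host = omp_is_initial_device();
    }
#endif
    return on_host != 0;
}
```

If the file is compiled without any OpenMP flag, the pragmas are ignored and both loops run serially, which corresponds to scenario 3. Calling target_region_runs_on_host() before timing anything is one way to confirm that the "offload" build is really using the GPU.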

We compiled and ran the OpenMP and noOpenMP codes as shown below, and observed only a small difference in elapsed time and CPU time between the two cases.

u67125@s001-n066:~/openmp$ icpx -fsycl -fiopenmp -fopenmp-targets=spir64 openMP.cpp -o withOpenMP
u67125@s001-n066:~/openmp$ time ./withOpenMP
Vec[512] = 512
Pi = 3.14159

real 0m0.415s
user 0m4.042s
sys 0m0.198s
u67125@s001-n066:~/openmp$ icpx -fsycl noOpenMP.cpp -o withoutOpenMP
u67125@s001-n066:~/openmp$ time ./withoutOpenMP
Vec[512] = 512
Pi = 3.14159

real 0m0.301s
user 0m1.950s
sys 0m0.115s

Thanks & Regards,

Santosh

Ian_Miller · ‎12-07-2021

Thank you for your reply. I was aware of the need to use " -fiopenmp -fopenmp-targets=spir64 " to offload to the GPU. Indeed as you will see from my original question, I used those options.

Although your results show much less difference (probably due to the different hardware), you too had more CPU used in the "offload" case than in the pure CPU case.

My original question, "Are there any examples where OpenMP can actually be used to offload the CPU?", remains. Can you provide an example of offloading significantly reducing CPU usage?
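One way to look at this question more closely than "time ./program" allows is to measure wall-clock time and process CPU time around just the offloaded region. The sketch below is an illustration only: RegionCost and measure are hypothetical names, and it assumes a POSIX-like system where std::clock() reports CPU time consumed by the whole process (on some platforms it behaves differently).

```cpp
#include <chrono>
#include <ctime>

// Wall-clock and CPU seconds spent in one region of the program.
struct RegionCost {
    double wall_s;
    double cpu_s;
};

// Runs work() once and reports how long it took and how much CPU it used.
// Assumption: std::clock() returns process CPU time, as it does on Linux.
template <typename Work>
RegionCost measure(Work work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    work();

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    RegionCost cost;
    cost.wall_s = std::chrono::duration<double>(wall_end - wall_start).count();
    cost.cpu_s = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return cost;
}
```

Measuring the same region twice in one run may also help: if the first call is much more expensive than the second, part of the cost is likely one-time start-up work rather than the computation itself. That is a hypothesis to test with this kind of measurement, not a confirmed explanation of the numbers in this thread.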

SantoshY_Intel · ‎12-08-2021

Hi,

Please refer to the example code attached below.

The screenshot below shows that, with OpenMP, the CPU time is reduced to half of that without OpenMP.

Thanks & Regards,

Santosh

Ian_Miller · ‎12-08-2021

What is this "screenshot" that you refer to? Do you mean the terminal dialogue on host "u67125@s001-n066"?

In that case the "withOpenMP" case takes 4.042s + 0.198s = 4.240s CPU

whereas the "withoutOpenMP" case takes 1.950s + 0.115s = 2.065s CPU.

The withOpenMP case takes over twice as much CPU as the withoutOpenMP case. Far from offloading the CPU, it is doubling the CPU load.
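The arithmetic above can be written as a small sketch. TimeReport, cpu_seconds and cpu_ratio are hypothetical names introduced here; the only inputs are the "real", "user" and "sys" figures that the time command prints.

```cpp
#include <cassert>

// The three figures printed by the shell's `time`, in seconds.
struct TimeReport {
    double real_s;  // elapsed wall-clock time
    double user_s;  // CPU time spent in user mode, summed over all threads
    double sys_s;   // CPU time spent in the kernel on the process's behalf
};

// Total CPU time consumed by the process: user + sys.
double cpu_seconds(const TimeReport& t) {
    return t.user_s + t.sys_s;
}

// How many times more CPU `candidate` used than `baseline`.
double cpu_ratio(const TimeReport& candidate, const TimeReport& baseline) {
    assert(cpu_seconds(baseline) > 0.0);
    return cpu_seconds(candidate) / cpu_seconds(baseline);
}
```

With the figures from host u67125@s001-n066, cpu_seconds gives 4.240 s for withOpenMP and 2.065 s for withoutOpenMP, so cpu_ratio is about 2.05. Because user time is summed over all threads, it can exceed the real time, as it does in both of those runs.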

If you are not referring to that terminal dialogue, what are you referring to?

SantoshY_Intel · ‎12-08-2021

Hi,

>> "If you are not referring to that terminal dialogue, what are you referring to?"

I was referring to the screenshot provided in my previous post. (The screenshot might take a few seconds to be updated at your end.)

The screenshot below shows that, with OpenMP, the CPU time is reduced to half of that without OpenMP.

Thanks & Regards,

Santosh

Ian_Miller · ‎12-08-2021

What is the hardware for this speed improvement? What CPU and what GPU?

I am using a Celeron 3965U CPU, and a Kaby Lake HD 610 (device 5906) GPU.

Even if you are achieving some CPU reduction, halving the CPU when you are offloading the complete task is pretty lame. In my previous experience of offloading (about ten years ago, using CUDA), I took a task that would have used up the CPU a few times over and got it down to about 2% of CPU when offloaded to the GPU. I wasn't necessarily expecting this offload to be that good, but I was definitely hoping for a factor of 10 reduction in CPU.

SantoshY_Intel · ‎12-14-2021

Hi,

My CPU & GPU details are given below:

CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz

GPU: Intel(R) UHD Graphics 630 [0x3e98]

Could you please let us know the details of the CPU that you used 10 years ago?

Thanks & Regards,

Santosh

Ian_Miller · ‎12-15-2021

Hi Santosh,

I regret that I no longer have the hardware, nor do I remember precisely what it was. I have checked the project archives. They include all of the code, but do not record the precise hardware.

The PC was a reasonably typical desktop machine of the time. The NVIDIA GPU was a mid-range one, one of the cheaper ones capable of GPGPU processing.

Sorry I cannot be more precise.

Ian

SantoshY_Intel · ‎12-20-2021

Hi,

We are working on your issue internally and will get back to you soon.

Thanks & Regards,

Santosh

Viet_H_Intel · ‎03-15-2022

Hi Ian,

I've searched internally but couldn't find any examples which match what you are looking for.

Sorry for the inconvenience.

Viet_H_Intel · ‎08-17-2022

Hi Ian,

Can we close this thread?

Thanks,

Ian_Miller · ‎08-19-2022

I don't mind you closing. I cannot see this going anywhere.
