Hi all
I found that the performance of my dpcpp program drops when I add the compile option '-g'.
I also tested the corresponding CUDA program, and nvcc does not have this problem.
The program source code has been uploaded.
The compile command lines are:
dpcpp -o a -O2 ./convSep_nocg.dp.cpp
dpcpp -o a_g -g -O2 ./convSep_nocg.dp.cpp
Program 'a_g' takes about 1.7x as long as 'a' (0.13564 s vs 0.07934 s per iteration):
# ./a 10240 10240 1000
[./a] - Starting...
Image Width x Height = 10240 x 10240
Allocating and initializing host arrays...
Allocating and initializing CUDA arrays...
Running GPU convolution (1000 identical iterations)...
convolutionSeparable, Throughput = 1321.5942 MPixels/sec, Time = 0.07934 s, Size = 104857600 Pixels, NumDevsUsed = 1, Workgroup = 0
Reading back GPU results...
Checking the results...
...running convolutionRowCPU()
...running convolutionColumnCPU()
...comparing the results
...Relative L2 norm: 0.000000E+00
Shutting down...

# ./a_g 10240 10240 1000
[./a_g] - Starting...
Image Width x Height = 10240 x 10240
Allocating and initializing host arrays...
Allocating and initializing CUDA arrays...
Running GPU convolution (1000 identical iterations)...
convolutionSeparable, Throughput = 773.0526 MPixels/sec, Time = 0.13564 s, Size = 104857600 Pixels, NumDevsUsed = 1, Workgroup = 0
Reading back GPU results...
Checking the results...
...running convolutionRowCPU()
...running convolutionColumnCPU()
...comparing the results
...Relative L2 norm: 0.000000E+00
Shutting down...
OS Version: Ubuntu 18.04.3 LTS
linux-kernel: 4.15.18
oneAPI Basekit Version: 2021.1-beta06
CPU: Intel(R) Xeon(R) CPU E3-1585 v5 @ 3.50GHz
GPU: Intel Corporation Iris Pro Graphics P580
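As a quick check on the timings reported above: the Time field is the average time per iteration, so throughput is the pixel count divided by that time, and the slowdown of the -g build is the ratio of the two times. A minimal C++ sketch of that arithmetic (the function names are hypothetical and not taken from the uploaded program):

```cpp
// Hypothetical helper: throughput in MPixels/sec, given the image size in
// pixels and the average time per convolution iteration in seconds.
double throughput_mpixels(double pixels, double seconds_per_iter) {
    return pixels / seconds_per_iter / 1.0e6;
}

// Hypothetical helper: how many times longer the debug (-g -O2) build
// takes compared with the plain -O2 build.
double slowdown(double time_debug, double time_release) {
    return time_debug / time_release;
}
```

With 10240 x 10240 = 104857600 pixels, throughput_mpixels(104857600.0, 0.07934) gives about 1321.6 MPixels/sec, matching the log, and slowdown(0.13564, 0.07934) gives about 1.71, so the -g build is roughly 1.7x slower rather than a full 2x.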
Hi Jim,
When the -g flag is used with dpcpp, it generates debug information for both the host and the device parts of the code.
Enabling the -g option adds a separate debug section to the generated binary. Besides adding overhead during compilation, this can considerably increase run time.
For more information, refer to this link.
However, when the -g flag is passed to nvcc, it generates debug information only for the host code. To generate debug information for the device code, a separate flag (-G) must be passed to nvcc.
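When comparing builds such as 'a' and 'a_g' (or the dpcpp and nvcc versions), it helps to time all of them the same way, averaging over many iterations as the benchmark does. A minimal sketch in standard C++, assuming the kernel loop can be wrapped in a callable; the helper average_seconds is hypothetical and is not the timer used by the uploaded program:

```cpp
#include <chrono>

// Hypothetical helper: runs `body` `iterations` times and returns the
// average wall-clock seconds per call (0.0 if iterations <= 0).
// `body` must block until any asynchronous GPU work has finished
// (e.g. queue.wait() in SYCL, cudaDeviceSynchronize() in CUDA);
// otherwise only the launch overhead is measured.
template <typename F>
double average_seconds(F&& body, int iterations) {
    if (iterations <= 0) {
        return 0.0;
    }
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        body();
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return elapsed.count() / iterations;
}
```

Using a monotonic clock (steady_clock) and the same iteration count in every build keeps the comparison between the -O2 and -g -O2 binaries fair.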
Regards
Prasanth
Hi Jim,
Could you please let us know whether your issue has been resolved?
If not, please let us know so that we can help you further.
Regards
--Goutham
Hi Jim,
This issue has been resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.
Regards
Prasanth