Intel® oneAPI Data Parallel C++
Support for Intel® oneAPI DPC++ Compiler, Intel® oneAPI DPC++ Library, Intel® DPC++ Compatibility Tool, and GDB*
Announcements
The Intel sign-in experience has changed to support enhanced security controls. If you sign in, click here for more information.
463 Discussions

Performance problem of MonteCarlo integration

SandeepKoranne
Beginner
1,220 Views

Hello

I am comparing the runtime performance of a simple sample/reject
Monte-Carlo integration scheme.

The program is run on the following computer
model name : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

The code is attached to this report.
./M1.exe M (number of polynomials) N (number of trials) T(number threads)

With 32 OpenMP threads the DPCPP compiled program is approximately 3 times slower
than the one compiled with GCC.

dpcpp -O3 -fopenmp -Wall -funroll-loops -ffast-math monte_carlo_integration.cpp -o MC_DPCPP.exe
time ./MC_DPCPP.exe 1000 10000000 32 > /dev/null

real 1m17.954s
user 36m22.515s
sys 0m0.401s

g++ -O3 -fopenmp -funroll-loops -ffast-math -fprofile-use monte_carlo_integration.cpp -o M1.exe
GCC 11.1
./M1_GCC111.exe 1000 10000000 32 > /dev/null

real 0m23.694s
user 11m8.420s
sys 0m0.019s

GCC 8.3.1
time ./MC_POLY 1000 10000000 32 > /dev/null

real 0m26.024s
user 12m21.249s
sys 0m0.020s

 

Running perf stat on the two binaries gives

 

GCC 8.3.1
Performance counter stats for './MC_POLY 10 10000000 1':

5,619.33 msec task-clock:u # 1.000 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
167 page-faults:u # 0.030 K/sec
17,887,915,840 cycles:u # 3.183 GHz
30,678,358,310 instructions:u # 1.72 insn per cycle
4,101,348,797 branches:u # 729.864 M/sec
226,816,326 branch-misses:u # 5.53% of all branches

5.620014706 seconds time elapsed

5.609363000 seconds user
0.001990000 seconds sys

 

Performance counter stats for './MC_DPCPP.exe 10 10000000 1':

15,906.43 msec task-clock:u # 1.000 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
651 page-faults:u # 0.041 K/sec
49,488,407,800 cycles:u # 3.111 GHz
82,102,796,894 instructions:u # 1.66 insn per cycle
6,603,124,397 branches:u # 415.123 M/sec
3,099,931 branch-misses:u # 0.05% of all branches

15.911192960 seconds time elapsed

Labels (2)
0 Kudos
7 Replies
VidyalathaB_Intel
Moderator
1,183 Views

Hi Sandeep,

Thanks for reaching out to us.

Could you please provide us the details of DPC++ compiler version on which you are working?

Meanwhile we will look into this issue internally. we will get back to you soon.

Regards,

Vidya.


SandeepKoranne
Beginner
1,160 Views

Thanks Vidya

Intel(R) oneAPI DPC++ Compiler 2021.2.0 (2021.2.0.20210317)
Target: x86_64-unknown-linux-gnu

This is the version I am using.

 

Regards,

Sandeep

Viet_H_Intel
Moderator
1,141 Views

Hi Sandeep,


I've reported this problem to our Developer.

Thanks,



SandeepKoranne
Beginner
1,079 Views

Hi 

Is there any update to this issue ?

Even single threaded performance is much (3x) slower than gcc. Is this due to LLVM not able to optimize lambda[] functions ?

Sandeep

Viet_H_Intel
Moderator
1,062 Views

Sorry, we don't have any update yet on this issue.


Viet_H_Intel
Moderator
656 Views

Hi,


This issue has been addressed. The next update will show icpx is much faster -fiopenmp.


Thanks,



Viet_H_Intel
Moderator
471 Views

Please upgrade to oneAPI2022.3 which addressed this issue.

I am going to close this thread.

Thanks,


Reply