Intel® oneAPI Data Parallel C++
Support for Intel® oneAPI DPC++ Compiler, Intel® oneAPI DPC++ Library, Intel ICX Compiler, Intel® DPC++ Compatibility Tool, and GDB*

Performance problem of MonteCarlo integration

SandeepKoranne
Beginner

Hello

I am comparing the runtime performance of a simple sample/reject
Monte-Carlo integration scheme.

The program is run on the following machine:
model name : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

The code is attached to this report.
Usage: ./M1.exe M N T, where M is the number of polynomials, N is the number of trials, and T is the number of threads.

With 32 OpenMP threads, the DPCPP-compiled program is approximately 3 times slower
than the one compiled with GCC.

dpcpp -O3 -fopenmp -Wall -funroll-loops -ffast-math monte_carlo_integration.cpp -o MC_DPCPP.exe
time ./MC_DPCPP.exe 1000 10000000 32 > /dev/null

real 1m17.954s
user 36m22.515s
sys 0m0.401s

GCC 11.1
g++ -O3 -fopenmp -funroll-loops -ffast-math -fprofile-use monte_carlo_integration.cpp -o M1.exe
./M1_GCC111.exe 1000 10000000 32 > /dev/null

real 0m23.694s
user 11m8.420s
sys 0m0.019s

GCC 8.3.1
time ./MC_POLY 1000 10000000 32 > /dev/null

real 0m26.024s
user 12m21.249s
sys 0m0.020s

Running perf stat on the two binaries gives:

GCC 8.3.1
Performance counter stats for './MC_POLY 10 10000000 1':

      5,619.33 msec task-clock:u        #    1.000 CPUs utilized
             0      context-switches:u  #    0.000 K/sec
             0      cpu-migrations:u    #    0.000 K/sec
           167      page-faults:u       #    0.030 K/sec
17,887,915,840      cycles:u            #    3.183 GHz
30,678,358,310      instructions:u      #    1.72  insn per cycle
 4,101,348,797      branches:u          #  729.864 M/sec
   226,816,326      branch-misses:u     #    5.53% of all branches

5.620014706 seconds time elapsed

5.609363000 seconds user
0.001990000 seconds sys

Performance counter stats for './MC_DPCPP.exe 10 10000000 1':

     15,906.43 msec task-clock:u        #    1.000 CPUs utilized
             0      context-switches:u  #    0.000 K/sec
             0      cpu-migrations:u    #    0.000 K/sec
           651      page-faults:u       #    0.041 K/sec
49,488,407,800      cycles:u            #    3.111 GHz
82,102,796,894      instructions:u      #    1.66  insn per cycle
 6,603,124,397      branches:u          #  415.123 M/sec
     3,099,931      branch-misses:u     #    0.05% of all branches

15.911192960 seconds time elapsed

VidyalathaB_Intel
Moderator

Hi Sandeep,

Thanks for reaching out to us.

Could you please tell us which version of the DPC++ compiler you are using?

Meanwhile, we will look into this issue internally and get back to you soon.

Regards,

Vidya.


SandeepKoranne
Beginner

Thanks, Vidya.

I am using the following version:

Intel(R) oneAPI DPC++ Compiler 2021.2.0 (2021.2.0.20210317)
Target: x86_64-unknown-linux-gnu

Regards,

Sandeep

Viet_H_Intel
Moderator

Hi Sandeep,


I've reported this problem to our developer.

Thanks,



SandeepKoranne
Beginner

Hi,

Is there any update on this issue?

Even single-threaded performance is much (3x) slower than with GCC. Could this be because LLVM is unable to optimize lambda functions?

Sandeep

Viet_H_Intel
Moderator

Sorry, we don't have any update yet on this issue.


Viet_H_Intel
Moderator

Hi,


This issue has been addressed. In the next update, icpx with -fiopenmp is much faster.


Thanks,



Viet_H_Intel
Moderator

Please upgrade to oneAPI 2022.3, which addresses this issue.

I am going to close this thread.

Thanks,

