Intel® oneAPI DPC++/C++ Compiler
Talk to fellow users of Intel® oneAPI DPC++/C++ Compiler and companion tools like Intel® oneAPI DPC++ Library, Intel® DPC++ Compatibility Tool, and Intel® Distribution for GDB*

Performance problem with Monte Carlo integration

SandeepKoranne
Beginner

Hello

I am comparing the runtime performance of a simple sample/reject (hit-or-miss)
Monte Carlo integration scheme compiled with DPC++ against the same code compiled with GCC.

The program is run on the following machine:
model name : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

The code is attached to this report. Usage:
./M1.exe M (number of polynomials) N (number of trials) T (number of threads)

With 32 OpenMP threads, the DPC++-compiled program is approximately 3x slower
than the one compiled with GCC.

DPC++
dpcpp -O3 -fopenmp -Wall -funroll-loops -ffast-math monte_carlo_integration.cpp -o MC_DPCPP.exe
time ./MC_DPCPP.exe 1000 10000000 32 > /dev/null

real 1m17.954s
user 36m22.515s
sys 0m0.401s

GCC 11.1
g++ -O3 -fopenmp -funroll-loops -ffast-math -fprofile-use monte_carlo_integration.cpp -o M1.exe
time ./M1_GCC111.exe 1000 10000000 32 > /dev/null

real 0m23.694s
user 11m8.420s
sys 0m0.019s

GCC 8.3.1
time ./MC_POLY 1000 10000000 32 > /dev/null

real 0m26.024s
user 12m21.249s
sys 0m0.020s

 

Running perf stat on the two binaries (single thread, M = 10) gives

 

GCC 8.3.1
Performance counter stats for './MC_POLY 10 10000000 1':

5,619.33 msec task-clock:u # 1.000 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
167 page-faults:u # 0.030 K/sec
17,887,915,840 cycles:u # 3.183 GHz
30,678,358,310 instructions:u # 1.72 insn per cycle
4,101,348,797 branches:u # 729.864 M/sec
226,816,326 branch-misses:u # 5.53% of all branches

5.620014706 seconds time elapsed

5.609363000 seconds user
0.001990000 seconds sys

 

Performance counter stats for './MC_DPCPP.exe 10 10000000 1':

15,906.43 msec task-clock:u # 1.000 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
651 page-faults:u # 0.041 K/sec
49,488,407,800 cycles:u # 3.111 GHz
82,102,796,894 instructions:u # 1.66 insn per cycle
6,603,124,397 branches:u # 415.123 M/sec
3,099,931 branch-misses:u # 0.05% of all branches

15.911192960 seconds time elapsed

VidyalathaB_Intel
Moderator

Hi Sandeep,

Thanks for reaching out to us.

Could you please provide the details of the DPC++ compiler version you are working with?

Meanwhile, we will look into this issue internally and get back to you soon.

Regards,

Vidya.


SandeepKoranne
Beginner

Thanks, Vidya.

Intel(R) oneAPI DPC++ Compiler 2021.2.0 (2021.2.0.20210317)
Target: x86_64-unknown-linux-gnu

This is the version I am using.

 

Regards,

Sandeep

Viet_H_Intel
Moderator

Hi Sandeep,


I've reported this problem to our developers.

Thanks,



SandeepKoranne
Beginner

Hi,

Is there any update on this issue?

Even single-threaded performance is much slower (about 3x) than GCC's. Is this because LLVM is unable to optimize the lambda functions?

Sandeep

Viet_H_Intel
Moderator

Sorry, we don't have any update on this issue yet.


Viet_H_Intel
Moderator

Hi,


This issue has been addressed. The next compiler update will show icpx with -fiopenmp being much faster.


Thanks,



Viet_H_Intel
Moderator

Please upgrade to oneAPI 2022.3, which addresses this issue.

I am going to close this thread.

Thanks,

