topic Performance problem of MonteCarlo integration in Intel® oneAPI DPC++/C++ Compiler

Performance problem of MonteCarlo integration

SandeepKoranne — Sat, 08 May 2021 19:18:39 GMT

Hello

I am comparing the runtime performance of a simple sample/reject
Monte-Carlo integration scheme.

The program is run on the following computer
model name : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

The code is attached to this report.
Usage: ./M1.exe M N T, where M is the number of polynomials, N the number of trials, and T the number of threads.
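Since the attachment is not reproduced in this thread, here is a minimal sketch of what a sample/reject Monte Carlo integration with OpenMP can look like. It is not the attached program: the names `eval_poly` and `mc_integrate`, the integration interval [0,1], the bound `ymax`, and the per-thread seeding are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: evaluate c[0] + c[1]*x + c[2]*x^2 + ... with Horner's rule.
double eval_poly(const std::vector<double>& c, double x) {
    double y = 0.0;
    for (auto it = c.rbegin(); it != c.rend(); ++it) y = y * x + *it;
    return y;
}

// Sample/reject estimate of the integral of f over [0,1],
// assuming 0 <= f(x) <= ymax on that interval.
// Points (x, y) are drawn uniformly in [0,1] x [0,ymax]; the fraction
// that falls under the curve, times the box area, estimates the integral.
template <class F>
double mc_integrate(F f, double ymax, std::int64_t trials, int threads, unsigned seed) {
    assert(trials > 0 && threads > 0 && ymax > 0.0);
    std::int64_t hits = 0;
#pragma omp parallel num_threads(threads) reduction(+ : hits)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        // One generator per thread so threads do not share RNG state.
        std::mt19937_64 rng(seed + static_cast<unsigned>(tid));
        std::uniform_real_distribution<double> ux(0.0, 1.0), uy(0.0, ymax);
#pragma omp for
        for (std::int64_t i = 0; i < trials; ++i) {
            const double x = ux(rng);
            const double y = uy(rng);
            if (y <= f(x)) ++hits;  // accept the sample if it lies under the curve
        }
    }
    return ymax * static_cast<double>(hits) / static_cast<double>(trials);
}
```

A call such as `mc_integrate([](double x) { return x * x; }, 1.0, 1000000, 4, 42u)` should return a value close to 1/3. The integrand is passed as a lambda here on purpose, since lambda inlining is one of the things that can differ between compilers.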

With 32 OpenMP threads, the DPCPP-compiled program is approximately 3 times slower
than the one compiled with GCC.

dpcpp -O3 -fopenmp -Wall -funroll-loops -ffast-math monte_carlo_integration.cpp -o MC_DPCPP.exe
time ./MC_DPCPP.exe 1000 10000000 32 > /dev/null

real 1m17.954s
user 36m22.515s
sys 0m0.401s

g++ -O3 -fopenmp -funroll-loops -ffast-math -fprofile-use monte_carlo_integration.cpp -o M1.exe
GCC 11.1
./M1_GCC111.exe 1000 10000000 32 > /dev/null

real 0m23.694s
user 11m8.420s
sys 0m0.019s

GCC 8.3.1
time ./MC_POLY 1000 10000000 32 > /dev/null

real 0m26.024s
user 12m21.249s
sys 0m0.020s

Running perf stat on the two binaries gives

GCC 8.3.1
Performance counter stats for './MC_POLY 10 10000000 1':

   5,619.33 msec  task-clock:u        # 1.000 CPUs utilized
               0  context-switches:u  # 0.000 K/sec
               0  cpu-migrations:u    # 0.000 K/sec
             167  page-faults:u       # 0.030 K/sec
  17,887,915,840  cycles:u            # 3.183 GHz
  30,678,358,310  instructions:u      # 1.72 insn per cycle
   4,101,348,797  branches:u          # 729.864 M/sec
     226,816,326  branch-misses:u     # 5.53% of all branches

5.620014706 seconds time elapsed

5.609363000 seconds user
0.001990000 seconds sys

Performance counter stats for './MC_DPCPP.exe 10 10000000 1':

  15,906.43 msec  task-clock:u        # 1.000 CPUs utilized
               0  context-switches:u  # 0.000 K/sec
               0  cpu-migrations:u    # 0.000 K/sec
             651  page-faults:u       # 0.041 K/sec
  49,488,407,800  cycles:u            # 3.111 GHz
  82,102,796,894  instructions:u      # 1.66 insn per cycle
   6,603,124,397  branches:u          # 415.123 M/sec
       3,099,931  branch-misses:u     # 0.05% of all branches

15.911192960 seconds time elapsed
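
To make the two perf stat runs easier to compare, the counters can be turned into ratios. The sketch below only does arithmetic on the numbers printed above; the struct and function names are made up for illustration. It suggests that the DPCPP binary executes roughly 2.7 times as many instructions at nearly the same instructions per cycle, so the gap appears to come from instruction count rather than from stalls. This does not by itself identify the cause.

```cpp
#include <cstdio>

// Counter values copied from the perf stat output above.
struct PerfCounters {
    double cycles;
    double instructions;
    double branches;
    double branch_misses;
};

constexpr PerfCounters gcc_run{17887915840.0, 30678358310.0, 4101348797.0, 226816326.0};
constexpr PerfCounters dpcpp_run{49488407800.0, 82102796894.0, 6603124397.0, 3099931.0};

// Instructions retired per cycle for one run.
constexpr double ipc(const PerfCounters& c) { return c.instructions / c.cycles; }

// How many times larger a is than b.
constexpr double ratio(double a, double b) { return a / b; }

// Print DPCPP-to-GCC ratios for each counter, plus IPC for both runs.
void print_comparison() {
    std::printf("instructions ratio: %.2f\n",
                ratio(dpcpp_run.instructions, gcc_run.instructions));
    std::printf("cycles ratio:       %.2f\n",
                ratio(dpcpp_run.cycles, gcc_run.cycles));
    std::printf("branches ratio:     %.2f\n",
                ratio(dpcpp_run.branches, gcc_run.branches));
    std::printf("IPC gcc / dpcpp:    %.2f / %.2f\n", ipc(gcc_run), ipc(dpcpp_run));
    std::printf("branch miss rate gcc / dpcpp: %.2f%% / %.2f%%\n",
                100.0 * ratio(gcc_run.branch_misses, gcc_run.branches),
                100.0 * ratio(dpcpp_run.branch_misses, dpcpp_run.branches));
}
```

Calling `print_comparison()` prints the ratios; comparing the generated assembly of the hot loop would be the next step to see where the extra instructions come from.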

Re: Performance problem of MonteCarlo integration

VidyalathaB_Intel — Mon, 10 May 2021 08:17:22 GMT

Hi Sandeep,

Thanks for reaching out to us.

Could you please tell us which version of the DPC++ compiler you are using?

Meanwhile, we will look into this issue internally and get back to you soon.

Regards,

Vidya.

Re: Performance problem of MonteCarlo integration

SandeepKoranne — Mon, 10 May 2021 14:25:14 GMT

Thanks Vidya

Intel(R) oneAPI DPC++ Compiler 2021.2.0 (2021.2.0.20210317)
Target: x86_64-unknown-linux-gnu

This is the version I am using.

Regards,

Sandeep

Re: Performance problem of MonteCarlo integration

Viet_H_Intel — Tue, 11 May 2021 12:52:55 GMT

Hi Sandeep,

I've reported this problem to our developers.

Thanks,

Re: Performance problem of MonteCarlo integration

SandeepKoranne — Sat, 22 May 2021 19:15:08 GMT

Is there any update on this issue?

Even single-threaded performance is much slower (about 3x) than with GCC. Is this because LLVM is not able to optimize lambda functions?

Sandeep

Re: Performance problem of MonteCarlo integration

Viet_H_Intel — Tue, 25 May 2021 20:41:36 GMT

Sorry, we don't have any update on this issue yet.

Re: Performance problem of MonteCarlo integration

Viet_H_Intel — Mon, 13 Jun 2022 17:08:29 GMT

Hi,

This issue has been addressed. The next update will show that icpx is much faster with -fiopenmp.

Thanks,

Re: Performance problem of MonteCarlo integration

Viet_H_Intel — Tue, 04 Oct 2022 00:11:05 GMT

Please upgrade to oneAPI 2022.3, which addresses this issue.

I am going to close this thread.

Thanks,