Hello,
I am comparing the runtime performance of a simple sample/reject Monte-Carlo integration scheme compiled with DPC++ and with GCC.
The program is run on the following computer
model name : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
The code is attached to this report. Usage:
./M1.exe M (number of polynomials) N (number of trials) T (number of threads)
With 32 OpenMP threads the DPCPP compiled program is approximately 3 times slower
than the one compiled with GCC.
dpcpp -O3 -fopenmp -Wall -funroll-loops -ffast-math monte_carlo_integration.cpp -o MC_DPCPP.exe
time ./MC_DPCPP.exe 1000 10000000 32 > /dev/null
real 1m17.954s
user 36m22.515s
sys 0m0.401s
g++ -O3 -fopenmp -funroll-loops -ffast-math -fprofile-use monte_carlo_integration.cpp -o M1.exe
GCC 11.1
time ./M1_GCC111.exe 1000 10000000 32 > /dev/null
real 0m23.694s
user 11m8.420s
sys 0m0.019s
GCC 8.3.1
time ./MC_POLY 1000 10000000 32 > /dev/null
real 0m26.024s
user 12m21.249s
sys 0m0.020s
Running perf stat on the two binaries gives
GCC 8.3.1
Performance counter stats for './MC_POLY 10 10000000 1':
5,619.33 msec task-clock:u # 1.000 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
167 page-faults:u # 0.030 K/sec
17,887,915,840 cycles:u # 3.183 GHz
30,678,358,310 instructions:u # 1.72 insn per cycle
4,101,348,797 branches:u # 729.864 M/sec
226,816,326 branch-misses:u # 5.53% of all branches
5.620014706 seconds time elapsed
5.609363000 seconds user
0.001990000 seconds sys
Performance counter stats for './MC_DPCPP.exe 10 10000000 1':
15,906.43 msec task-clock:u # 1.000 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
651 page-faults:u # 0.041 K/sec
49,488,407,800 cycles:u # 3.111 GHz
82,102,796,894 instructions:u # 1.66 insn per cycle
6,603,124,397 branches:u # 415.123 M/sec
3,099,931 branch-misses:u # 0.05% of all branches
15.911192960 seconds time elapsed
Hi Sandeep,
Thanks for reaching out to us.
Could you please share the version of the DPC++ compiler you are using?
Meanwhile, we will look into this issue internally and get back to you soon.
Regards,
Vidya.
Thanks Vidya
Intel(R) oneAPI DPC++ Compiler 2021.2.0 (2021.2.0.20210317)
Target: x86_64-unknown-linux-gnu
This is the version I am using.
Regards,
Sandeep
Hi Sandeep,
I've reported this problem to our developers.
Thanks,
Hi
Is there any update on this issue?
Even single-threaded performance is much slower (about 3x) than GCC's. Is this because LLVM is unable to optimize lambda functions?
Sandeep
Sorry, we don't have any update yet on this issue.
Hi,
This issue has been addressed; in the next release, icpx with -fiopenmp will be much faster.
Thanks,
Please upgrade to oneAPI 2022.3, which addresses this issue.
I am going to close this thread.
Thanks,