icc stream.c -o stream -O3 -xHost -qopenmp -DSTREAM_ARRAY_SIZE=33554432
on fpga_compile node (Intel(R) Xeon(R) Platinum 8153 CPU @ 2.00GHz)
KMP_AFFINITY=compact ./stream
Function    Best Rate MB/s  Avg time   Min time   Max time
Copy:       141917.2        0.003796   0.003783   0.003813
Scale:      139707.1        0.003849   0.003843   0.003855
Add:        153685.5        0.005251   0.005240   0.005262
Triad:      156861.5        0.005183   0.005134   0.005525
on gpu node (Intel(R) Xeon(R) E-2176G CPU @ 3.70GHz)
KMP_AFFINITY=compact ./stream
Function    Best Rate MB/s  Avg time   Min time   Max time
Copy:       30683.5         0.017546   0.017497   0.017601
Scale:      32258.0         0.016687   0.016643   0.016742
Add:        33558.9         0.024043   0.023997   0.024099
Triad:      33405.5         0.024130   0.024107   0.024161
OpenMP:
on fpga_compile node (Intel(R) Xeon(R) Platinum 8153 CPU @ 2.00GHz)
icpc -O3 -xHost main.cpp OMPStream.cpp -qopenmp -DIMPLEMENTATION_STRING=\"OpenMP\" -g -DOMP
KMP_AFFINITY=compact ./a.out
Function    MBytes/sec  Min (sec)   Max         Average
Copy        141828.793  0.00379     0.01443     0.00401
Mul         121572.295  0.00442     0.01676     0.00464
Add         133730.659  0.00602     0.01396     0.00616
Triad       134717.507  0.00598     0.01672     0.00613
Dot         177794.491  0.00302     0.01040     0.00311
on gpu node (Intel(R) Xeon(R) E-2176G CPU @ 3.70GHz)
Function    MBytes/sec  Min (sec)   Max         Average
Copy        30787.821   0.01744     0.02158     0.01753
Mul         21622.489   0.02483     0.02676     0.02494
Add         24157.226   0.03334     0.03822     0.03349
Triad       24196.118   0.03328     0.03367     0.03337
Dot         32628.879   0.01645     0.01776     0.01654
Using OneAPI/SYCL:
dpcpp main.cpp SYCLStream.cpp -lsycl -lOpenCL -DIMPLEMENTATION_STRING=\"SYCL\" -O3 -DSYCL -o sycl-stream
on fpga_compile (CPU device):
Function    MBytes/sec  Min (sec)   Max         Average
Copy        47386.727   0.01133     0.02763     0.01309
Mul         45257.555   0.01186     0.04275     0.01352
Add         49772.544   0.01618     0.03343     0.01742
Triad       50365.736   0.01599     0.02322     0.01720
Dot         8051.967    0.06668     6.20847     0.15319
on gpu node (CPU device):
./sycl-stream --device 0
Function    MBytes/sec  Min (sec)   Max         Average
Copy        21547.783   0.02492     0.02766     0.02505
Mul         21498.313   0.02497     0.02621     0.02505
Add         24177.595   0.03331     0.03412     0.03345
Triad       24126.740   0.03338     0.03416     0.03354
Dot         31036.779   0.01730     0.02748     0.01909
on gpu node (Intel UHD GPU):
./sycl-stream --device 1
Function    MBytes/sec  Min (sec)   Max         Average
Copy        36767.394   0.01460     0.01514     0.01492
Mul         36019.351   0.01491     0.01537     0.01504
Add         34204.743   0.02354     0.02428     0.02365
Triad       34836.742   0.02312     0.02378     0.02322
Dot         28777.236   0.01866     0.01948     0.01903
Any suggestions on oneAPI compile flags are welcome!
Questions:
- Is there any way to control thread affinity to address the likely NUMA issues I am seeing with the SYCL test's bandwidth on the dual-socket fpga_compile machine?
- Why is the SYCL bandwidth on the gpu node so much higher on the GPU device than on the CPU device (about 36.8 GB/s vs. 21.5 GB/s for Copy)? I believe they share the same memory.
- Tags:
- General Support
3 Replies
Istvan,
Thanks for the detailed question. We'll take a look and get back to you as soon as possible.
Regards,
William
Hi,
What does fpga_compile node mean?
We are working on a feature to add affinity to DPCPP.
Thanks,
Varsha
fpga_compile node is the node I get when submitting a job with:
qsub -q batch@v-qsvr-nda -l nodes=1:fpga_compile:ppn=2
