*** Performance of OpenCL devices: Intel CPU vs. Intel HD graphics vs. NVIDIA Quadro K1000M ***
Test 1:

Selected Platform Vendor : Intel(R) Corporation
Device 0 : Intel(R) Core(TM) i7-3840QM CPU @ 2.80GHz
Device ID is 000000000252CD20
-----------------------------------------
Copy 1D FastPath : 36.6145 GB/s
-----------------------------------------
Copy 1D CompletePath : 29.6388 GB/s
-----------------------------------------
Copy 2D 32-bit (64x2) : 31.2591 GB/s
Copy 2D 128-bit (64x2) : 11.3323 GB/s
-----------------------------------------
Copy 2D 32-bit (64x4) : 32.7530 GB/s
Copy 2D 128-bit (64x4) : 11.3483 GB/s
-----------------------------------------
Copy 2D 32-bit (8x8) : 15.4581 GB/s
Copy 2D 128-bit (8x8) : 11.6436 GB/s
-----------------------------------------
Copy 2D 32-bit (256x1) : 35.3117 GB/s
Copy 2D 128-bit (256x1) : 11.4634 GB/s
-----------------------------------------
Copy 2D 32-bit (32x2) : 26.1347 GB/s
Copy 2D 128-bit (32x2) : 11.2260 GB/s
-----------------------------------------
Copy 2D 32-bit (64x1) : 27.3779 GB/s
Copy 2D 128-bit (64x1) : 11.4214 GB/s
-----------------------------------------
Copy 2D 32-bit (16x16) : 30.6822 GB/s
Copy 2D 128-bit (16x16) : 11.9726 GB/s
-----------------------------------------
Copy 2D 32-bit (16x4) : 25.1613 GB/s
Copy 2D 128-bit (16x4) : 10.8295 GB/s
-----------------------------------------
Copy 2D 32-bit (1x64) : 4.50334 GB/s
Copy 2D 128-bit (1x64) : 7.78883 GB/s
-----------------------------------------
Copy 1D 128-bit : 11.5064 GB/s
-----------------------------------------
NoCoal Copy 1D 32-bit : 14.9298 GB/s
-----------------------------------------
Split Copy 1D 32-bit : 7.37306 GB/s
-----------------------------------------
HasLocalBankConflicts 32-bit : 21.7152 GB/s
-----------------------------------------
NoLocalBankConflicts 32-bit : 106.663 GB/s
Test 2:

Selected Platform Vendor : Intel(R) Corporation
Device 0 : Intel(R) HD Graphics 4000
Device ID is 000000000038B4F0
-----------------------------------------
Copy 1D FastPath : 22.5925 GB/s
-----------------------------------------
Copy 1D CompletePath : 23.3405 GB/s
-----------------------------------------
Copy 2D 32-bit (64x2) : 23.3489 GB/s
Copy 2D 128-bit (64x2) : 20.9181 GB/s
-----------------------------------------
Copy 2D 32-bit (64x4) : 23.3286 GB/s
Copy 2D 128-bit (64x4) : 20.4307 GB/s
-----------------------------------------
Copy 2D 32-bit (8x8) : 23.0804 GB/s
Copy 2D 128-bit (8x8) : 19.5235 GB/s
-----------------------------------------
Copy 2D 32-bit (256x1) : 23.3284 GB/s
Copy 2D 128-bit (256x1) : 21.3214 GB/s
-----------------------------------------
Copy 2D 32-bit (32x2) : 23.3390 GB/s
Copy 2D 128-bit (32x2) : 20.8492 GB/s
-----------------------------------------
Copy 2D 32-bit (64x1) : 23.3374 GB/s
Copy 2D 128-bit (64x1) : 21.1877 GB/s
-----------------------------------------
Copy 2D 32-bit (16x16) : 23.2818 GB/s
Copy 2D 128-bit (16x16) : 19.0221 GB/s
-----------------------------------------
Copy 2D 32-bit (16x4) : 23.3341 GB/s
Copy 2D 128-bit (16x4) : 20.1941 GB/s
-----------------------------------------
Copy 2D 32-bit (1x64) : 1.94075 GB/s
Copy 2D 128-bit (1x64) : 5.82620 GB/s
-----------------------------------------
Copy 1D 128-bit : 21.1652 GB/s
-----------------------------------------
NoCoal Copy 1D 32-bit : 23.3417 GB/s
-----------------------------------------
Split Copy 1D 32-bit : 23.2451 GB/s
-----------------------------------------
HasLocalBankConflicts 32-bit : 18.6644 GB/s
-----------------------------------------
NoLocalBankConflicts 32-bit : 92.8352 GB/s
Test 3:

Selected Platform Vendor : NVIDIA Corporation
Device 0 : Quadro K1000M
Device ID is 0000000000239740
-----------------------------------------
Copy 1D FastPath : 17.4898 GB/s
-----------------------------------------
Copy 1D CompletePath : 16.7297 GB/s
-----------------------------------------
Copy 2D 32-bit (64x2) : 16.7445 GB/s
Copy 2D 128-bit (64x2) : 26.2880 GB/s
-----------------------------------------
Copy 2D 32-bit (64x4) : 16.4854 GB/s
Copy 2D 128-bit (64x4) : 26.0927 GB/s
-----------------------------------------
Copy 2D 32-bit (8x8) : 7.71916 GB/s
Copy 2D 128-bit (8x8) : 22.6304 GB/s
-----------------------------------------
Copy 2D 32-bit (256x1) : 16.5442 GB/s
Copy 2D 128-bit (256x1) : 26.3473 GB/s
-----------------------------------------
Copy 2D 32-bit (32x2) : 9.51040 GB/s
Copy 2D 128-bit (32x2) : 23.4407 GB/s
-----------------------------------------
Copy 2D 32-bit (64x1) : 9.55709 GB/s
Copy 2D 128-bit (64x1) : 22.8820 GB/s
-----------------------------------------
Copy 2D 32-bit (16x16) : 14.9983 GB/s
Copy 2D 128-bit (16x16) : 26.0951 GB/s
-----------------------------------------
Copy 2D 32-bit (16x4) : 8.73788 GB/s
Copy 2D 128-bit (16x4) : 24.4525 GB/s
-----------------------------------------
Copy 2D 32-bit (1x64) : 1.95529 GB/s
Copy 2D 128-bit (1x64) : 7.63785 GB/s
-----------------------------------------
Copy 1D 128-bit : 22.9636 GB/s
-----------------------------------------
NoCoal Copy 1D 32-bit : 9.85914 GB/s
-----------------------------------------
Split Copy 1D 32-bit : 8.72993 GB/s
-----------------------------------------
HasLocalBankConflicts 32-bit : 8.80286 GB/s
-----------------------------------------
NoLocalBankConflicts 32-bit : 84.4227 GB/s
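For anyone who wants to compare the three reports side by side, here is a minimal Python sketch that pulls the `name : value GB/s` entries out of such a report and picks the fastest device per test. It is not part of the benchmark itself; the helper names `parse_bandwidth` and `fastest_device` are hypothetical, and the sample strings are short excerpts of the figures posted above.

```python
import re

# One "<test name> : <number> GB/s" entry; the name starts with a capital letter.
ENTRY = re.compile(r"([A-Z][\w\- ()]*?)\s*:\s*([\d.]+)\s*GB/s")

def parse_bandwidth(report):
    """Return {test name: GB/s} for one benchmark report (hypothetical helper)."""
    results = {}
    # Splitting on the dashed separators keeps header text out of the test names.
    for chunk in re.split(r"-{5,}", report):
        for name, value in ENTRY.findall(chunk):
            results[name.strip()] = float(value)
    return results

def fastest_device(reports):
    """Map each test name to the device with the highest GB/s figure."""
    best = {}
    for device, results in reports.items():
        for test, gbps in results.items():
            if test not in best or gbps > reports[best[test]][test]:
                best[test] = device
    return best

# Excerpts from Test 1, Test 2 and Test 3 above.
SEP = " ----------------------------------------- "
reports = {
    "i7-3840QM CPU": parse_bandwidth(
        "Copy 1D FastPath : 36.6145 GB/s" + SEP + "Copy 1D 128-bit : 11.5064 GB/s"),
    "HD Graphics 4000": parse_bandwidth(
        "Copy 1D FastPath : 22.5925 GB/s" + SEP + "Copy 1D 128-bit : 21.1652 GB/s"),
    "Quadro K1000M": parse_bandwidth(
        "Copy 1D FastPath : 17.4898 GB/s" + SEP + "Copy 1D 128-bit : 22.9636 GB/s"),
}

for test, device in fastest_device(reports).items():
    print(f"{test}: {device}")
```

The parser works on both the one-line and the line-per-entry form of the output, since it only relies on the dashed separators and the `GB/s` suffix.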
Thanks for this very interesting data!
Here is a set of performance reports using C++ AMP with different accelerators:
FFTAMP.exe -d 0 -t -s 20 -i 64 -q

************************************************
FFT
************************************************
Available Accelerators:
Accelerator 0 : Intel(R) HD Graphics 4000
Accelerator 1 : NVIDIA Quadro K1000M
Accelerator 2 : Software Adapter
Accelerator 3 : Software Adapter
Accelerator 4 : CPU accelerator
Selected accelerator : Intel(R) HD Graphics 4000
Sampling: 64 (64 sampled) benchmark runs

Run SP FFT 512-pt 20M complex numbers
Using Array!
SP FFT 512-pt 20M complex numbers finished!(Total time(sec): 2.164)
Time Information

| Data Transfer to Accelerator(sec) | Mean Execution Time (sec) | GFLOPS | Data Transfer to Host(sec) |
|-----------------------------------|---------------------------|---------|----------------------------|
| 0.0565276 | 0.0322409 | 29.2708 | 0.0436215 |

DP FFT(double precision) skipped because the selected accelerator doesn't support double precision.

Run SP IFFT 512-pt 20M complex numbers
Using Array!
SP IFFT 512-pt 20M complex numbers finished!(Total time(sec): 1.950)
Time Information

| Data Transfer to Accelerator(sec) | Mean Execution Time (sec) | GFLOPS | Data Transfer to Host(sec) |
|-----------------------------------|---------------------------|---------|----------------------------|
| 0.0603286 | 0.028834 | 32.7293 | 0.0438934 |

DP IFFT(double precision) skipped because the selected accelerator doesn't support double precision.
FFTAMP.exe -d 1 -t -s 20 -i 64 -q

************************************************
FFT
************************************************
Available Accelerators:
Accelerator 0 : Intel(R) HD Graphics 4000
Accelerator 1 : NVIDIA Quadro K1000M
Accelerator 2 : Software Adapter
Accelerator 3 : Software Adapter
Accelerator 4 : CPU accelerator
Selected accelerator : NVIDIA Quadro K1000M
Sampling: 64 (64 sampled) benchmark runs

Run SP FFT 512-pt 20M complex numbers
Using Array!
SP FFT 512-pt 20M complex numbers finished!(Total time(sec): 2.225)
Time Information

| Data Transfer to Accelerator(sec) | Mean Execution Time (sec) | GFLOPS | Data Transfer to Host(sec) |
|-----------------------------------|---------------------------|---------|----------------------------|
| 0.0638754 | 0.0322368 | 29.2746 | 0.0981965 |

DP FFT(double precision) skipped because the selected accelerator doesn't support double precision.

Run SP IFFT 512-pt 20M complex numbers
Using Array!
SP IFFT 512-pt 20M complex numbers finished!(Total time(sec): 1.989)
Time Information

| Data Transfer to Accelerator(sec) | Mean Execution Time (sec) | GFLOPS | Data Transfer to Host(sec) |
|-----------------------------------|---------------------------|---------|----------------------------|
| 0.100286 | 0.0288346 | 32.7287 | 0.0435111 |

DP IFFT(double precision) skipped because the selected accelerator doesn't support double precision.
FFTAMP.exe -d 2 -t -s 20 -i 64 -q

************************************************
FFT
************************************************
Available Accelerators:
Accelerator 0 : Intel(R) HD Graphics 4000
Accelerator 1 : NVIDIA Quadro K1000M
Accelerator 2 : Software Adapter
Accelerator 3 : Software Adapter
Accelerator 4 : CPU accelerator
Selected accelerator : Software Adapter
Sampling: 64 (64 sampled) benchmark runs

Run SP FFT 512-pt 20M complex numbers
Using Array!
SP FFT 512-pt 20M complex numbers finished!(Total time(sec): 10.974)
Time Information

| Data Transfer to Accelerator(sec) | Mean Execution Time (sec) | GFLOPS | Data Transfer to Host(sec) |
|-----------------------------------|---------------------------|---------|----------------------------|
| 0.101657 | 0.168924 | 5.58664 | 0.0609542 |

DP FFT(double precision) skipped because the selected accelerator doesn't support double precision.

Run SP IFFT 512-pt 20M complex numbers
Using Array!
SP IFFT 512-pt 20M complex numbers finished!(Total time(sec): 9.897)
Time Information

| Data Transfer to Accelerator(sec) | Mean Execution Time (sec) | GFLOPS | Data Transfer to Host(sec) |
|-----------------------------------|---------------------------|---------|----------------------------|
| 0.115021 | 0.151724 | 6.21997 | 0.0718945 |

DP IFFT(double precision) skipped because the selected accelerator doesn't support double precision.
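As a sanity check on the GFLOPS column, the reported figures appear consistent with the common FFT flop-count convention of 5·N·log2(n) operations, where N is the total number of complex points and n is the transform size. The sketch below rests on two assumptions that the tool output does not state: that the benchmark uses this convention, and that "20M" means 20·2^20 points. The function name `fft_gflops` is hypothetical; the timings are the mean execution times from the SP FFT runs above.

```python
import math

def fft_gflops(total_points, fft_size, seconds):
    """GFLOPS under the assumed 5*N*log2(n) flop-count convention."""
    flops = 5.0 * total_points * math.log2(fft_size)
    return flops / seconds / 1e9

TOTAL_POINTS = 20 * 2**20  # "20M complex numbers", assuming M = 2**20
FFT_SIZE = 512             # "512-pt"

# (mean execution time in seconds, reported GFLOPS) from the SP FFT runs above
reported = {
    "Intel(R) HD Graphics 4000": (0.0322409, 29.2708),
    "NVIDIA Quadro K1000M": (0.0322368, 29.2746),
    "Software Adapter": (0.168924, 5.58664),
}

for name, (seconds, gflops) in reported.items():
    computed = fft_gflops(TOTAL_POINTS, FFT_SIZE, seconds)
    print(f"{name}: computed {computed:.4f} GFLOPS, reported {gflops}")
```

Under these assumptions the computed values agree with the reported column to within rounding. Note that the figure covers kernel execution time only; the two data-transfer columns are not included in it.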
