We are running simple code that does random reads and sequential writes (i.e. a gather operation) on both the CPU and the GPU of the i7-4770R (separately, one at a time), and we see roughly 4x slower performance on the GPU than on the CPU. For sequential reads and writes, and even for random writes, performance is very similar, which indicates that both the chip internals and the memory controller allow the GPU to access DRAM at the same speed as the CPU. However, we have no idea why random reads suffer a 4x performance penalty, and this limits our application's performance quite a lot. It would be good to know the reason for this difference and whether there is a remedy for it.
Here are the numbers from our experiments. The metric is execution time, so lower is better.
| Configuration | MAP | REDUCE | GATHER | SCATTER |
|---|---|---|---|---|
| Intel i7-4770R, Iris Pro, 16 GB memory, 4 cores, OpenMP (CPU) | 24.73 | 13.65 | 36.34 | 231.67 |
| Intel i7-4770R, Iris Pro, 16 GB memory, 40 EUs, OpenCL (GPU) | 23.55 | 16.29 | 167.03 | 270.7 |
Hi Norbert,
Would it be possible to provide your benchmarks to us? If you do not want to post it in a public forum, you could send it as a private message.
Thanks!
For map:
- Create an input and an output array of 32M integer elements each.
- Fill the input array with data.
- Walk the input array in sequence, assigning the value of each input element to the output element at the same position.

int a[32*1024*1024], b[32*1024*1024];
fill_data(a);
for (i = 0; i < 32*1024*1024; ++i)
    b[i] = a[i];
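The map loop above could be written as a compilable OpenMP function roughly like this. It is only a sketch: the name `map_copy` and the pointer-plus-length signature are made up for illustration and are not taken from the actual benchmark.

```c
#include <assert.h>
#include <stddef.h>

/* Map: copy each input element to the output element at the same position.
   Hypothetical helper, not the poster's code. Build with -fopenmp to spread
   the loop over several cores; without it the pragma is ignored and the
   loop runs serially. */
void map_copy(const int *a, int *b, size_t n)
{
    assert(a != NULL && b != NULL);
    #pragma omp parallel for
    for (long long i = 0; i < (long long)n; ++i)
        b[i] = a[i];   /* sequential read, sequential write */
}
```

For 32M elements the two arrays would normally be allocated with `malloc` rather than placed on the stack.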
For gather:
- Create an input, an output, and an index array of 32M elements each.
- Fill the input array with data.
- Fill the index array with random indices into the input array.
- Walk the index array in sequence, using each random index value to gather from the input array for sequential assignment to the output.

int a[32*1024*1024], b[32*1024*1024], index[32*1024*1024];
fill_data(a);
fill_random_index(index); // fills with random values between 0 and 32M-1
for (i = 0; i < 32*1024*1024; ++i)
{
    idx = index[i];
    b[i] = a[idx];
}
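A minimal C sketch of the gather case, assuming a `rand()`-based index generator. The signatures below, including the `seed` parameter, are assumptions for illustration; the original `fill_random_index` was not posted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill index[] with pseudo-random values in [0, n-1].
   rand() is only guaranteed to deliver 15 random bits, so two calls are
   combined to cover ranges as large as 32M. Assumption: any roughly
   uniform generator is good enough for this benchmark. */
void fill_random_index(unsigned *index, size_t n, unsigned seed)
{
    assert(n > 0);
    srand(seed);
    for (size_t i = 0; i < n; ++i) {
        unsigned long long r =
            ((unsigned long long)rand() << 15) ^ (unsigned long long)rand();
        index[i] = (unsigned)(r % (unsigned long long)n);
    }
}

/* Gather: sequential read of index[], random read of a[], sequential
   write of b[]. Every value in index[] must be a valid position in a[]. */
void gather(const int *a, const unsigned *index, int *b, size_t n)
{
    #pragma omp parallel for
    for (long long i = 0; i < (long long)n; ++i) {
        unsigned idx = index[i];
        b[i] = a[idx];
    }
}
```

Each iteration writes a distinct `b[i]`, so the loop can be parallelized without synchronization.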
For scatter:
- Similar to gather, but walk the index array in sequence and use each random index value to scatter into the output array from the sequentially read input.

int a[32*1024*1024], b[32*1024*1024], index[32*1024*1024];
fill_data(a);
fill_random_index(index); // fills with random values between 0 and 32M-1
for (i = 0; i < 32*1024*1024; ++i)
{
    idx = index[i];
    b[idx] = a[i];
}
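The scatter case could look like this in C with OpenMP. Again a sketch with a made-up signature, not the benchmark's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Scatter: sequential read of a[] and index[], random write into b[].
   Hypothetical helper. Every value in index[] must be a valid position
   in b[]. */
void scatter(const int *a, const unsigned *index, int *b, size_t n)
{
    assert(a != NULL && index != NULL && b != NULL);
    #pragma omp parallel for
    for (long long i = 0; i < (long long)n; ++i) {
        unsigned idx = index[i];
        b[idx] = a[i];
    }
}
```

One caveat: if the index array contains duplicates (likely with plain random values), several iterations write the same output element, and in a parallel run which value survives depends on scheduling. That does not matter for timing, but a correctness check would need an index array that is a permutation.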
On the GPU, each operation is a single OpenCL kernel; on the CPU, we use OpenMP across multiple cores.
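For reference, a gather kernel of this kind might look like the following, held as a source string in the host program. This is a guess at the shape, not the kernel that was measured: the kernel name, the argument order, the `n` bounds check, and the `round_up_global_size` helper are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* OpenCL C source for a gather kernel, one work-item per output element.
   Hypothetical: name and argument list are illustrative only. */
const char *const gather_kernel_src =
    "__kernel void gather(__global const int *a,\n"
    "                     __global const uint *index,\n"
    "                     __global int *b,\n"
    "                     const uint n)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    if (i < n)              /* skip padding work-items */\n"
    "        b[i] = a[index[i]]; /* random read, sequential write */\n"
    "}\n";

/* In OpenCL 1.x the global work size must be a multiple of the local
   work size, so round n up; the bounds check in the kernel skips the
   extra work-items. */
size_t round_up_global_size(size_t n, size_t local_size)
{
    assert(local_size > 0);
    return (n + local_size - 1) / local_size * local_size;
}
```

With 32M elements and a power-of-two local size the rounding changes nothing, but it keeps the sketch valid for other sizes.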
Norbert,
A couple of things:
1. What about reduce?
2. If you provide the actual code, this would speed up things quite a bit.
Thanks!
