OpenCL* for CPU

Random memory read performance difference between GPU and CPU (I7-4770R)?

Norbert_Egi
Beginner

We are running simple code that does random reads and sequential writes (i.e. a gather operation) on both the CPU and the GPU of the i7-4770R (separately, one at a time), and we see roughly 4x slower performance on the GPU than on the CPU. With sequential reads and writes, and even with random writes, the performance is very similar, which suggests that both the internals of the chip and the memory controller let the GPU access DRAM at the same speed as the CPU. However, we have no idea why random reads suffer a 4x performance penalty, and this limits our application’s performance quite a lot. It would be good to know the reason for this difference and whether there is a remedy for it.

Here are also the numbers from our experiments. The metric is execution time, so the lower the better.

 

Configuration                                        MAP     REDUCE   GATHER   SCATTER
Intel i7-4770R IrisPro-16G mem-4 Cores-OpenMP-CPU    24.73   13.65    36.34    231.67
Intel i7-4770R IrisPro-16G mem-40 EU-OpenCL-GPU      23.55   16.29    167.03   270.7
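To put the table in relative terms, the GPU-to-CPU ratio of execution times can be computed per column. The helper below is only a minimal sketch; the name `slowdown` is hypothetical and is not part of the original benchmark.

```c
#include <assert.h>

/* Ratio of GPU to CPU execution time.
   Values above 1 mean the GPU run took longer than the CPU run.
   Hypothetical helper, not taken from the posted benchmark. */
double slowdown(double cpu_time, double gpu_time)
{
    assert(cpu_time > 0.0);
    return gpu_time / cpu_time;
}
```

With the numbers above, `slowdown(36.34, 167.03)` is about 4.6 for gather, while map (23.55 / 24.73, about 0.95), reduce (16.29 / 13.65, about 1.19) and scatter (270.7 / 231.67, about 1.17) all stay close to 1.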

Robert_I_Intel
Employee

Hi Norbert,

Would it be possible to provide your benchmarks to us? If you do not want to post them in a public forum, you could send them as a private message.

Thanks!

 

Norbert_Egi
Beginner
Hi Robert,
 
Thank you for the help. Please see the details here:
 

For map:

- Create an input and an output array of 32M integer elements each.
- Fill the input array with data.
- Walk the input array in sequence, assigning the value of each input array element to the corresponding output array element.

int a[N], b[N];   // N = 32*1024*1024
fill_data(a);
for (int i = 0; i < N; ++i)
  b[i] = a[i];
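The map test described above could look roughly like this on the CPU side. This is a sketch under assumptions, not the posted benchmark: the function name `map_copy` is hypothetical, and the OpenMP pragma is inferred only from the "OpenMP-CPU" label in the results.

```c
#include <assert.h>
#include <stddef.h>

/* Map: sequential read of a[], sequential write of b[].
   Hypothetical helper; the real benchmark code was not posted. */
void map_copy(const int *a, int *b, size_t n)
{
    assert(a != NULL && b != NULL);
    /* The pragma only takes effect when OpenMP is enabled at compile time. */
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (long i = 0; i < (long)n; ++i)
        b[i] = a[i];
}
```

Built with OpenMP enabled (for example `-fopenmp` with GCC) the loop is split across cores; without it the same code runs serially.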

 

For gather:

- Create an input, an output, and an index array of 32M elements each.
- Fill the input array with data.
- Fill the index array with random indices into the input array.
- Walk the index array in sequence, using each random index value to gather from the input array for sequential assignment to the output.

int a[N], b[N], index[N];   // N = 32*1024*1024
fill_data(a);
fill_random_index(index);  // fills with random values between 0 and 32M-1
for (int i = 0; i < N; ++i)
{
   int idx = index[i];
   b[i] = a[idx];
}
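A CPU-side version of the gather loop might be written as below. Again this is only a sketch: `gather` is a hypothetical name, the bounds check is an addition for safety, and the OpenMP pragma is an assumption based on the "OpenMP-CPU" label in the results.

```c
#include <assert.h>
#include <stddef.h>

/* Gather: sequential read of index[], random read of a[],
   sequential write of b[]. Hypothetical helper. */
void gather(const int *a, const int *index, int *b, size_t n)
{
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (long i = 0; i < (long)n; ++i) {
        int idx = index[i];
        /* Every index must point inside a[]. */
        assert(idx >= 0 && (size_t)idx < n);
        b[i] = a[idx];
    }
}
```

Each iteration writes a distinct `b[i]` and only reads from `a[]`, so the iterations are independent and the loop can be parallelized without synchronization.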

 

For scatter:

- Similar to gather, but walk the index array in sequence and use each random index value to scatter to the output array from the sequentially read input.

int a[N], b[N], index[N];   // N = 32*1024*1024
fill_data(a);
fill_random_index(index);  // fills with random values between 0 and 32M-1
for (int i = 0; i < N; ++i)
{
   int idx = index[i];
   b[idx] = a[i];
}
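The scatter loop can be sketched the same way. The name `scatter` is hypothetical and the OpenMP pragma is an assumption. One caveat that the description does not address: if the index array contains duplicates, which a purely random fill is likely to produce, several iterations write the same output slot, and a parallel loop then has a data race on that slot. The sketch therefore assumes the indices form a permutation.

```c
#include <assert.h>
#include <stddef.h>

/* Scatter: sequential read of a[] and index[], random write of b[].
   Assumes index[] is a permutation of 0..n-1; with duplicate indices the
   parallel loop would race on the shared output slot. Hypothetical helper. */
void scatter(const int *a, const int *index, int *b, size_t n)
{
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (long i = 0; i < (long)n; ++i) {
        int idx = index[i];
        /* Every index must point inside b[]. */
        assert(idx >= 0 && (size_t)idx < n);
        b[idx] = a[i];
    }
}
```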

 

On the GPU, each test is just a single OpenCL kernel; on the CPU, we use OpenMP with multiple cores.

Best regards,
Norbert
Robert_I_Intel
Employee

Norbert,

Couple of things:

1. What about reduce?

2. If you could provide the actual code, it would speed things up quite a bit.

Thanks!
