Hi all, I have a project that uses OpenCL for computation. The behavior below seems quite strange to me; any help is appreciated!
I can't post my code in detail here, but the pseudo-code is:
// STEP 1: Uploading input from CPU to GPU (using clEnqueueWriteBuffer)
// STEP 2: Running several kernels for computation
// STEP 3: Do some CPU code (probably 100ms or more)
// STEP 4: Uploading another input from CPU to GPU (using clEnqueueWriteBuffer)
The input size (in bytes) in STEP 1 is the same as in STEP 4. The transfer took ~0.5 ms in STEP 1 but ~10 ms in STEP 4. I also synchronized with clFinish before and after each step. Any ideas why this could happen? I suspect that the Intel driver puts my OpenCL context/queue into an idle state and needs a little time to wake things up.
P.S.: STEP 1 and STEP 4 perform the same on NVIDIA and AMD devices.
Hi HuyL,
Thanks for the post. This is a very useful discussion topic.
Proprietary code that developers don't have the permission to share isn't suitable for the forum. However, generic reproducers can be. Based on your pseudocode, it's possible a reproducer could show similar behavior. Could you prepare one and attach it?
Separately, take advantage of mapped pointers on Intel® Graphics Technology, since kernels running on the graphics device can be exposed to the same address space as the host. This has been observed to provide a significant speedup over clEnqueueWriteBuffer(...) calls. Link: https://software.intel.com/en-us/node/540453.
Also, forcing synchronous behavior comes with a performance penalty. Consider refactoring the code to be asynchronous where it can be.
The two points above are among the most common performance considerations in OpenCL programming. You may also want to try the GPU hotspots mode of Intel® VTune™ Amplifier. It should give you good feedback on API calls over the life of the program.
The open-source clIntercept Layer project on GitHub could also help you: https://github.com/intel/opencl-intercept-layer
-MichaelC
