Measuring execution time of a vector addition kernel using an OpenCL event

Tamrakar__Yunik · 02-08-2018

Hello all,

I am a total beginner in OpenCL/GPU computing, and I decided to start with vector addition using OpenCL 1.2. I was able to run that code successfully, so I moved on to measuring the execution time of the vector addition on the GPU using an OpenCL event. However, I ran into a segmentation fault. Being new to this, I haven't been able to get past the error despite trying for quite some time to fix it on my own.

Furthermore, I'm still learning the various terms and methods involved in OpenCL, which has made fixing the error difficult. I really want to move forward with my OpenCL coding, so any help would be much appreciated. I've attached screenshots of portions of the code for reference.

Thanks

Michal_M_Intel · 02-09-2018

It looks like the event is not being passed to clEnqueueNDRangeKernel. In the screenshot I see NULL as the last parameter.

Please use the last parameter to pass a pointer to the cl_event you just created.

Ben_A_Intel · 02-09-2018

Hi Yunik,

We recently released the Intercept Layer for OpenCL Applications, an open-source debugging and performance analysis tool that you may be interested in. It hasn't been officially announced yet, but the source code is available here: https://github.com/intel/opencl-intercept-layer

I mention this for three reasons:

1) You may find the CallLogging and ErrorLogging capabilities helpful when debugging problems like this one. With these controls enabled, the Intercept Layer will log all of the OpenCL calls your application makes and many of their parameters. Since your application is crashing, this should make it very easy to identify which call is crashing, and may provide some clues as to why. (Note: you can likely get this information from your host application debugger as well.)

2) The DevicePerformanceTiming capability will automatically add event profiling to your application. This gives you a way to measure the execution time of your kernels without any application modifications.

3) If you still want to add event profiling to your application directly, rather than use the DevicePerformanceTiming capability, you may find the Intercept Layer code useful to see how it uses event profiling.

Give it a look and let us know what you think!

JWong19 · 03-06-2018

On your line 137, 'ckKernel' is used in 'clEnqueueNDRangeKernel', but it was never initialized.