OpenCL* for CPU

problem with profiling

Jonny_G_
Beginner
169 Views

Hello!

I have an issue with an OpenCL application that computes a matrix multiplication.

In particular, I think the problem is related to the "clGetEventProfilingInfo" function. If I run the program on the CPU (Intel Core i5-4300U), everything works fine and "clGetEventProfilingInfo" reports the execution time correctly.

If I use the GPU (Intel HD4400) instead, everything works fine as long as I don't call "clGetEventProfilingInfo". When I call "clGetEventProfilingInfo" to calculate the execution time and also set a local work size in "clEnqueueNDRangeKernel", the program crashes and I don't understand why (if I pass "NULL" for the local work size parameter of "clEnqueueNDRangeKernel", everything seems to work). From the Visual Studio debugger I think it's an "access violation" problem, but I'm not sure.
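
For reference, the timing step can be sketched as below. clGetEventProfilingInfo reports CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END as cl_ulong counters in nanoseconds, and the command queue has to be created with CL_QUEUE_PROFILING_ENABLE. The helper name profiling_elapsed_ms is hypothetical, and it takes the two counters as plain 64-bit integers so that the sketch compiles without the OpenCL headers.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the OpenCL API).
 * start_ns / end_ns: values assumed to come from clGetEventProfilingInfo
 * with CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END.
 * Returns the elapsed time in milliseconds, or -1.0 if the counters
 * are inconsistent (end before start). */
double profiling_elapsed_ms(uint64_t start_ns, uint64_t end_ns)
{
    if (end_ns < start_ns) {
        return -1.0;
    }
    /* Profiling counters are in nanoseconds; 1 ms = 1e6 ns. */
    return (double)(end_ns - start_ns) / 1.0e6;
}
```

The counters are only meaningful if the enqueue call succeeded and the event has completed, for example after clWaitForEvents.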

This is the code of the application: https://www.friendpaste.com/2NIpYvk8R96S01kFD3H3Gl

Can someone help me?

7 Replies
Robert_I_Intel
Employee

Hi Jonny,

It appears that you are setting the localThreads variable to {512, 512}, which is far too big for a work-group on a GPU: it requests 512 × 512 = 262,144 work-items per work-group. The total work-group size shouldn't exceed 512 work-items on your processor, and 256 on 5th gen processors and beyond, so good sizes to try are {8, 8}, {16, 8}, {32, 8}, etc.
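
As a rough illustration of the limit described above, here is a minimal sketch of a check that could run before clEnqueueNDRangeKernel. The function name local_size_is_valid and its parameters are hypothetical, not part of the OpenCL API. The limits are assumed to have been queried beforehand (for example with clGetKernelWorkGroupInfo and CL_KERNEL_WORK_GROUP_SIZE, and with clGetDeviceInfo and CL_DEVICE_MAX_WORK_ITEM_SIZES), and the divisibility rule assumes OpenCL 1.x.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the OpenCL API).
 * dims:           number of work dimensions (1 to 3)
 * global, local:  global and local work sizes, one entry per dimension
 * max_wg_size:    assumed to come from CL_KERNEL_WORK_GROUP_SIZE
 *                 (or CL_DEVICE_MAX_WORK_GROUP_SIZE)
 * max_item_sizes: assumed to come from CL_DEVICE_MAX_WORK_ITEM_SIZES
 * Returns 1 if the local work size looks usable, 0 otherwise. */
int local_size_is_valid(size_t dims,
                        const size_t *global,
                        const size_t *local,
                        size_t max_wg_size,
                        const size_t *max_item_sizes)
{
    size_t total = 1;

    for (size_t d = 0; d < dims; ++d) {
        /* Each dimension has its own upper bound. */
        if (local[d] == 0 || local[d] > max_item_sizes[d]) {
            return 0;
        }
        /* OpenCL 1.x: the global size must be a multiple of the local size. */
        if (global[d] % local[d] != 0) {
            return 0;
        }
        /* The limit applies to the product of all dimensions. */
        total *= local[d];
        if (total > max_wg_size) {
            return 0;
        }
    }
    return 1;
}
```

With an assumed 1024 x 1024 global size and a limit of 512, this rejects {512, 512} and accepts {16, 8}. It is also worth checking the return value of clEnqueueNDRangeKernel: an oversized local size should produce CL_INVALID_WORK_GROUP_SIZE, and in that case no valid event is returned, which may be why the later clGetEventProfilingInfo call crashed.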

If that does not solve the issue, please let me know what graphics driver you are using.

Jonny_G_
Beginner

Yes, it solved the issue. Thank you.

I have another question: when I try to use Intel Code Analyzer on my OpenCL application, I get some errors.

When I launch the code analysis for kernel-side profiling (occupancy), it seems to work, but in the "Trick per Threads" and "Execution Units" sections I get this message: "Error:unable to retrieve report's data".

When I launch the code analysis for kernel-side profiling (latency), it doesn't work at all. In the "Application Output" section I get this message: "skip source annotation because the source-profile file is empty".

I'm using a Microsoft Surface Pro 3 with Visual Studio 2013 and the latest Intel Code-Builder. I'm not sure whether I have the latest driver for the Intel HD4400. If I download the latest driver, I can't install it, because the Surface Pro 3 requires a customized driver that I don't know where to find.

Could it be a driver problem?

Robert_I_Intel
Employee

Hi Jonny,

Which driver version do you have? Could you provide the kernel/code you are trying to analyze? Did you install the latest Code Builder patch https://software.intel.com/en-us/forums/opencl/topic/591196 ?

Thanks!

Jonny_G_
Beginner

I've already installed the latest Code Builder patch from that link.

The driver version I have for the Intel HD4400 is 10.18.15.4256.

I downloaded the latest version of the driver (Windows 10 version) from here: https://downloadcenter.intel.com/download/25308/Intel-Iris-Iris-Pro-and-HD-Graphics-Driver-for-4th-Gen-Windows-10-64bit , but when I try to install it on my Surface Pro 3, I get an error telling me that I need a customized driver for my device, and I can't continue with the installation.

The code I'm trying to analyze is here: https://www.friendpaste.com/2NIpYvk8R96S01kFD3H3Gl , obviously with valid values in the localThreads array.

Robert_I_Intel
Employee

I am able to reproduce this issue. Will file a bug.

Jonny_G_
Beginner

Is this a problem that can be solved?

Robert_I_Intel
Employee

Jonny,

Yes, the development team told me that they could solve this issue, so the solution should be available in the next release.
