Tse__Amon · ‎04-12-2019

I am experimenting with OpenCL using examples from the book OpenCL in Action. I see different behavior on different devices. Below is the matrix multiplication kernel I am having trouble with.

__kernel void matrix_mult(__global float4 *a_mat,
      __global float4 *b_mat, __global float *c_mat) {

   float sum;

   /* One work-item per row; rows are read as float4 vectors */
   int num_rows = get_global_size(0);
   int vectors_per_row = num_rows/4;
   int start = get_global_id(0) * vectors_per_row;
   a_mat += start;     /* this work-item's row of a_mat */
   c_mat += start*4;   /* this work-item's row of c_mat */

   for(int i=0; i<num_rows; i++) {
      sum = 0.0f;
      for(int j=0; j<vectors_per_row; j++) {
         sum += dot(a_mat[j], b_mat[i * vectors_per_row + j]);
      }
      c_mat[i] = sum;
   }
}
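For checking device output, a plain C reference of the same arithmetic can be useful. This is a minimal sketch, assuming the kernel reads a_mat[j] and writes c_mat[i], so that work-item r fills row r of c with the dot products of row r of a and each row of b. The function name matrix_mult_reference is hypothetical and is not part of the book's code.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side reference for the kernel's arithmetic (hypothetical helper).
 * a and b are n x n row-major float arrays; c receives n x n results.
 * Row r of c mirrors what work-item r would write. */
void matrix_mult_reference(const float *a, const float *b, float *c, size_t n)
{
    assert(n % 4 == 0);  /* the kernel reads rows as float4 vectors */
    for (size_t r = 0; r < n; r++) {        /* r stands in for get_global_id(0) */
        for (size_t i = 0; i < n; i++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++) {
                /* row r of a dotted with row i of b, as in the kernel */
                sum += a[r * n + k] * b[i * n + k];
            }
            c[r * n + i] = sum;
        }
    }
}
```

When comparing against device results, a small tolerance is advisable, since the device may sum the float4 products in a different order.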

I am testing with Intel OpenCL SDK 2019 on an Intel i7-4600U CPU and Intel HD Graphics 4400 GPU. Both devices (CPU and GPU) complete the kernel successfully for a matrix of 1024x1024 floats (the kernel is executed with global size set to 1024). If I increase the matrix size to 2048x2048 (the kernel is then executed with global size set to 2048), kernel execution still completes on the CPU, but on the GPU it hangs and never returns.
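To put the two sizes in perspective, the loop bounds in the kernel give the number of dot() calls directly. The sketch below only counts work. It does not identify the cause of the hang, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* dot() calls made by one work-item: num_rows outer iterations
 * times vectors_per_row (= num_rows / 4) inner iterations. */
uint64_t dot_calls_per_work_item(uint64_t n)
{
    assert(n % 4 == 0);
    return n * (n / 4);
}

/* dot() calls across the whole launch: one work-item per row. */
uint64_t dot_calls_total(uint64_t n)
{
    return n * dot_calls_per_work_item(n);
}
```

For n = 1024 each work-item makes 262,144 calls; for n = 2048 it makes 1,048,576, and the launch as a whole does 8 times as much work as the 1024 case.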

The issue seems device-specific. If I comment out the line inside the inner for loop (i.e. the line with sum += dot…), then the Intel GPU completes the kernel execution.

I wonder whether the issue is related to conflicting global memory accesses to a_mat and b_mat across different processing elements.

Could any experts offer advice on how to find a solution?

Michael_C_Intel1 · ‎04-16-2019

Hi AmonT,

Thanks for sharing the experience of your development efforts.

I recommend checking the constraints reported by clGetDeviceInfo(...)... there are a handful of query parameters that report the maximum values the OpenCL device is capable of. Check the values for CL_DEVICE_MAX_* to see if the enqueued kernels and memory objects are exceeding any limits:

Reference
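As an illustration of that kind of check, here is a minimal sketch in plain C. It assumes the host has already read CL_DEVICE_MAX_MEM_ALLOC_SIZE and CL_DEVICE_GLOBAL_MEM_SIZE through clGetDeviceInfo and passes the values in; the device_limits struct and both function names are hypothetical. The global-memory comparison is only a rough necessary condition, not a guarantee that allocation succeeds.

```c
#include <assert.h>
#include <stdint.h>

/* Limits the host is assumed to have queried with clGetDeviceInfo
 * (hypothetical container, not an OpenCL type). */
typedef struct {
    uint64_t max_mem_alloc_size;  /* value of CL_DEVICE_MAX_MEM_ALLOC_SIZE */
    uint64_t global_mem_size;     /* value of CL_DEVICE_GLOBAL_MEM_SIZE */
} device_limits;

/* Bytes needed for one n x n matrix of 4-byte floats. */
uint64_t matrix_bytes(uint64_t n)
{
    return n * n * 4u;
}

/* Returns 1 if each of the three matrices (a, b, c) fits in a single
 * allocation and all three together fit in global memory, else 0. */
int buffers_fit(uint64_t n, device_limits lim)
{
    assert(n > 0);
    uint64_t one = matrix_bytes(n);
    return one <= lim.max_mem_alloc_size && 3 * one <= lim.global_mem_size;
}
```

A 2048x2048 float matrix needs 16 MiB per buffer, so the three buffers in this thread need 48 MiB in total.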

Then there are also the kernel parameters from clGetKernelWorkGroupInfo(...)... Again, check the values to see if the kernels themselves are constrained as necessary.

Reference
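One way to use the kernel's reported work-group limit is sketched below in plain C. It assumes the host has read CL_KERNEL_WORK_GROUP_SIZE through clGetKernelWorkGroupInfo and wants an explicit local size for clEnqueueNDRangeKernel; OpenCL 1.x requires the global size to be a multiple of the local size. The function name pick_local_size is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Largest local size that does not exceed the kernel's work-group limit
 * and divides the global size evenly. Falls back to 1 in the worst case. */
size_t pick_local_size(size_t global_size, size_t max_work_group_size)
{
    assert(global_size > 0 && max_work_group_size > 0);
    size_t local = max_work_group_size < global_size ? max_work_group_size
                                                     : global_size;
    while (global_size % local != 0) {
        local--;  /* reaches 1 at the latest, which divides everything */
    }
    return local;
}
```

Passing NULL as the local size instead lets the runtime choose; an explicit value is mainly useful for ruling out work-group size as a variable while debugging.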

If you could post a reproducer to this thread I'd be happy to give it a try... the host-side setup and teardown are needed as well... Please ensure posted reproducers do not contain any privileged code. If there is an issue... we'd like to make sure the runtime shows a configuration issue or platform constraint more gracefully/clearly.

Are you on Linux OS or Windows OS? In either case you may wish to check for an updated Intel® Graphics driver and OpenCL runtime.

Windows - Check the system vendor website first before going to downloadcenter.intel.com. Vendors may have support or warranty criteria.
Linux - Try Beignet. Broadwell and newer systems want to see Linux 4.11 with the NEO runtime...

I don't think any of the first-party runtime builds for Haswell-based processors are actively supported by Intel anymore. This excludes anything prior to the NEO runtime, which wants Broadwell or newer.

A dump from a clinfo-type application could be useful... Note that the SDK has something like clinfo embedded in the IDE (either Eclipse or Visual Studio)... Look for the Code Builder Platform Info Tree. But again, these plugins aren't supported on Haswell for Intel® Graphics. Please consider reviewing the support matrix in the SDK tools release notes.

-MichaelC

Tse__Amon · ‎04-16-2019

Hello Michael,

Thank you for the information. I was not aware that the GPU hardware I am testing is no longer supported by Intel Studio 2019. I tested the same piece of code on an AMD Vega12 GPU and it works. I believe my issue is likely a hardware compatibility issue. My case can be closed.

- Amon

Michael_C_Intel1 · ‎04-16-2019

Hi AmonT,

Thanks for the interest and the response.

Aligning the program with the constraints interrogated from that Haswell Graphics runtime may produce useful parameters... Those parameters may allow the program to function if scheduling is used. OpenCL programs use dynamically interrogated parameters to maintain maximum portability and future-proofing. In this case, since Haswell systems are popular, it's still worthwhile for us to observe the issue if you can submit attachments... If it's arduous, don't worry about it... I'd like to give some more context for anonymous forum viewers:

Clarifications:

For many users and developers it's unclear... Please note there is a difference between

SDK support to develop OpenCL programs and
runtime support as it executes an OpenCL program.

SDK OCL debuggers and plugins are not currently supported for older Haswell graphics hardware. SDK OCL tools are supported for Haswell CPU. Functionality on Haswell graphics for SDK debuggers and plugins is unknown. See the release notes with respect to Haswell and older platforms (CPU is supported).

OpenCL runtimes are not currently supported by Intel on Haswell graphics hardware. Keep in mind that OpenCL runtimes exist and were supported at one time. On a Haswell system with Intel Graphics, expect the graphics drivers, including OpenCL runtimes, deployed as part of stock Windows 10 to be out of date. There may even be some applicable, albeit unsupported, OpenCL updates that improve functionality. See the release notes from either 1) the system vendor's graphics driver package... 2) downloadcenter.intel.com graphics driver packages, and from Beignet on Linux systems... Beignet may have some useful guidance in particular.

Please see documentation here:

SDK Release Notes
CPU RT Release notes
Intel Graphics Compute Runtime Release Notes (Broadwell and newer)
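
On the earlier point about scheduling work to fit the interrogated limits, one possible approach is to split a large launch into several smaller enqueues. The plain C sketch below only computes the (offset, size) pairs; it assumes the host would pass each pair to clEnqueueNDRangeKernel as global_work_offset and global_work_size (offsets are available from OpenCL 1.1). The kernel in this thread derives num_rows from get_global_size(0), so it would also need the row count passed as an explicit kernel argument; that change is an assumption and is not shown. The names work_chunk and split_global_range are hypothetical, and whether chunking avoids the hang on this hardware has not been established here.

```c
#include <assert.h>
#include <stddef.h>

/* One sub-launch: where it starts in the global range and how many
 * work-items it covers (hypothetical helper type). */
typedef struct {
    size_t offset;
    size_t size;
} work_chunk;

/* Splits [0, total) into consecutive chunks of at most `chunk` work-items.
 * Writes them to `out` (room for `capacity` entries) and returns the count. */
size_t split_global_range(size_t total, size_t chunk,
                          work_chunk *out, size_t capacity)
{
    assert(chunk > 0);
    size_t count = 0;
    for (size_t offset = 0; offset < total; offset += chunk) {
        assert(count < capacity);  /* caller must provide enough room */
        size_t remaining = total - offset;
        out[count].offset = offset;
        out[count].size = remaining < chunk ? remaining : chunk;
        count++;
    }
    return count;
}
```

With total = 2048 and chunk = 512 this yields four sub-launches starting at rows 0, 512, 1024 and 1536.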

-MichaelC

Matrix multiplication - Hanged with Intel GPU