OpenCL source file does not compile with Intel OpenCL SDK

Edgardo_Doerner

Hi to everyone,

I am facing (another) strange problem with one of my codes. I have been working for some time on a Monte Carlo (MC) code for particle transport using OpenCL. In recent weeks we have been facing a strange problem with one of the source files and, specifically, with one function inside it.

The problem is that the compilation process hangs when an Intel SDK is used. We have tested on several systems, and whenever the Intel platform is targeted the compilation takes forever and does not finish (I left the PC running over the weekend and the compilation still had not finished). More specifically, this happens when the Intel CPU is targeted (strangely, for the GPU the code compiles and the program then executes without issues).

After some testing we realized that the problem is with the following function inside the *.cl source file:

void howfar(
            // Particle information
            particle_t *p,
            
            // Geometry data.
            __global int3 *ngrid,
            __global float *xbounds,
            __global float *ybounds,
            __global float *zbounds,
            
            // Output information
            int *idisc,
            int *irnew,
            float *ustep) {
    
    float dist = 0.0f;  // distance to boundary along particle trajectory
    
    if (p->ir == 0) {   // the particle is outside the geometry
        *idisc = 1; // terminate history
        return;
    }
    else {  // in the geometry, do transport checks
        int ijmax = ngrid[0].x * ngrid[0].y;
        int imax = ngrid[0].x;
        
        /* First we need to decode the region number of the particle in terms
         of the region indices in each direction. */
        int irx = (p->ir - 1) % imax;
        int irz = (p->ir - 1 - irx) / ijmax;
        int iry = ((p->ir - 1 - irx) - irz * ijmax) / imax;
        
        /* Check in z-direction. */
        if (p->u.z > 0.0f) { // going towards outer plane
            dist = (zbounds[irz + 1] - p->r.z) / p->u.z;
            if (dist < *ustep) {
                *ustep = dist;
                if (irz != (ngrid[0].z - 1)) {
                    *irnew = p->ir + ijmax;
                }
                else {
                    *irnew = 0; // leaving geometry
                }
            }
        }
        
        else if (p->u.z < 0.0f) { // going towards inner plane
            dist = -(p->r.z - zbounds[irz]) / p->u.z;
            if (dist < *ustep) {
                *ustep = dist;
                if (irz != 0) {
                    *irnew = p->ir - ijmax;
                }
                else {
                    *irnew = 0; // leaving geometry
                }
            }
        }
        
        /* Check in x-direction. */
        if (p->u.x > 0.0f) { // going towards positive plane
            dist = (xbounds[irx + 1] - p->r.x) / p->u.x;
            if (dist < *ustep) {
                *ustep = dist;
                if (irx != (ngrid[0].x - 1)) {
                    *irnew = p->ir + 1;
                }
                else {
                    *irnew = 0; // leaving geometry
                }
            }
        }
        
        else if (p->u.x < 0.0f) { // going towards negative plane
            dist = -(p->r.x - xbounds[irx]) / p->u.x;
            if (dist < *ustep) {
                *ustep = dist;
                if (irx != 0) {
                    *irnew = p->ir - 1;
                }
                else {
                    *irnew = 0; // leaving geometry
                }
            }
        }
        
        /* Check in y-direction. */
        if (p->u.y > 0.0f) { // going towards positive plane
            dist = (ybounds[iry + 1] - p->r.y) / p->u.y;
            if (dist < *ustep) {
                *ustep = dist;
                if (iry != (ngrid[0].y - 1)) {
                    *irnew = p->ir + imax;
                }
                else {
                    *irnew = 0; // leaving geometry
                }
            }
        }
        
        else if (p->u.y < 0.0f) { // going towards negative plane
            dist = -(p->r.y - ybounds[iry]) / p->u.y;
            if (dist < *ustep) {
                *ustep = dist;
                if (iry != 0) {
                    *irnew = p->ir - imax;
                }
                else {
                    *irnew = 0; // leaving geometry
                }
            }
        }
        
    }
    
    return;
    
}
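
The region decoding at the top of howfar can be checked on the host, independently of any OpenCL compiler. The following is a minimal sketch in plain C, assuming the 1-based region numbering implied by the code above (region 0 meaning "outside the geometry"). The names grid_dims_t, region_idx_t, decode_region and encode_region are hypothetical and do not appear in the original source.

```c
#include <assert.h>

/* Hypothetical helper types, not part of the original *.cl file. */
typedef struct { int x, y, z; } grid_dims_t;        /* regions per axis */
typedef struct { int irx, iry, irz; } region_idx_t; /* per-axis indices */

/* Decode a 1-based region number into per-axis indices, mirroring the
   arithmetic at the top of howfar. Region 0 ("outside the geometry")
   must be handled by the caller and is not valid input here. */
region_idx_t decode_region(int ir, grid_dims_t n) {
    int imax = n.x;
    int ijmax = n.x * n.y;
    region_idx_t out;
    out.irx = (ir - 1) % imax;
    out.irz = (ir - 1 - out.irx) / ijmax;
    out.iry = ((ir - 1 - out.irx) - out.irz * ijmax) / imax;
    return out;
}

/* Inverse mapping (assumed x-fastest, then y, then z), used only to
   check that the decoding round-trips. */
int encode_region(region_idx_t r, grid_dims_t n) {
    return 1 + r.irx + r.iry * n.x + r.irz * n.x * n.y;
}
```

For a 4 x 3 x 2 grid, for example, region 18 decodes to irx = 1, iry = 1, irz = 1, and encoding those indices gives 18 again.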

For testing purposes, if I delete everything inside the function (leaving it empty) and/or remove the call to the function in the kernel, the compilation finishes without problems. Additionally, if I target NVIDIA or AMD platforms the code compiles and executes without issues, and even on macOS using the Apple OpenCL framework (with Intel CPU and GPU) the code compiles and executes without problems. I attached a sample code that can be executed and/or compiled using CodeBuilder.
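
To exercise the boundary logic without involving any OpenCL compiler, the per-axis check in howfar can also be sketched as a small plain C function. This is only an illustration under assumptions: axis_step_t and axis_check are hypothetical names, and the sketch is not claimed to change how the Intel CPU compiler handles the kernel. It only shows that the x, y and z branches share one shape, with the region-number stride being 1, imax or ijmax.

```c
#include <assert.h>

/* Hypothetical result of a single-axis boundary check. */
typedef struct {
    float ustep; /* step length, possibly shortened by this axis */
    int irnew;   /* region after the step; 0 means leaving the geometry */
} axis_step_t;

/* One axis of the howfar logic.
   r, u   : position and direction components along this axis
   bounds : boundary positions along this axis (n + 1 entries)
   idx    : region index along this axis (0 .. n - 1)
   ir     : current 1-based region number
   stride : change in region number per index step
            (1 for x, imax for y, ijmax for z)
   cur    : step length and new region found so far */
axis_step_t axis_check(float r, float u, const float *bounds, int idx,
                       int n, int ir, int stride, axis_step_t cur) {
    if (u > 0.0f) { /* moving towards the upper boundary */
        float dist = (bounds[idx + 1] - r) / u;
        if (dist < cur.ustep) {
            cur.ustep = dist;
            cur.irnew = (idx != n - 1) ? ir + stride : 0;
        }
    } else if (u < 0.0f) { /* moving towards the lower boundary */
        float dist = -(r - bounds[idx]) / u;
        if (dist < cur.ustep) {
            cur.ustep = dist;
            cur.irnew = (idx != 0) ? ir - stride : 0;
        }
    }
    /* u == 0: the particle never reaches a boundary along this axis. */
    return cur;
}
```

Calling axis_check three times in the same order as the original (z, then x, then y), each time passing in the result of the previous call, reproduces the sequence of checks in howfar.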

Unfortunately I have no clue what is going on. The function is not the most complex that I have seen and I really cannot see the problem, and I get no output during the compilation process that could give me a clue about what is happening. Thanks for your help!

Edgardo_Doerner

Here are the files ...

Michael_C_Intel1
Moderator

Hi EdgardoD,

Thanks so much for the in-depth write-up and for sharing your code. I'll get a fresh deployment and attempt a build reproduction.

  • Can you tell me about the platform you're attempting to build on?
    • CPU SKU?
    • OS? I see a bunch of ^M but I still don't want to make assumptions.
  • Which OpenCL implementations were resident on the development system employing CodeBuilder?
    • See /etc/OpenCL/vendors/ or registry contents in Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\OpenCL\Vendors
  • Can you use ioc64 from the command line to build the kernel? Try ./ioc64 -help. Default places to find ioc64:
    • C:\Program Files (x86)\IntelSWTools\OpenCL\sdk\bin\x64
    • /opt/intel/opencl/SDK/bin/ioc64

Note: ioc64 from 2017 SDK hangs if attempting to build for Intel® Graphics Technology through Intel® Graphics Compute Runtime for OpenCL™ Driver for Linux* OS (NEO). Users can try the 2019 Intel® SDK for OpenCL™ Beta as found in Intel® System Studio XE 2019 Beta for NEO functionality.

 

Maybe helpful docs:

I recently refactored the SDK gsg pages here. They are 2017 SDK oriented:

 

The deployment page may also have some helpful guidance:

 

-MichaelC

 

 

Edgardo_Doerner

Hi Michael,

First the data:

  • Processor SKU: i7-4820K
  • OS: Windows 10 Pro
  • In the registry I see entries from two vendors, NVIDIA and Intel. The files are:
    • C:\Windows\System32\nvopencl.dll
    • intelopencl64.dll
    • intelopencl64_2_1.dll
    • IntelOpenCLProfiler.dll
  • I am able to call ioc64 from the command line, but I have not yet used it to compile the source file. Are there any flags to obtain more information during compilation (i.e. a verbose mode or something like that)?

Well, this issue has been really frustrating because it happens only with Intel CPUs (it even works with an iGPU!). In our team we have reproduced the issue on several Intel CPUs under Windows. On macOS, by contrast, the program compiles and runs without problems on the Intel CPU, on the AMD GPU (in the case of an iMac), and on the Intel iGPU (in the case of a MacBook Air).

I have done several tests on my PC (with the i7-4820K CPU) and found that if I remove the following lines, which write data to global memory, the kernel does compile (from line 565 in the *.cl source file):

            if (p.e <= g_pcut[p.ir]) {
                /****************************************
                 * Photon cutoff energy discard section
                 ****************************************/
                /* Save particle phase space data to stack. */
                gstack_ir[pid] = p.ir;      // region number
                gstack_iq[pid] = p.iq;      // particle charge
                gstack_r[pid] = p.r;        // position
                gstack_u[pid] = p.u;        // direction
                gstack_e[pid] = p.e;        // photon energy
                gstack_stat[gid].y = PCUT;
                return;
                /*********************************************
                 * End of Photon cutoff energy discard section
                 *********************************************/
            }

I attached the sample with those lines commented out. These lines are inside two nested while loops, so I do not know whether the system is unable to understand/optimize such an operation. Thanks for your help.

Michael_C_Intel1
Moderator

Hello Edgardo,

 

I ran your kernel through the Intel® Code Builder Platform on a Windows* 10 OS system with an i7-6770HQ (Skylake).

I was able to build your original ocl_mc.cl kernel through Code Builder and the offline ioc64 tool without src changes:

  • Intel® Graphics Technology device worked
  • Intel® CPU worked.

 

Log for CPU:

>"C:\Program Files (x86)\IntelSWTools\OpenCL\sdk\bin\x64\ioc64.exe" -device=cpu -input=ocl_mc.cl -ir=ocl_mc.ir
No command specified, using 'build' as default
Using build options: -I "removed"
Setting target instruction set architecture to: Default (Advanced Vector Extension 2 (AVX2))
OpenCL Intel CPU device was found!
Device name: Intel(R) Core(TM) i7-6770HQ CPU @ 2.60GHz
Device version: OpenCL 2.1 (Build 716)
Device vendor: Intel(R) Corporation
Device profile: FULL_PROFILE
Compilation started
Compilation done
Linking started
Linking done
Device build started
Device build done
Kernel <counthits> was successfully vectorized (8)
Kernel <photon_step> was not vectorized
Done.
counthits info:
Maximum work-group size: 8192
Compiler work-group size: (0, 0, 0)
Local memory size: 0
Preferred multiple of work-group size: 128
Minimum amount of private memory: 0
photon_step info:
Maximum work-group size: 8192
Compiler work-group size: (0, 0, 0)
Local memory size: 0
Preferred multiple of work-group size: 128
Minimum amount of private memory: 0
Build succeeded!

 

On this system the 24.20.100.6286 Intel® Graphics Technology driver is being used.

 

User reports for these kinds of issues are greatly appreciated. I'll ask the devs to see if they can reproduce your error on an i7-4xxx SKU, which uses a different, older development implementation branch for the OpenCL™ runtime. Users experiencing similar-looking issues should ensure that their Intel® Graphics Technology driver is up to date. Both the direct packages from downloadcenter.intel.com and the vendor-supported repackages should contain OpenCL™ CPU and Graphics Technology implementations for a particular processor. We put out Windows* OS driver package builds fairly regularly for 6th generation Intel Core processors and later processor SKUs. I'm not sure at the moment about the release cadence for older SKUs.

 

If developers do not use a system with Intel® Graphics Technology, they should look to the Intel® CPU Runtime for OpenCL™ applications on the OpenCL™ deployment page. If the goal is CPU centric, it's possible to uninstall the Intel® Graphics Technology driver and install Intel® CPU Runtime for OpenCL™ Applications as a standalone. This may provide an updated CPU runtime that could resolve the issue observed.

 

For more information see the deployment page: https://software.intel.com/en-us/articles/opencl-drivers

 

For more info on CPU runtime specifically:

Anchor that could go away in an html refactor: https://software.intel.com/en-us/articles/opencl-drivers#cpu-win64

Edit: Updated link https://software.intel.com/en-us/articles/opencl-drivers#cpu-section

Please be sure to see the system requirements in the release notes: https://software.intel.com/en-us/articles/opencl-runtime-release-notes/

 

Edit: Does the CPU implementation triage the issue you observed on your development systems? Is it viable for your goal?

 

Thanks,

 

-MichaelC

Michael_C_Intel1
Moderator

Internal case number is 3375.

-MichaelC

Edgardo_Doerner

Hi Michael,

Exactly, it is the CPU that triggers the issue. The strange thing is that old implementations of my code (with a different layout of the particle stack in global memory) compile and work without problems. It is unclear to me why that part of the code, with those accesses to global memory, triggers such an issue during compilation. Well, I hope that you will be able to reproduce the problem...

Edgardo_Doerner

Hi Michael, quite a long time has passed. Has there been any progress on this issue?

Michael_C_Intel1
Moderator

Hello EdgardoD,

I have not received feedback on this issue.

Can you move to the Intel® System Studio 2019: OpenCL™ Tools distribution of the SDK to get the latest ioc64 / GUI front end capability?

Also... can you confirm which graphics driver package you have deployed? I've supplied the devs with the assumption that you are on 15.40 branch. See downloadcenter.intel.com... click Graphics Drivers and find your SKU there. Useful feedback is limited against older package distributions. There was a new graphics driver package release for the 15.40 branch on 20180918... it may be worth trying on your system.
You could also be using a vendor supplied graphics driver package.

For reference: since then I've moved the test Skylake client system mentioned earlier to 25.20.100.6373. The attached screenshot shows the 'device' information for the included CPU runtime.

Does the standalone CPU implementation, as opposed to the graphics driver included implementation, triage the issue you observed on your development systems? Is it viable for your goal?

Here are the details: https://software.intel.com/en-us/articles/opencl-runtime-release-notes/
download: https://software.intel.com/en-us/articles/opencl-drivers#cpu-section

The system requirements suggest that Haswell SKUs are viable with this standalone implementation.

The standalone runtime, however, cannot be deployed alongside the implementations found in the graphics driver deployment for Windows* OS. The release notes have instructions for performing a manual installation; otherwise, uninstalling the graphics driver package may be required.

Thanks,

-Michael

examplecpupane.png

Michael_C_Intel1
Moderator

Hi EdgardoD,

Revisiting this thread from the fall.

If possible, can you give us the specific driver version and branch in use that showed the error? Is this issue still occurring?

Were you able to try ioc64 as I reported in this thread? Did you try getting your own build feedback from the OpenCL API with CL_PROGRAM_BUILD_LOG? If so, what kind of feedback or lack of feedback did it provide?

 

Thank you!

-MichaelC

 

 

Michael_C_Intel1
Moderator

Hi EdgardoD,

Just to follow up some more... We tested 5 different CPU SKUs with that 15.40 driver branch package and couldn't reproduce the issue with the source code. Hoping any updates may have resolved the issues.

For SKUs w/ Intel® Graphics Technology, recommendation is to move to the latest driver package from either the system vendor (per support agreement tied to vendor driver usage) OR obtain the graphics driver from downloadcenter.intel.com. Remember, the driver package contains both Intel CPU and GFX implementations for OpenCL.

For SKUs w/o Intel® Graphics Technology, recommendation is to get the latest from the CPU Runtime portal.

Thank you,

 

-MichaelC

Edgardo_Doerner

Hi Michael,

Thanks for the update! I have not worked on the code in the last few months, so I will set up a new platform to try the latest version of the OpenCL SDK. By the end of the week I will give you an update on the situation.

Thanks for your help!

Edgardo_Doerner

OK, I tried the latest versions of both the CPU runtime and the OpenCL SDK, and now the kernel builds without issues. I suppose, then, that "the issue" (whatever it was) got solved by the recent updates to the OpenCL tools.

Thanks for your help and the follow up of this issue!

Michael_C_Intel1
Moderator

EdgardoD,

Thanks for the feedback.

There were some critical compatibility fixes for the 2019 Update 4 versions of the tools, particularly for working with the latest driver packages for Windows* OS. Hoping you and your project are successful.

Take care,

-MichaelC

 
