OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.

Hang on Linux when initializing OpenCL context from TBB thread

e4lam
Beginner

Hi, I'm getting a hang/deadlock when initializing an Intel OpenCL context from a TBB thread on Linux. On Windows, it's fine. Please try the source code below.

 

// ocl_test.cpp
//
// Compile with: g++ -std=c++14 -I/path/to/opencl/include -I/path/to/tbb/include ocl_test.cpp -o ocl_test -L. -lOpenCL -ltbb -lpthread
//
// cl.hpp used from: https://www.khronos.org/registry/OpenCL/api/2.1/cl.hpp
//
// Run with (from the directory containing libintelocl.so):
// $ env LD_LIBRARY_PATH=. LD_PRELOAD=libintelocl.so ./ocl_test
//
// This hangs on Linux under various Intel processors but not Windows
//
#include <iostream>
#include <future>
#include <atomic>
#include <vector>

#define __CL_ENABLE_EXCEPTIONS
#define CL_TARGET_OPENCL_VERSION 120
#include "cl.hpp"

#ifdef _WIN32
#ifdef _DEBUG
    #pragma comment(lib, "OpenCL_d")
    #pragma comment(lib, "tbb_debug")
#else
    #pragma comment(lib, "OpenCL")
    #pragma comment(lib, "tbb")
#endif
#endif

//#define TEST_STDTHREAD
#define TEST_TBB

#ifdef TEST_TBB
#include <tbb/tbb.h>

// Task wrapper around a callable: execute() runs f(), and as long as f()
// returns true it returns a continuation task so that f() keeps getting re-run.
template <typename F>
class LambdaTask : public tbb::task
{
public:
    LambdaTask(const F& f) : myF(f) {
    }
private:
    tbb::task* execute() override {
        if (myF()) {
            return new(tbb::task::allocate_continuation()) LambdaTask(myF);
        }
        return nullptr;
    }
    F myF;
};

template <typename F>
static void EnqueueLambda(const F& f) {
    tbb::task::enqueue(
            *new(tbb::task::allocate_root()) LambdaTask<F>(f) );
}

template <typename F>
static void SpawnAndWaitLambda(const F& f) {
    tbb::task::spawn_root_and_wait(
            *new(tbb::task::allocate_root()) LambdaTask<F>(f) );
}
#endif

int main() {

    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);

    if (platforms.empty()) {
        std::cout << "No platforms found.\n";
        return 1;
    }

    for (const cl::Platform &platform : platforms) {
        std::cout << "Available platform: "
                  << platform.getInfo<CL_PLATFORM_NAME>() << "\n";
    }

    cl::Platform platform = platforms[0];
    std::cout << "Using platform: " << platform.getInfo<CL_PLATFORM_NAME>()
              << "\n";

    std::vector<cl::Device> devices;
    platform.getDevices(CL_DEVICE_TYPE_ALL, &devices);
    if (devices.empty()) {
        std::cout << "No devices found.\n";
        return 1;
    }

    cl::Device device = devices[0];
    std::cout << "Using device: " << device.getInfo<CL_DEVICE_NAME>() << "\n";

    auto create_context = [&]() {
        std::cout << "Attempting to create context" << std::endl;
        cl::Context context({device});
        std::cout << "Created context" << std::endl;
    };

#if defined(TEST_STDTHREAD)

    std::cout << "TEST_STDTHREAD" << std::endl;
    auto task = std::async(std::launch::async, [&]() { create_context(); });
    task.wait();
    std::cout << "Done" << std::endl;

#elif defined(TEST_TBB)

    std::cout << "TEST_TBB" << std::endl;

    std::atomic_bool done(false);

    EnqueueLambda([&]() {
        create_context();
        done = true;
        return false;
    });

#if 1
    // Try waiting inside scheduler loop
    SpawnAndWaitLambda([&]() {
        if (done)
        {
            std::cout << "Done" << std::endl;
            return false;
        }
        return true;
    });
#else
    // This also fails if we just spin loop
    while (!done)
        ;
    std::cout << "Done" << std::endl;
#endif

#else
    std::cout << "TEST_SERIAL" << std::endl;
    create_context();
    std::cout << "Done" << std::endl;
#endif

    return 0;
}

 

Michael_C_Intel1
Moderator

Hi e4lam,

 

Thanks for the reproducer... Can you confirm your platform OS distro and CPU SKU? Which OpenCL (libintelocl.so) runtime are you using? From where did you acquire it for Windows and Linux respectively?

 

-MichaelC

Michael_C_Intel1
Moderator

Any particular reason you LD_PRELOAD instead of relying on the ICD loader library?

 

Thanks,

-MichaelC

e4lam
Beginner

I'll have to check again when I get back to work on Monday, but it's quite reproducible across several CPUs on Linux (Mint, Ubuntu, RHEL). From the stack traces, it looks like a TBB interaction problem and has nothing to do with the particular CPU, since execution never gets that far. In one instance it looked like an internal barrier was getting confused because we saw two OCL TBB tasks in the same thread (possibly due to task stealing).

The LD_PRELOAD is there to ensure that the Intel OpenCL CPU driver is used instead of the GPU driver. The LD_LIBRARY_PATH is there to ensure that the same TBB library is used, to rule out a bad interaction with a different TBB library. This is a minimal reproducer from our attempts to rule out as many possibilities as possible.
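
For what it's worth, going through the ICD loader and filtering for a CPU device would look something like the sketch below. This is just our assumption about how to avoid the GPU drivers without preloading; it presumes the CPU runtime's .icd file is registered under /etc/OpenCL/vendors.

// Sketch: pick the Intel CPU device through the ICD loader instead of LD_PRELOAD.
#include <iostream>
#include <vector>
#include "cl.hpp"

int main() {
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);
    for (const cl::Platform &p : platforms) {
        std::vector<cl::Device> devices;
        // Platforms without a CPU device return CL_DEVICE_NOT_FOUND; skip them.
        if (p.getDevices(CL_DEVICE_TYPE_CPU, &devices) != CL_SUCCESS || devices.empty())
            continue;
        std::cout << "CPU device: " << devices[0].getInfo<CL_DEVICE_NAME>()
                  << " (" << p.getInfo<CL_PLATFORM_NAME>() << ")\n";
        return 0;
    }
    std::cout << "No CPU device found through the ICD loader.\n";
    return 1;
}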

We tested with the latest Intel OpenCL driver and Intel TBB 2019 headers.

 

Michael_C_Intel1
Moderator

e4lam,

Thanks for the detail on loading.

When you get the opportunity, can you share the following:

  • TBB version?
    • 2019 initial release or 2019 Update X or a specific version string would be useful for Windows and Linux...
  • From where did you acquire TBB? The performance primitives site? threadingbuildingblocks.org/GitHub? A system package repository manager (which one)?
    • Custom build or prebuilt TBB binaries?
  • The version strings reported by clinfo for the platforms? (One way to print these from the reproducer is sketched below.) On CPU RT versions and distributions:

There are two deployments of the CPU Runtime for Windows. One is a standalone package available from the Intel Registration Center; the other comes with the graphics driver for Intel Graphics. Vendors release drivers at different cadences; Intel's first-party drivers are typically deployed on systems like NUCs, or in rare cases where the system vendor supports them. Which one did you get it from?

On Linux, there are two deployments as well. One is experimental, for SYCL... The other is a standalone package available from the Intel Registration Center.

Confirming the runtime stack will help expedite finding a resolution.
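
For reference, something along these lines would print the strings I'm after. It's just a sketch, assuming the classic tbb/tbb_stddef.h version macros and the same cl.hpp queries already used in your reproducer; build it the same way, with -ltbb.

// Sketch: print OpenCL platform version strings plus TBB header/runtime versions.
#include <iostream>
#include <vector>
#include "cl.hpp"
#include <tbb/tbb_stddef.h>   // TBB_VERSION_MAJOR/MINOR, TBB_INTERFACE_VERSION

int main() {
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);
    for (const cl::Platform &p : platforms) {
        std::cout << p.getInfo<CL_PLATFORM_NAME>() << "\n"
                  << "  platform version: " << p.getInfo<CL_PLATFORM_VERSION>() << "\n"
                  << "  platform vendor:  " << p.getInfo<CL_PLATFORM_VENDOR>() << "\n";
    }
    std::cout << "TBB headers: " << TBB_VERSION_MAJOR << "." << TBB_VERSION_MINOR
              << " (interface " << TBB_INTERFACE_VERSION << ")\n"
              << "TBB runtime interface: " << TBB_runtime_interface_version() << "\n";
    return 0;
}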

Thanks,

-MichaelC

 

e4lam
Beginner

For both Linux and Windows, it is the standalone Intel OpenCL driver. In all of our tests, the systems were using NVIDIA or AMD discrete video cards, which is why we had to ensure that the CPU driver is the one running.

As for TBB, we use the open-source releases, built ourselves from GitHub. However, I'll note again that in this setup we've ensured we're running against the same TBB shared libraries as libintelocl.so, albeit compiled against the headers from GitHub.
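
(For completeness, one way to double-check which libtbb.so a process actually loads would be something like the sketch below; it assumes glibc's dladdr and the classic TBB_runtime_interface_version() entry point from tbb/tbb_stddef.h. Build with -ltbb -ldl; g++ defines _GNU_SOURCE, which dladdr needs.)

// Sketch: report which shared library actually provides the TBB entry points at runtime.
#include <dlfcn.h>
#include <iostream>
#include <tbb/tbb_stddef.h>

int main() {
    Dl_info info;
    if (dladdr(reinterpret_cast<void *>(&TBB_runtime_interface_version), &info)
            && info.dli_fname) {
        std::cout << "TBB loaded from: " << info.dli_fname << "\n";
    } else {
        std::cout << "Could not resolve the TBB library path.\n";
    }
    return 0;
}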

Are you saying that you've tried the sample and cannot reproduce?

e4lam
Beginner

Here's the clinfo output from one of our machines with libintelocl.so preloaded.

Michael_C_Intel1
Moderator

Hi e4lam,

Release notes reference page

 

To my recollection, the TBB libraries included with CPU RT 18.1 may have custom symbols (at least for Windows). They were built for the CPU RT only and are not promoted for direct user application linkage. So, LD_LIBRARY_PATH=. for the user application worries me as far as TBB access goes.

One thing they did mention in the release notes is that if the application needs to use TBB directly, it should use a higher version of TBB... You mentioned you are using 2019 headers... Can you deploy and link against the corresponding libs on Linux? Per the guidance, this means 2018 initial release or newer headers and libs are needed... as opposed to just the headers.

See the 'Known Issues' (section 6 page 8) of the release notes.

Do you get a similar error when you link against TBB 2018 initial release or newer?

 

Sidebar:

Even though the release notes suggest 'functionality and performance' may vary, the sighting is still appreciated. Users will hit cases where products that seemingly should run together have issues, and it's useful to pass those cases along to the TBB and CPU RT teams. Both teams have been responsive to TBB with CPU RT issues in the past.

-MichaelC

e4lam
Beginner

This minimal reproducer was our attempt to trim things down from our production scenario. For our production scenarios, we have tried older configurations of Intel OpenCL and TBB where the application links against a separate TBB. We'll try again but we haven't found any combination that works so far on Linux.

What is the best channel to contact the teams that you mention? I was hoping that this forum was already a communication channel for the CPU RT team.

 

Thanks!

 

 

 

e4lam
Beginner

PS. A careful reading of the known limitations suggests that it's *supposed* to work whenever you use a newer OR the *same* version of TBB, which is what the minimal example above does.

Linux* OS case: Make sure there is no other Threading Building Blocks (TBB) library in your OpenCL™ host application library search path on Linux* OS. Intel® CPU Runtime for OpenCL™ Applications was tested only with Intel® Threading Building Blocks (Intel® TBB) libraries included in the package. In case OpenCL™ host application intentionally uses features of a standalone Threading Building Blocks (TBB) make sure that it is of a higher version than the library version in the package and is found earlier in the shared library search procedure. If standalone Threading Building Blocks (TBB) libraries are loaded functionality and performance may vary.
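
(A minimal runtime check of that "same or newer" condition, assuming the standard TBB_INTERFACE_VERSION macro and TBB_runtime_interface_version() function from classic TBB, could look like the sketch below; build with -ltbb.)

// Sketch: verify at startup that the loaded libtbb.so is the same interface
// version as the headers we compiled against, or newer.
#include <cstdio>
#include <tbb/tbb_stddef.h>

int main() {
    const int compiled = TBB_INTERFACE_VERSION;            // from the headers
    const int loaded   = TBB_runtime_interface_version();  // from the loaded libtbb.so
    std::printf("TBB interface: compiled against %d, loaded %d\n", compiled, loaded);
    if (loaded < compiled) {
        std::printf("Loaded TBB is older than the headers.\n");
        return 1;
    }
    return 0;
}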

Michael_C_Intel1
Moderator

Hi e4lam,

Your feedback is appreciated. I suggest this forum as the most appropriate channel at this time.

The reproducer applies to the second part of the release notes section cited: specifically, the case where the host application intentionally uses features of a standalone Threading Building Blocks.

If you can cite observed behavior when building and linking your application against the TBB 2018 initial release or later, I'd like to take that to the CPU RT dev team as necessary.

Sometimes the dev teams themselves monitor these forums.

Thanks,

-MichaelC
