Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Deadlock with Intel OpenCL driver on Linux

e4lam
Beginner

Hi,

I would like to develop my application using Intel TBB and Intel's OpenCL driver on Linux. However, when I try to do so, I get a deadlock; the relevant part of the stack trace is below. Note that in stack frame #3, the OpenCL driver calls into MyApp's libtbb.so.2, which was compiled using TBB 4.0 Update 1. If I look in the OpenCL driver's directory, I see:

     $ ls *tbb* /opt/intel/opencl-1.2-3.0.56860/lib64
     libtbbmalloc.so libtbbmalloc.so.2 libtbb_preview.so libtbb_preview.so.2

So I think what's happening is this: because my own build of libtbb.so.2 is loaded before OpenCL is initialized, the OpenCL driver resolves some of its TBB symbols to my library instead of to its own TBB library, and that mix causes the deadlock.
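One way to check this theory is glibc's `LD_DEBUG=bindings` trace, which records which shared object satisfied each symbol lookup. The helper below is a minimal sketch that filters that trace down to symbols bound to a TBB runtime; the function name is hypothetical, and it assumes glibc's trace line format. If it prints a line like `.../libtask_executor.so -> .../MyApp/.../libtbb.so.2`, the OpenCL runtime is calling into the application's TBB rather than its own.

```shell
# Sketch: summarize glibc's LD_DEBUG=bindings trace (read on stdin), keeping
# only symbols that were bound to a TBB runtime (a path containing "libtbb").
# Each output line reads "<library that asked> -> <TBB library that answered>".
# Assumes glibc's trace format:
#   binding file <requester> [N] to <provider> [N]: normal symbol `<name>'
tbb_bindings() {
    sed -n 's/.*binding file \([^ ]*\) \[[0-9]*] to \([^ ]*libtbb[^ ]*\) \[[0-9]*].*/\1 -> \2/p' |
        sort -u
}

# Usage (MyApp is the application from this thread; the trace goes to stderr):
#   LD_DEBUG=bindings ./MyApp 2>&1 | tbb_bindings
```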

I could probably play tricks with symbol versioning and the like, but I don't think that would help: don't we want OpenCL to use the same TBB scheduler as the rest of my application? Are there any good solutions to this?

#0 0x00007fffe5ecf477 in sched_yield () at ../sysdeps/unix/syscall-template.S:82
#1 0x00007fffe8fda29f in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::receive_or_steal_task(long&, bool) ()
   from /D/dev/projects/sdk/MyApp/bin/../dsolib/libtbb.so.2
#2 0x00007fffe8fdb296 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) ()
   from /D/dev/projects/sdk/MyApp/bin/../dsolib/libtbb.so.2
#3 0x00007fffc1fe7034 in tbb::interface6::internal::delegated_function<Intel::OpenCL::TaskExecutor::base_command_list::TaskGroupWaiter>::run() ()
   from /opt/intel/opencl-1.2-3.0.56860/lib64/libtask_executor.so
#4 0x00007fffe14ab7f8 in tbb::interface6::task_arena::internal_execute(tbb::interface6::internal::delegate_base&) const ()
   from /opt/intel/opencl-1.2-3.0.56860/lib64/libtbb_preview.so.2
#5 0x00007fffc1fe4710 in Intel::OpenCL::TaskExecutor::base_command_list::Wait() () from /opt/intel/opencl-1.2-3.0.56860/lib64/libtask_executor.so
#6 0x00007fffc1fe47d5 in Intel::OpenCL::TaskExecutor::base_command_list::WaitForCompletion(Intel::OpenCL::Utils::SharedPtr<Intel::OpenCL::TaskExecutor::ITaskBase> const&)
    () from /opt/intel/opencl-1.2-3.0.56860/lib64/libtask_executor.so
#7 0x00007fffc1fe955c in Intel::OpenCL::TaskExecutor::TBBTaskExecutor::WaitForCompletion(Intel::OpenCL::TaskExecutor::ITaskBase*, void*) ()
   from /opt/intel/opencl-1.2-3.0.56860/lib64/libtask_executor.so
#8 0x00007fffc1b8acfd in Intel::OpenCL::CPUDevice::TaskDispatcher::init() () from /opt/intel/opencl-1.2-3.0.56860/lib64/libcpu_device.so
#9 0x00007fffc1b96146 in Intel::OpenCL::CPUDevice::CPUDevice::Init() () from /opt/intel/opencl-1.2-3.0.56860/lib64/libcpu_device.so
#10 0x00007fffc1b96443 in clDevCreateDeviceInstance () from /opt/intel/opencl-1.2-3.0.56860/lib64/libcpu_device.so
#11 0x00007fffc319471e in Intel::OpenCL::Framework::Device::CreateInstance() () from /opt/intel/opencl-1.2-3.0.56860/lib64/libintelocl.so
#12 0x00007fffc31513ac in Intel::OpenCL::Framework::Context::Context(long const*, unsigned int, unsigned int, Intel::OpenCL::Utils::SharedPtr<Intel::OpenCL::Framework::FissionableDevice>*, void (*)(char const*, void const*, unsigned long, void*), void*, int*, ocl_entry_points*, ocl_gpa_data*, Intel::OpenCL::Framework::ContextModule const&)
    () from /opt/intel/opencl-1.2-3.0.56860/lib64/libintelocl.so
#13 0x00007fffc31369d3 in Intel::OpenCL::Framework::ContextModule::CreateContext(long const*, unsigned int, _cl_device_id* const*, void (*)(char const*, void const*, unsigned long, void*), void*, int*) () from /opt/intel/opencl-1.2-3.0.56860/lib64/libintelocl.so
#14 0x00007fffc312d09f in Intel::OpenCL::Framework::ContextModule::CreateContextFromType(long const*, unsigned long, void (*)(char const*, void const*, unsigned long, void*), void*, int*) () from /opt/intel/opencl-1.2-3.0.56860/lib64/libintelocl.so
#15 0x00007fffc311e57c in clCreateContextFromType () from /opt/intel/opencl-1.2-3.0.56860/lib64/libintelocl.so

Thanks,
-Edward

Alexey-Kukanov
Employee

Hello Edward,

We are aware of this composability problem and are working with the OpenCL team on the best way to solve it. The OpenCL runtime uses some advanced TBB features that so far are only available as Community Preview Features, so it ships special TBB binaries. And you are correct: some of its symbols are resolved to the "regular" TBB shared library loaded by your application.

As a workaround, I suggest you try using the same library the OpenCL SDK needs, i.e. libtbb_preview.so.2. You can see which TBB version the SDK uses by running 'strings /opt/intel/opencl-1.2-3.0.56860/lib64/libtbb_preview.so.2 | grep TBB:'; it is best to use the same or a newer TBB version. Then use libtbb_preview.so[.2] everywhere in your app build instead of libtbb.so[.2]. As I understand it, even if the OpenCL runtime loads the TBB library from its own directory, all the symbols it needs will be resolved to the copy already loaded by your app.
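The "same or newer version" check can be scripted. This is a minimal sketch, assuming each library's `strings` output contains a line of the form `TBB: INTERFACE VERSION <n>` (TBB builds of this era usually embed one, but confirm with the `grep TBB:` command above first). The helper names and the `./dsolib/libtbb_preview.so.2` path are hypothetical.

```shell
# Sketch: compare the TBB interface version of two libraries.
# Assumes each library embeds a line like "TBB: INTERFACE VERSION <n>";
# adjust the awk pattern if your `strings ... | grep TBB:` output differs.

# Read `strings` output on stdin; print the first interface version found.
tbb_iface_version() {
    awk '/TBB: INTERFACE VERSION/ { print $NF; exit }'
}

# Succeed if library $1 (the app's TBB) is at least as new as library $2
# (the TBB shipped with the OpenCL SDK).
tbb_not_older_than() {
    mine=$(strings "$1" | tbb_iface_version)
    theirs=$(strings "$2" | tbb_iface_version)
    [ -n "$mine" ] && [ -n "$theirs" ] && [ "$mine" -ge "$theirs" ]
}

# Usage (hypothetical app path; SDK path from this thread):
#   sdk=/opt/intel/opencl-1.2-3.0.56860/lib64/libtbb_preview.so.2
#   tbb_not_older_than ./dsolib/libtbb_preview.so.2 "$sdk" && echo "OK to use"
```

Comparing the interface version rather than the marketing version ("4.0 Update 1") avoids parsing update numbers, at the cost of depending on the embedded string format.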

I would be very interested to hear whether that helps and, if not, what problems you see. We understand that such a workaround is not desirable in the long term, and we will look for more reliable solutions. One thing is certain: Intel's OpenCL runtime will eventually use the "regular" TBB binaries instead of the preview ones, once the features it uses are finalized and leave preview status.

John_L_2
Beginner

Just an update: we had some success setting

LD_PRELOAD=/opt/intel/opencl-1.2-3.0.67279/lib64/libtbb_preview.so.2

before running our application, to force it to use the same library as the OpenCL 2013 driver. This worked, although it is obviously not a viable long-term solution, nor one we can recommend to customers.
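For anyone reproducing this workaround, a small launcher keeps the preload scoped to one command instead of exporting LD_PRELOAD for the whole session. This is a minimal sketch; the function name is hypothetical. It prepends the given library to any LD_PRELOAD value already set (glibc accepts a colon-separated list) and refuses to run if the library file is missing.

```shell
# Sketch of a launcher for the LD_PRELOAD workaround: preload the given
# library (e.g. the TBB shipped with the OpenCL driver) for one command only.
# Any LD_PRELOAD value already set is kept after it.
with_preloaded_lib() {
    lib=$1
    shift
    if [ ! -r "$lib" ]; then
        echo "with_preloaded_lib: cannot read $lib" >&2
        return 1
    fi
    LD_PRELOAD="$lib${LD_PRELOAD:+:$LD_PRELOAD}" "$@"
}

# Usage (library path from the OpenCL 2013 driver mentioned above):
#   with_preloaded_lib /opt/intel/opencl-1.2-3.0.67279/lib64/libtbb_preview.so.2 ./MyApp
```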

The OpenCL driver itself showed excellent performance, in many cases beating our regular TBB-based CPU path on the same operations. We did run into fairly slow kernel creation from cached cl_program objects, apparently because the driver reloads the LLVM bitcode each time a kernel is created (and we use a lot of kernels). Once we started caching the cl_kernel objects as well, we regained very good parallel performance.

Are there any estimates of when the necessary TBB preview features will make it into a TBB release?

Vladimir_P_1234567890

libtbb_preview.* was replaced with the regular TBB binaries some time ago.

But do not forget to use the latest libtbb.so.2 in your applications.
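With the application and the OpenCL runtime both using libtbb.so.2, one way to confirm that only a single TBB runtime ends up in the process is to look at its memory map. This is a minimal sketch with a hypothetical function name; it reads Linux `/proc/<pid>/maps` on stdin and lists the distinct libtbb.so / libtbb_preview.so paths (libtbbmalloc is deliberately ignored). More than one entry suggests two TBB schedulers are loaded side by side.

```shell
# Sketch: list the TBB runtimes mapped into a process, reading the Linux
# /proc/<pid>/maps format on stdin (the mapped file path is the last field).
# Matches libtbb.so* and libtbb_preview.so*, but not libtbbmalloc.so*.
tbb_runtimes() {
    awk '$NF ~ /\/libtbb(_preview)?\.so/ { print $NF }' | sort -u
}

# Usage (MyApp is the application from this thread):
#   tbb_runtimes < "/proc/$(pidof MyApp)/maps"
```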

--Vladimir
