OpenCL* for CPU

Intel OpenCL kernel hangs - how to debug?

QFang1
Novice
537 Views

I am having trouble running my OpenCL code on Intel CPUs and GPUs. The code is a Monte Carlo simulation and can be downloaded from https://github.com/fangq/mcxcl

The code runs without issue on NVIDIA's OpenCL. However, on Intel's OpenCL (CPU or HD GPU), it hangs almost every time when simulating a large number of photons.

The kernel has only one while() loop that may generate such hanging behavior:

https://github.com/fangq/mcxcl/blob/master/src/mcx_core.cl#L373

However, I set a counter to force that loop to quit once it exceeds a limit, and that did not stop the hang, so it appears that something else must be responsible. I now suspect that the clWaitForEvents call at this line fails to return for some reason:

https://github.com/fangq/mcxcl/blob/master/src/mcx_host.cpp#L443

What else can I do to debug this issue? Is it possible that the kernel completed execution but clWaitForEvents stalled?
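One way to separate the two cases is to poll the event's execution status with a deadline instead of blocking in clWaitForEvents. The sketch below is a minimal illustration, not code from mcxcl: `wait_with_timeout`, `WaitResult`, and the `get_status` callable are hypothetical names, and the polling core is written against a generic callable so it compiles without the OpenCL headers. The trailing comment shows how it could be wired to clGetEventInfo, assuming `event` and `queue` are the kernel's event and command queue.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Outcome of waiting on a command's execution status.
enum class WaitResult { Completed, Failed, TimedOut };

// Polls `get_status` until it reports completion, an error, or the deadline
// passes. `get_status` is a hypothetical callable returning an OpenCL-style
// status: 0 means complete (CL_COMPLETE), a negative value means an error,
// and a positive value means the command is still queued, submitted, or running.
template <class StatusFn>
WaitResult wait_with_timeout(StatusFn get_status,
                             std::chrono::milliseconds timeout,
                             std::chrono::milliseconds poll_interval =
                                 std::chrono::milliseconds(10)) {
    assert(poll_interval.count() > 0);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        const int status = get_status();
        if (status == 0) return WaitResult::Completed;
        if (status < 0) return WaitResult::Failed;
        if (std::chrono::steady_clock::now() >= deadline) return WaitResult::TimedOut;
        std::this_thread::sleep_for(poll_interval);
    }
}

// With the OpenCL headers available, the callable could wrap clGetEventInfo
// (assumption: `event` and `queue` come from the kernel launch):
//
//   auto get_status = [&]() -> int {
//       cl_int s = CL_QUEUED;
//       cl_int err = clGetEventInfo(event, CL_EVENT_COMMAND_EXECUTION_STATUS,
//                                   sizeof(s), &s, nullptr);
//       return err != CL_SUCCESS ? err : s;
//   };
//   clFlush(queue);  // make sure the kernel has actually been submitted
```

If the status never leaves CL_RUNNING before the deadline, the kernel itself is the likelier culprit; if it reaches CL_COMPLETE while clWaitForEvents still blocks, the wait call would be suspect.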

Any suggestion is welcome! Thanks in advance!

PS: You can reproduce this issue with the following commands:

git clone https://github.com/fangq/mcxcl.git
cd mcxcl/src
make
cd ../example/quicktest
../../bin/mcxcl  -t 128 -T 8 -g 10 -n 1e7 -f qtest.inp -s qtest -r 1 -a 0 -b 0 -k ../../src/mcx_core.cl -d 1 -G 1

When it does not hang, the program should finish in about 15-25 seconds on an Intel CPU/GPU. If it hangs, you can lower the number after -n; values below 1e6 typically run without hanging.
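Because runs below 1e6 photons typically finish, one experiment is to split a large run into several smaller kernel launches and wait after each one, to see whether the hang depends on how long a single launch runs. The sketch below only computes the batch sizes; `split_photons` and the per-launch cap are hypothetical and not part of mcxcl, and whether batching avoids the hang is untested.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Splits `total` photons into launches of at most `max_per_launch` each.
// Returns an empty list if there is nothing to run or the cap is zero.
std::vector<std::uint64_t> split_photons(std::uint64_t total,
                                         std::uint64_t max_per_launch) {
    std::vector<std::uint64_t> batches;
    if (max_per_launch == 0) return batches;  // no valid batch size
    while (total > 0) {
        const std::uint64_t n = std::min(total, max_per_launch);
        batches.push_back(n);  // the last batch holds the remainder
        total -= n;
    }
    return batches;
}
```

For example, `split_photons(10000000, 1000000)` yields ten launches of 1e6 photons each; the host would enqueue the kernel once per entry and accumulate the results.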

QFang1

I did some further testing. It looks like the hang only happens when using Intel's HD graphics (HD 5500); the Intel CPU backend runs fine.

I found out how to use gdb to debug OpenCL code on the Intel CPU, but this method does not work for HD graphics. When I press Ctrl+C after the program hangs, gdb always gives me the following:

fangq@dayu:/Project/github/mcxcl/example/quicktest$ ../../bin/mcxcl -L
Platform [0] Name Intel(R) OpenCL
============ GPU device ID 1 [1 of 1]: Intel(R) HD Graphics ============
 Compute units   :    23 core(s)
 Global memory   :    6587885159 B
 Local memory    :    65536 B
 Constant memory :    1646971289 B
 Clock speed     :    950 MHz 

Starting program: /Project/github/mcxcl/bin/mcxcl -t 12800 -T 64 -g 10 -n 5e6 -f qtest.inp -s qtest -r 1 -a 0 -b 1 -k ../../src/mcx_core.cl -d 1 -G 1 -J -g\ -s\ /home/fangq/space/Gitroot/Project/github/mcxcl/src/mcx_core.cl
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
===============================================================================
=                     Monte Carlo eXtreme (MCX) -- OpenCL                     =
=           Copyright (c) 2009-2016 Qianqian Fang <q.fang at neu.edu>         =
=                                                                             =
=                    Computational Imaging Laboratory (CIL)                   =
=             Department of Bioengineering, Northeastern University           =
===============================================================================
$MCXCL$Rev::    $ Last Commit $Date::                     $ by $Author:: fangq$
===============================================================================
- variant name: [Detective MCXCL] compiled with OpenCL version [1]
- compiled with: [RNG] Logistic-Lattice [Seed Length] 5
initializing streams ...    init complete : 0 ms
build program complete : 584 ms
- [device 0] threadph=390 oddphotons=8000 np=5000000.0 nthread=12800 repetition=1
set kernel arguments complete : 584 ms
lauching mcx_main_loop for time window [0.0ns 5.0ns] ...
simulation run# 1 ...

^C
Program received signal SIGINT, Interrupt.
0x00007ffff6dec816 in __pthread_mutex_unlock_usercnt (decr=1, mutex=0x895f70) at pthread_mutex_unlock.c:73
73    pthread_mutex_unlock.c: No such file or directory.
(gdb) where
#0  0x00007ffff6dec816 in __pthread_mutex_unlock_usercnt (decr=1, mutex=0x895f70) at pthread_mutex_unlock.c:73
#1  __GI___pthread_mutex_unlock (mutex=0x895f70) at pthread_mutex_unlock.c:310
#2  0x00007ffff6206089 in ?? () from /opt/intel/opencl/libigdrcl.so
#3  0x00007ffff613039f in ?? () from /opt/intel/opencl/libigdrcl.so
#4  0x00007ffff612cf4e in ?? () from /opt/intel/opencl/libigdrcl.so
#5  0x00007ffff612d8f8 in ?? () from /opt/intel/opencl/libigdrcl.so
#6  0x00007ffff6135013 in ?? () from /opt/intel/opencl/libigdrcl.so
#7  0x00007ffff6149495 in ?? () from /opt/intel/opencl/libigdrcl.so
#8  0x00007ffff6156d84 in ?? () from /opt/intel/opencl/libigdrcl.so
#9  0x00007ffff61b5203 in ?? () from /opt/intel/opencl/libigdrcl.so
#10 0x00007ffff7bd5813 in clWaitForEvents () from /opt/intel/opencl/libOpenCL.so.1
#11 0x00000000004043c0 in mcx_run_simulation (cfg=<optimized out>, fluence=<optimized out>, totalenergy=<optimized out>) at mcx_host.cpp:443
#12 0x00000000004018ab in main (argc=27, argv=0x7fffffffdc18) at mcxcl.c:35 

I can't tell whether the kernel is stuck somewhere or clWaitForEvents simply failed to return.

Are there any other ways to debug code running on HD graphics?

Jeffrey_M_Intel1
Employee

Replicated in this thread: https://software.intel.com/en-us/forums/opencl/topic/676013

Will get back to you with more info on debugging on that thread soon.
