I am having trouble running my OpenCL code on Intel CPUs and GPUs. It is a Monte Carlo simulation code and can be downloaded from https://github.com/fangq/mcxcl
The code runs without problems on NVIDIA's OpenCL. However, on Intel's OpenCL (CPU or HD GPU), it hangs almost every time I simulate a large number of photons.
The kernel has only one while() loop that could cause this kind of hang.
However, I added a counter that forces that loop to exit once it exceeds a limit, and that did not stop the hang, so something else must be responsible. I now suspect that the clWaitForEvents call in mcx_run_simulation() (mcx_host.cpp:443 in the backtrace below) fails to return for some reason.
What else can I do to debug this issue? Is it possible that the kernel has completed execution but clWaitForEvents stalled?
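To check whether the iteration cap on the kernel's while() loop ever actually fires, the kernel could record that fact in a debug buffer that the host reads back after the launch. The sketch below is plain C so it can be tried on the host; `bounded_loop`, `countdown_step` and the callback type are hypothetical names, not code from mcxcl. OpenCL C has no function pointers, so in a real kernel the step would be the loop body itself and `guard_hit` would be a per-thread `__global int` slot. If the guard never fires on runs that do complete (for example with a smaller -n), that would support the idea that the hang is not inside this loop.

```c
#include <assert.h>
#include <stddef.h>

/* One iteration of the loop body; returns nonzero while the loop should
   continue. The callback only exists to make this sketch testable; a kernel
   would write the body inline. */
typedef int (*loop_step)(void *state);

/* Runs `step` until it asks to stop or `max_steps` iterations have run.
   Returns the number of iterations executed and sets *guard_hit to 1 only
   when the cap, not the loop's own exit condition, ended the loop. */
static unsigned bounded_loop(loop_step step, void *state,
                             unsigned max_steps, int *guard_hit) {
    unsigned n = 0;
    assert(step != NULL && guard_hit != NULL);
    *guard_hit = 0;
    while (n < max_steps) {
        n++;
        if (!step(state))
            return n;      /* normal exit: the loop's own condition ended it */
    }
    *guard_hit = 1;        /* cap reached: record it for the host to read */
    return n;
}

/* Hypothetical stand-in step for trying the helper on the host:
   counts an int down and stops when it reaches zero. */
static int countdown_step(void *state) {
    int *k = state;
    return --(*k) > 0;
}
```

For example, a countdown starting at 5 with a cap of 100 ends normally after 5 iterations, while a countdown starting at 1000 with a cap of 10 stops at 10 iterations with the guard flag set.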
Any suggestion is welcome! Thanks in advance!
PS: you can reproduce this issue with the following commands:
git clone https://github.com/fangq/mcxcl.git
cd mcxcl/src
make
cd ../example/quicktest
../../bin/mcxcl -t 128 -T 8 -g 10 -n 1e7 -f qtest.inp -s qtest -r 1 -a 0 -b 0 -k ../../src/mcx_core.cl -d 1 -G 1
Without hanging, the program should finish in about 15-25 seconds on an Intel CPU/GPU. If it hangs, try lowering the number after -n; values below 1e6 typically run without hanging.
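Since smaller -n values run fine, one experiment is to reach the same total photon count through several shorter launches instead of one long one. This only helps if the hang depends on how long a single launch runs, which is an unconfirmed assumption (a driver-level time limit on one GPU launch is one possibility worth ruling out). The helper below is a hypothetical host-side sketch, not an existing mcxcl function; `cap` would be a per-launch photon count found by experiment not to hang.

```c
#include <assert.h>
#include <stddef.h>

/* Splits `total` photons into launches of at most `cap` photons each.
   Returns the number of launches and stores the size of the last (possibly
   smaller) launch in *last_batch. `cap` is a hypothetical per-launch limit. */
static unsigned long split_into_launches(unsigned long total, unsigned long cap,
                                         unsigned long *last_batch) {
    unsigned long launches;
    assert(cap > 0 && last_batch != NULL);
    if (total == 0) {
        *last_batch = 0;
        return 0;
    }
    /* ceiling division written without the overflow risk of total + cap - 1 */
    launches = total / cap + (total % cap != 0);
    *last_batch = total - (launches - 1) * cap;  /* remainder goes last */
    return launches;
}
```

For the reproduction command's -n 1e7 and a cap of 1e6, this gives 10 launches of 1e6 photons each; a total of 2.5e6 would give 3 launches, the last one with 5e5 photons.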
After further testing, it looks like the hang only happens on Intel's HD Graphics (HD 5500). The Intel CPU backend runs fine.
I found out how to use gdb to debug OpenCL code on the Intel CPU, but this method does not work for HD Graphics. When I press Ctrl+C after the program hangs, gdb always gives me the following:
fangq@dayu:/Project/github/mcxcl/example/quicktest$ ../../bin/mcxcl -L
Platform  Name Intel(R) OpenCL
============ GPU device ID 1 [1 of 1]: Intel(R) HD Graphics ============
Compute units : 23 core(s)
Global memory : 6587885159 B
Local memory : 65536 B
Constant memory : 1646971289 B
Clock speed : 950 MHz

Starting program: /Project/github/mcxcl/bin/mcxcl -t 12800 -T 64 -g 10 -n 5e6 -f qtest.inp -s qtest -r 1 -a 0 -b 1 -k ../../src/mcx_core.cl -d 1 -G 1 -J -g\ -s\ /home/fangq/space/Gitroot/Project/github/mcxcl/src/mcx_core.cl
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
===============================================================================
= Monte Carlo eXtreme (MCX) -- OpenCL =
= Copyright (c) 2009-2016 Qianqian Fang <q.fang at neu.edu> =
= =
= Computational Imaging Laboratory (CIL) =
= Department of Bioengineering, Northeastern University =
===============================================================================
$MCXCL$Rev:: $ Last Commit $Date:: $ by $Author:: fangq$
===============================================================================
- variant name: [Detective MCXCL] compiled with OpenCL version 
- compiled with: [RNG] Logistic-Lattice [Seed Length] 5
initializing streams ... init complete : 0 ms
build program complete : 584 ms
- [device 0] threadph=390 oddphotons=8000 np=5000000.0 nthread=12800 repetition=1
set kernel arguments complete : 584 ms
lauching mcx_main_loop for time window [0.0ns 5.0ns] ...
simulation run# 1 ...
^C
Program received signal SIGINT, Interrupt.
0x00007ffff6dec816 in __pthread_mutex_unlock_usercnt (decr=1, mutex=0x895f70) at pthread_mutex_unlock.c:73
73 pthread_mutex_unlock.c: No such file or directory.
(gdb) where
#0  0x00007ffff6dec816 in __pthread_mutex_unlock_usercnt (decr=1, mutex=0x895f70) at pthread_mutex_unlock.c:73
#1  __GI___pthread_mutex_unlock (mutex=0x895f70) at pthread_mutex_unlock.c:310
#2  0x00007ffff6206089 in ?? () from /opt/intel/opencl/libigdrcl.so
#3  0x00007ffff613039f in ?? () from /opt/intel/opencl/libigdrcl.so
#4  0x00007ffff612cf4e in ?? () from /opt/intel/opencl/libigdrcl.so
#5  0x00007ffff612d8f8 in ?? () from /opt/intel/opencl/libigdrcl.so
#6  0x00007ffff6135013 in ?? () from /opt/intel/opencl/libigdrcl.so
#7  0x00007ffff6149495 in ?? () from /opt/intel/opencl/libigdrcl.so
#8  0x00007ffff6156d84 in ?? () from /opt/intel/opencl/libigdrcl.so
#9  0x00007ffff61b5203 in ?? () from /opt/intel/opencl/libigdrcl.so
#10 0x00007ffff7bd5813 in clWaitForEvents () from /opt/intel/opencl/libOpenCL.so.1
#11 0x00000000004043c0 in mcx_run_simulation (cfg=<optimized out>, fluence=<optimized out>, totalenergy=<optimized out>) at mcx_host.cpp:443
#12 0x00000000004018ab in main (argc=27, argv=0x7fffffffdc18) at mcxcl.c:35
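The threadph and oddphotons values in the log are consistent with a plain integer split of np across nthread threads: 5,000,000 / 12,800 = 390 with a remainder of 8,000. The sketch below reproduces only that arithmetic; the struct and function names are hypothetical, and how mcxcl actually assigns the 8,000 leftover photons to threads is not shown here, so that part is left out.

```c
#include <assert.h>

/* Per-thread photon budget as suggested by the log line
   "threadph=390 oddphotons=8000 np=5000000.0 nthread=12800". */
typedef struct {
    unsigned long threadph;   /* photons per thread (integer quotient) */
    unsigned long oddphotons; /* photons left over (remainder) */
} photon_split;

static photon_split split_per_thread(unsigned long np, unsigned long nthread) {
    photon_split s;
    assert(nthread > 0);
    s.threadph = np / nthread;
    s.oddphotons = np % nthread;
    return s;
}
```

Assuming -t sets nthread (the gdb run passes -t 12800 and the log reports nthread=12800), the reproduction command's -t 128 -n 1e7 gives 78,125 photons per thread with no remainder, versus 390 per thread in this gdb run, so each thread's share of the work grows in proportion to -n for a fixed -t.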
I can't tell whether the kernel is stuck somewhere or clWaitForEvents itself fails to return.
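One way to separate "the kernel never finishes" from "clWaitForEvents never returns" is to replace the blocking wait, for debugging only, with a loop that polls the event's execution status and gives up after a deadline. The sketch keeps the polling logic in plain C behind a status callback so it runs without a GPU; the comment shows how the callback could wrap clGetEventInfo with CL_EVENT_COMMAND_EXECUTION_STATUS. Polling by itself does not guarantee the command has been submitted to the device, so calling clFlush on the queue before polling is advisable. The function names, poll count and interval are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 199309L  /* for nanosleep() */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Same numeric values as OpenCL's CL_COMPLETE, CL_RUNNING, CL_SUBMITTED and
   CL_QUEUED; a negative status means the command terminated with an error. */
enum { ST_COMPLETE = 0, ST_RUNNING = 1, ST_SUBMITTED = 2, ST_QUEUED = 3 };

/* Returns the current execution status of the watched command.
   With OpenCL the callback could wrap something like:
     cl_int st;
     clGetEventInfo(evt, CL_EVENT_COMMAND_EXECUTION_STATUS,
                    sizeof(st), &st, NULL);
     return st;                                                    */
typedef int (*status_query)(void *ctx);

/* Polls up to `max_polls` times, sleeping `interval_ms` between polls.
   Returns ST_COMPLETE, a negative error status, or, if time runs out,
   the last pending status seen (running, submitted or queued). */
static int poll_with_timeout(status_query query, void *ctx,
                             unsigned max_polls, long interval_ms) {
    struct timespec pause;
    int status = ST_QUEUED;
    unsigned i;
    assert(query != NULL && max_polls > 0 && interval_ms >= 0);
    pause.tv_sec = interval_ms / 1000;
    pause.tv_nsec = (interval_ms % 1000) * 1000000L;
    for (i = 0; i < max_polls; i++) {
        status = query(ctx);
        if (status <= ST_COMPLETE)
            break;                    /* finished or failed: stop polling */
        if (i + 1 < max_polls)
            nanosleep(&pause, NULL);  /* still pending: wait, then re-check */
    }
    return status;
}

/* Hypothetical stand-in for trying the helper without a GPU: reports
   "running" until the countdown reaches zero, then "complete". */
static int fake_query(void *ctx) {
    int *remaining = ctx;
    return (*remaining)-- > 0 ? ST_RUNNING : ST_COMPLETE;
}
```

If polling well past the expected 15-25 seconds still reports a running status (1), the runtime itself considers the kernel unfinished, which would point at the kernel or driver rather than at clWaitForEvents. If the status reaches 0 (complete) in a run where clWaitForEvents would otherwise block, the wait itself becomes the suspect.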
Are there any other ways to debug on HD Graphics?