mvrht__u3425923
Beginner
127 Views

Intel TBB segfaults after updating Ubuntu kernel to 4.13

TBB version: 2018 initial release

I am using the OpenCV library, compiled with the TBB threading framework, through its Python interface. After updating the Kubuntu 16.04 kernel from 4.10 to 4.13, my code started crashing with a segfault right at the "import cv2" line. I recompiled OpenCV with TBB under the new kernel and got the same crash. After recompiling OpenCV with OpenMP instead of TBB, the problem disappears. TBB 2018 Update 2 seems to have the same problem.

Interestingly, if I simply start the Python interpreter, run "import cv2", and perform the same operations as in my code interactively, there is no problem.

Backtrace:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff4155659 in std::rethrow_exception(std::__exception_ptr::exception_ptr) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) bt
#0  0x00007ffff4155659 in std::rethrow_exception(std::__exception_ptr::exception_ptr) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007fff9ea3cc38 in tbb::internal::gcc_rethrow_exception_broken () at ../../src/tbb/tbb_misc.cpp:179
#2  0x00007fff9ea3f5cb in tbb::internal::governor::acquire_resources () at ../../src/tbb/governor.cpp:80
#3  0x00007fff9ea4bcc0 in tbb::internal::__TBB_InitOnce::__TBB_InitOnce (this=<optimized out>) at ../../src/tbb/tbb_main.h:71
#4  __sti___ZN48_INTERNAL_26_______src_tbb_tbb_main_cpp_ca9dcbe33tbb8internal28__TBB_InitOnceHiddenInstanceE () at ../../src/tbb/tbb_main.cpp:71
#5  __sti__$E () at ../../src/tbb/tbb_main.cpp:52
#6  0x00007ffff7de76ba in call_init (l=<optimized out>, argc=argc@entry=11, argv=argv@entry=0x7fffffffdb08, env=env@entry=0xf2b750) at dl-init.c:72
#7  0x00007ffff7de77cb in call_init (env=0xf2b750, argv=0x7fffffffdb08, argc=11, l=<optimized out>) at dl-init.c:30
#8  _dl_init (main_map=main_map@entry=0x1206550, argc=11, argv=0x7fffffffdb08, env=0xf2b750) at dl-init.c:120
#9  0x00007ffff7dec8e2 in dl_open_worker (a=a@entry=0x7fffffffa810) at dl-open.c:575
#10 0x00007ffff7de7564 in _dl_catch_error (objname=objname@entry=0x7fffffffa800, errstring=errstring@entry=0x7fffffffa808, mallocedp=mallocedp@entry=0x7fffffffa7ff, operate=operate@entry=0x7ffff7dec4d0 <dl_open_worker>, 
    args=args@entry=0x7fffffffa810) at dl-error.c:187
#11 0x00007ffff7debda9 in _dl_open (file=0x7fffa27178a0 "/usr/local/lib/python3.5/dist-packages/cv2.cpython-35m-x86_64-linux-gnu.so", mode=-2147483646, caller_dlopen=0x60b35a <_PyImport_FindSharedFuncptr+138>, nsid=-2, 
    argc=<optimized out>, argv=<optimized out>, env=0xf2b750) at dl-open.c:660
#12 0x00007ffff75ecf09 in dlopen_doit (a=a@entry=0x7fffffffaa40) at dlopen.c:66
#13 0x00007ffff7de7564 in _dl_catch_error (objname=0xbef2d0, errstring=0xbef2d8, mallocedp=0xbef2c8, operate=0x7ffff75eceb0 <dlopen_doit>, args=0x7fffffffaa40) at dl-error.c:187
#14 0x00007ffff75ed571 in _dlerror_run (operate=operate@entry=0x7ffff75eceb0 <dlopen_doit>, args=args@entry=0x7fffffffaa40) at dlerror.c:163
#15 0x00007ffff75ecfa1 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#16 0x000000000060b35a in _PyImport_FindSharedFuncptr ()
#17 0x000000000061000b in _PyImport_LoadDynamicModuleWithSpec ()
#18 0x0000000000610538 in ?? ()
#19 0x00000000004e9c36 in PyCFunction_Call ()
#20 0x000000000053dbbb in PyEval_EvalFrameEx ()
#21 0x0000000000540199 in ?? ()
#22 0x000000000053c1d0 in PyEval_EvalFrameEx ()
#23 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#24 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#25 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#26 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#27 0x0000000000540f9b in PyEval_EvalCodeEx ()
#28 0x00000000004ebd23 in ?? ()
#29 0x00000000005c1797 in PyObject_Call ()
#30 0x00000000005c257a in _PyObject_CallMethodIdObjArgs ()
#31 0x00000000005260c8 in PyImport_ImportModuleLevelObject ()
#32 0x0000000000549e78 in ?? ()
#33 0x00000000004e9ba7 in PyCFunction_Call ()
#34 0x00000000005c1797 in PyObject_Call ()
#35 0x0000000000534d90 in PyEval_CallObjectWithKeywords ()
#36 0x000000000053a1c7 in PyEval_EvalFrameEx ()
#37 0x0000000000540199 in ?? ()
#38 0x0000000000540e4f in PyEval_EvalCode ()
#39 0x000000000054a6b8 in ?? ()
#40 0x00000000004e9c36 in PyCFunction_Call ()
#41 0x000000000053dbbb in PyEval_EvalFrameEx ()
#42 0x0000000000540199 in ?? ()
#43 0x000000000053c1d0 in PyEval_EvalFrameEx ()
#44 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#45 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#46 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#47 0x0000000000540f9b in PyEval_EvalCodeEx ()
#48 0x00000000004ebd23 in ?? ()
#49 0x00000000005c1797 in PyObject_Call ()
#50 0x00000000005c257a in _PyObject_CallMethodIdObjArgs ()
#51 0x00000000005260c8 in PyImport_ImportModuleLevelObject ()
#52 0x0000000000549e78 in ?? ()
#53 0x00000000004e9ba7 in PyCFunction_Call ()
#54 0x00000000005c1797 in PyObject_Call ()
#55 0x0000000000534d90 in PyEval_CallObjectWithKeywords ()
#56 0x000000000053a1c7 in PyEval_EvalFrameEx ()
#57 0x0000000000540199 in ?? ()
#58 0x0000000000540e4f in PyEval_EvalCode ()
#59 0x000000000054a6b8 in ?? ()
#60 0x00000000004e9c36 in PyCFunction_Call ()
#61 0x000000000053dbbb in PyEval_EvalFrameEx ()
#62 0x0000000000540199 in ?? ()
#63 0x000000000053c1d0 in PyEval_EvalFrameEx ()
#64 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#65 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#66 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#67 0x0000000000540f9b in PyEval_EvalCodeEx ()
#68 0x00000000004ebd23 in ?? ()
#69 0x00000000005c1797 in PyObject_Call ()
#70 0x00000000005c257a in _PyObject_CallMethodIdObjArgs ()
#71 0x00000000005260c8 in PyImport_ImportModuleLevelObject ()
#72 0x0000000000549e78 in ?? ()
#73 0x00000000004e9ba7 in PyCFunction_Call ()
#74 0x00000000005c1797 in PyObject_Call ()
#75 0x0000000000534d90 in PyEval_CallObjectWithKeywords ()
#76 0x000000000053a1c7 in PyEval_EvalFrameEx ()
#77 0x0000000000540199 in ?? ()
#78 0x0000000000540e4f in PyEval_EvalCode ()
#79 0x000000000060c272 in ?? ()
#80 0x000000000060e71a in PyRun_FileExFlags ()
#81 0x000000000060ef0c in PyRun_SimpleFileExFlags ()
#82 0x000000000063fb26 in Py_Main ()
#83 0x00000000004cfeb1 in main ()

 

8 Replies
mvrht__u3425923
Beginner

OK, I recompiled TBB from source with gcc 5.4 instead of using the pre-compiled binaries, and the problem disappeared.

Alexei_K_Intel
Employee

The function tbb::internal::gcc_rethrow_exception_broken tries to detect a defect in exception support that is present in some versions of libstdc++. To do so, it rethrows an exception inside a catch block. According to the backtrace, that rethrow crashes inside libstdc++. This function is always called when the TBB library is initialized; it does not depend on the environment (e.g. Python) or on other factors. Therefore, any use of TBB should trigger the issue.

It is odd that recompiling resolves the issue. My only guess is that you recompiled the TBB library without C++11 support. Did you specify stdver=c++11 in the make invocation?

Could you please share the libstdc++ version installed on your system? For example, the output of the following command:

strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX
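If it is easier to run this check from Python, the following is a minimal sketch of the same idea as the strings/grep pipeline above. The helper names `glibcxx_versions` and `glibcxx_versions_in_file` are made up for illustration; the code only scans the raw bytes of the library for GLIBCXX_x.y.z tags and sorts them numerically, so entries such as GLIBCXX_DEBUG_MESSAGE_LENGTH are skipped.

```python
import re
from typing import List

# Matches version tags such as GLIBCXX_3.4 or GLIBCXX_3.4.21 in raw bytes.
_TAG = re.compile(rb"GLIBCXX_\d+(?:\.\d+)*")


def glibcxx_versions(data: bytes) -> List[str]:
    """Return the unique GLIBCXX version tags in data, sorted numerically."""
    tags = {m.group(0).decode("ascii") for m in _TAG.finditer(data)}

    def key(tag: str):
        # "GLIBCXX_3.4.21" -> (3, 4, 21), so 3.4.9 sorts before 3.4.21.
        return tuple(int(part) for part in tag.split("_", 1)[1].split("."))

    return sorted(tags, key=key)


def glibcxx_versions_in_file(path: str) -> List[str]:
    """Read a shared library from disk and list its GLIBCXX tags."""
    with open(path, "rb") as f:
        return glibcxx_versions(f.read())
```

Calling `glibcxx_versions_in_file("/usr/lib/x86_64-linux-gnu/libstdc++.so.6")[-1]` should then give the newest tag the library provides, assuming the library lives at that path.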

Regards,
Alex

mvrht__u3425923
Beginner

Hi Alex,

Yes, I did not use this flag. I will try recompiling with it.

strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX
GLIBCXX_3.4
GLIBCXX_3.4.1
GLIBCXX_3.4.2
GLIBCXX_3.4.3
GLIBCXX_3.4.4
GLIBCXX_3.4.5
GLIBCXX_3.4.6
GLIBCXX_3.4.7
GLIBCXX_3.4.8
GLIBCXX_3.4.9
GLIBCXX_3.4.10
GLIBCXX_3.4.11
GLIBCXX_3.4.12
GLIBCXX_3.4.13
GLIBCXX_3.4.14
GLIBCXX_3.4.15
GLIBCXX_3.4.16
GLIBCXX_3.4.17
GLIBCXX_3.4.18
GLIBCXX_3.4.19
GLIBCXX_3.4.20
GLIBCXX_3.4.21
GLIBCXX_DEBUG_MESSAGE_LENGTH

 

mvrht__u3425923
Beginner

You are right, Alex. After building with make all stdver=c++11, the build finished with TEST PASSED, but the segmentation fault reappeared when I ran my code.

In case it is helpful, here is the backtrace (which is slightly different):

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff4155659 in std::rethrow_exception(std::__exception_ptr::exception_ptr) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) bt
#0  0x00007ffff4155659 in std::rethrow_exception(std::__exception_ptr::exception_ptr) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007fff9ef5ea32 in tbb::internal::gcc_rethrow_exception_broken () at ../../src/tbb/tbb_misc.cpp:179
#2  0x00007fff9ef60c5c in tbb::internal::governor::acquire_resources () at ../../src/tbb/governor.cpp:80
#3  0x00007fff9ef6b7c5 in tbb::internal::__TBB_InitOnce::add_ref () at ../../src/tbb/tbb_main.cpp:122
#4  0x00007fff9ef51353 in tbb::internal::__TBB_InitOnce::__TBB_InitOnce (this=0x7fff9f17f3a0 <tbb::internal::__TBB_InitOnceHiddenInstance>) at ../../src/tbb/tbb_main.h:71
#5  __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at ../../src/tbb/tbb_main.cpp:71
#6  _GLOBAL__sub_I_tbb_main.cpp(void) () at ../../src/tbb/tbb_main.cpp:563
#7  0x00007ffff7de76ba in call_init (l=<optimized out>, argc=argc@entry=11, argv=argv@entry=0x7fffffffdb08, env=env@entry=0xf2ba70) at dl-init.c:72
#8  0x00007ffff7de77cb in call_init (env=0xf2ba70, argv=0x7fffffffdb08, argc=11, l=<optimized out>) at dl-init.c:30
#9  _dl_init (main_map=main_map@entry=0x1205d50, argc=11, argv=0x7fffffffdb08, env=0xf2ba70) at dl-init.c:120
#10 0x00007ffff7dec8e2 in dl_open_worker (a=a@entry=0x7fffffffa810) at dl-open.c:575
#11 0x00007ffff7de7564 in _dl_catch_error (objname=objname@entry=0x7fffffffa800, errstring=errstring@entry=0x7fffffffa808, mallocedp=mallocedp@entry=0x7fffffffa7ff, operate=operate@entry=0x7ffff7dec4d0 <dl_open_worker>, 
    args=args@entry=0x7fffffffa810) at dl-error.c:187
#12 0x00007ffff7debda9 in _dl_open (file=0x7fffa2c1d8a0 "/usr/local/lib/python3.5/dist-packages/cv2.cpython-35m-x86_64-linux-gnu.so", mode=-2147483646, caller_dlopen=0x60b35a <_PyImport_FindSharedFuncptr+138>, nsid=-2, 
    argc=<optimized out>, argv=<optimized out>, env=0xf2ba70) at dl-open.c:660
#13 0x00007ffff75ecf09 in dlopen_doit (a=a@entry=0x7fffffffaa40) at dlopen.c:66
#14 0x00007ffff7de7564 in _dl_catch_error (objname=0xbc5d80, errstring=0xbc5d88, mallocedp=0xbc5d78, operate=0x7ffff75eceb0 <dlopen_doit>, args=0x7fffffffaa40) at dl-error.c:187
#15 0x00007ffff75ed571 in _dlerror_run (operate=operate@entry=0x7ffff75eceb0 <dlopen_doit>, args=args@entry=0x7fffffffaa40) at dlerror.c:163
#16 0x00007ffff75ecfa1 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#17 0x000000000060b35a in _PyImport_FindSharedFuncptr ()
#18 0x000000000061000b in _PyImport_LoadDynamicModuleWithSpec ()
#19 0x0000000000610538 in ?? ()
#20 0x00000000004e9c36 in PyCFunction_Call ()
#21 0x000000000053dbbb in PyEval_EvalFrameEx ()
#22 0x0000000000540199 in ?? ()
#23 0x000000000053c1d0 in PyEval_EvalFrameEx ()
#24 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#25 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#26 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#27 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#28 0x0000000000540f9b in PyEval_EvalCodeEx ()
#29 0x00000000004ebd23 in ?? ()
#30 0x00000000005c1797 in PyObject_Call ()
#31 0x00000000005c257a in _PyObject_CallMethodIdObjArgs ()
#32 0x00000000005260c8 in PyImport_ImportModuleLevelObject ()
#33 0x0000000000549e78 in ?? ()
#34 0x00000000004e9ba7 in PyCFunction_Call ()
#35 0x00000000005c1797 in PyObject_Call ()
#36 0x0000000000534d90 in PyEval_CallObjectWithKeywords ()
#37 0x000000000053a1c7 in PyEval_EvalFrameEx ()
#38 0x0000000000540199 in ?? ()
#39 0x0000000000540e4f in PyEval_EvalCode ()
#40 0x000000000054a6b8 in ?? ()
#41 0x00000000004e9c36 in PyCFunction_Call ()
#42 0x000000000053dbbb in PyEval_EvalFrameEx ()
#43 0x0000000000540199 in ?? ()
#44 0x000000000053c1d0 in PyEval_EvalFrameEx ()
#45 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#46 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#47 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#48 0x0000000000540f9b in PyEval_EvalCodeEx ()
#49 0x00000000004ebd23 in ?? ()
#50 0x00000000005c1797 in PyObject_Call ()
#51 0x00000000005c257a in _PyObject_CallMethodIdObjArgs ()
#52 0x00000000005260c8 in PyImport_ImportModuleLevelObject ()
#53 0x0000000000549e78 in ?? ()
#54 0x00000000004e9ba7 in PyCFunction_Call ()
#55 0x00000000005c1797 in PyObject_Call ()
#56 0x0000000000534d90 in PyEval_CallObjectWithKeywords ()
#57 0x000000000053a1c7 in PyEval_EvalFrameEx ()
#58 0x0000000000540199 in ?? ()
#59 0x0000000000540e4f in PyEval_EvalCode ()
#60 0x000000000054a6b8 in ?? ()
#61 0x00000000004e9c36 in PyCFunction_Call ()
#62 0x000000000053dbbb in PyEval_EvalFrameEx ()
#63 0x0000000000540199 in ?? ()
#64 0x000000000053c1d0 in PyEval_EvalFrameEx ()
#65 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#66 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#67 0x000000000053b7e4 in PyEval_EvalFrameEx ()
#68 0x0000000000540f9b in PyEval_EvalCodeEx ()
#69 0x00000000004ebd23 in ?? ()
#70 0x00000000005c1797 in PyObject_Call ()
#71 0x00000000005c257a in _PyObject_CallMethodIdObjArgs ()
#72 0x00000000005260c8 in PyImport_ImportModuleLevelObject ()
#73 0x0000000000549e78 in ?? ()
#74 0x00000000004e9ba7 in PyCFunction_Call ()
#75 0x00000000005c1797 in PyObject_Call ()
#76 0x0000000000534d90 in PyEval_CallObjectWithKeywords ()
#77 0x000000000053a1c7 in PyEval_EvalFrameEx ()
#78 0x0000000000540199 in ?? ()
#79 0x0000000000540e4f in PyEval_EvalCode ()
#80 0x000000000060c272 in ?? ()
#81 0x000000000060e71a in PyRun_FileExFlags ()
#82 0x000000000060ef0c in PyRun_SimpleFileExFlags ()
#83 0x000000000063fb26 in Py_Main ()
#84 0x00000000004cfeb1 in main ()

 

Alexei_K_Intel
Employee

Could you run one of the TBB examples to see whether the issue is specific to your application or affects multiple applications? For example, to run the parallel_for/seismic example:

cd <path>/tbb2018_20171205oss
. ./bin/tbbvars.sh intel64 linux auto_tbbroot
cd examples/parallel_for/seismic
make

In addition, check that the example uses the same TBB and libstdc++ versions as your application:

ldd seismic
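To compare this against another binary or extension module, a small Python sketch like the one below can pull just the TBB and libstdc++ entries out of the ldd output. The function names `parse_ldd` and `ldd_libs` are hypothetical, and the parser assumes the usual `name => path (address)` line layout that ldd prints on Linux; unresolved or pathless entries are simply skipped.

```python
import re
import subprocess
from typing import Dict, Sequence

# Typical resolved ldd line: "\tlibtbb.so.2 => /path/libtbb.so.2 (0x00007f...)"
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")


def parse_ldd(output: str, names: Sequence[str] = ("libtbb", "libstdc++")) -> Dict[str, str]:
    """Map library name -> resolved path for entries whose name starts with one of names."""
    found = {}
    for line in output.splitlines():
        m = _LDD_LINE.match(line)
        if m and m.group(1).startswith(tuple(names)):
            found[m.group(1)] = m.group(2)
    return found


def ldd_libs(binary: str) -> Dict[str, str]:
    """Run ldd on a binary and return its TBB and libstdc++ entries."""
    out = subprocess.run(
        ["ldd", binary],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    ).stdout
    return parse_ldd(out)
```

Running `ldd_libs("seismic")` and `ldd_libs` on the cv2 extension module, then comparing the two dictionaries, shows whether both resolve to the same files.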

Regards,
Alex
 

mvrht__u3425923
Beginner

seismic works.

After some investigation, I found that the smallest snippet that reproduces the segfault is:

from google.protobuf import descriptor
import torch
import cv2

Here protobuf is Google's library (a Python package built with C++ library linkage) and torch is the PyTorch package (which links many libraries, including CUDA and MKL). Simply changing the order of the imports, or removing either the protobuf or the torch import, makes the problem disappear. I suspect a conflict among the internal library dependencies, but I am completely lost: it worked with the 4.10 kernel, it works with TBB compiled without C++11 support, and it works with OpenCV compiled with OpenMP instead of TBB.
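Since the crash depends on import order, one way to map it out is to run every ordering in a fresh interpreter and record the exit status. This is only a sketch: `try_import_orders` is a name invented here, and it relies on the documented `subprocess` behaviour on POSIX that a negative return code -N means the child was killed by signal N (so -11 would be SIGSEGV on Linux).

```python
import itertools
import subprocess
import sys
from typing import Dict, Sequence, Tuple


def try_import_orders(statements: Sequence[str]) -> Dict[Tuple[str, ...], int]:
    """Run each permutation of the statements in a new Python process.

    Returns a mapping from the statement order to the process return code.
    """
    results = {}
    for order in itertools.permutations(statements):
        proc = subprocess.run(
            [sys.executable, "-c", "\n".join(order)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        results[order] = proc.returncode
    return results
```

For the snippet above, the call would be `try_import_orders(["from google.protobuf import descriptor", "import torch", "import cv2"])`, which starts six interpreters, one per ordering.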

mvrht__u3425923
Beginner

Well, it seems that the problem is in libstdc++. If I invoke my code with:

LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6 ...

then there is no segmentation fault. However, I am not sure what to do next: how to find the exact troublemaker, and how to fix the issue without preloading libstdc++ every time. Could someone kindly point me in the right direction?

I also attached the output of running with LD_DEBUG=files, which shows that the same libstdc++ library is loaded in both cases.
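Since the same libstdc++ file is loaded either way, the order in which libraries are initialized may be the more telling difference between the two runs. Below is a rough sketch for comparing that order from two captured LD_DEBUG=files logs. It assumes the glibc loader marks each initialization with a line containing `calling init: <path>`; the function names `init_order` and `first_difference` are made up here, and how the two logs get captured is left open.

```python
import re
from typing import List, Optional, Sequence, Tuple

# Assumed glibc LD_DEBUG=files line: "<pid>:\tcalling init: /path/to/lib.so"
_INIT = re.compile(r"calling init:\s+(\S+)")


def init_order(ld_debug_output: str) -> List[str]:
    """Return library paths in the order their initializers were called."""
    return [m.group(1) for m in _INIT.finditer(ld_debug_output)]


def first_difference(a: Sequence[str], b: Sequence[str]) -> Optional[Tuple[int, str, str]]:
    """Return (index, a[index], b[index]) at the first mismatch.

    Only the common prefix is compared; None means no mismatch was found there.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i, x, y
    return None
```

Feeding the crashing log and the LD_PRELOAD log through `init_order` and then `first_difference` points at the first library that is initialized at a different position in the two runs.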

614713

614714

mvrht__u3425923
Beginner

In case someone else runs into this problem: the real cause was Nvidia NCCL, see https://devtalk.nvidia.com/default/topic/1030417/
