- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have a multithread program which uses concurrent_unordered_map (latest tbb version, built from src). It will crashes after several hours running. The core dump is
(gdb) where
#0 flist_iterator (other=..., this=<synthetic pointer>) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:90
#1 get_bucket (bucket=<optimized out>, this=<optimized out>) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1333
#2 internal_insert (value=..., this=0x7f75a404e500) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1127
#3 insert (value=..., this=0x7f75a404e500) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:860
#4 ngram_segment<20>::segment (this=this@entry=0x7f75a404e4f0, sentence=..., result=..., debug=debug@entry=false) at /usr/local/include/segment/segment.h:319
The code around 319 is:
{
auto it = dp_cache_.find(sentence);
if (it == dp_cache_.end()) {
dp_cache_.insert(make_pair(sentence, result));
}
}
And the types:
typedef pair<float, vector<int>> segmentation;
concurrent_unordered_map<string, segmentation> dp_cache_;
I tried concurrent_hash_map and different versions of tbb or pre-built binaries. They all crashes.
Anyone seeing similar cases?
Helps are appreciated!
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I just tried debug version of tbb libraries, and now it shows more information, while I still don't understand why.
(gdb) where
#0 0x00007f310ee37425 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007f310ee3ab8b in __GI_abort () at abort.c:91
#2 0x00007f310a763234 in rml::internal::assertion_failure (filename=0x7f310a76f948 "../../src/tbbmalloc/frontend.cpp", line=1805, expression=0x7f310a76fc68 "allocatedCount < (slabSize-sizeof(Block))/objectSize", comment=0x0)
at ../../src/tbbmalloc/../tbb/tbb_assert_impl.h:84
#3 0x00007f310a76c9da in rml::internal::Block::allocateFromFreeList (this=0x7f310a4a0000) at ../../src/tbbmalloc/frontend.cpp:1805
#4 0x00007f310a76cb09 in rml::internal::Block::allocate (this=0x7f310a4a0000) at ../../src/tbbmalloc/frontend.cpp:1832
#5 0x00007f310a76a76a in rml::internal::internalPoolMalloc (memPool=0x7f310a978ea0 <rml::internal::defaultMemPool_space>, size=56) at ../../src/tbbmalloc/frontend.cpp:2239
#6 0x00007f310a76aae5 in rml::internal::internalMalloc (size=56) at ../../src/tbbmalloc/frontend.cpp:2324
#7 0x00007f310a76b076 in scalable_malloc (size=56) at ../../src/tbbmalloc/frontend.cpp:2566
#8 0x00007f310f9f1adb in tbb::internal::allocate_via_handler_v3 (n=56) at ../../src/tbb/cache_aligned_allocator.cpp:237
#9 0x000000000040a671 in allocate (this=<optimized out>, n=<optimized out>) at /usr/include/tbb/internal/../tbb_allocator.h:109
#10 create_node (order_key=<optimized out>, this=<optimized out>) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:276
#11 insert_dummy (order_key=<optimized out>, it=..., this=0x10c3290) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:510
#12 init_bucket (bucket=1812, this=0x10c3280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1289
#13 tbb::interface5::internal::concurrent_unordered_base<tbb::interface5::concurrent_unordered_map_traits<std::string, std::pair<float, std::vector<int, std::allocator<int> > >, tbb::interface5::internal::hash_compare<std::string, tbb::tbb_hash<std::string>, std::equal_to<std::string> >, tbb::tbb_allocator<std::pair<std::string const, std::pair<float, std::vector<int, std::allocator<int> > > > >, false> >::init_bucket (this=0x10c3280, bucket=3860) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1284
#14 0x000000000040b5ba in init_bucket (bucket=7956, this=0x10c3280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1284
#15 internal_find (key=..., this=0x10c3280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1181
#16 find (key=..., this=0x10c3280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:921
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello, Benyu,
Thanks for the report. Can you tell me what kind of failure it is? The traceback is at the copy-construction of an flist_iterator as a return value, which is constructed on the stack, so it doesn't seem to be a heap failure. And the initialization of an element when inserting into the map should get a segfault if there is no memory available, but it should be in the split_ordered_list.
Do you have an idea of how many items are inserted into the map before the failure?
Regards,
Chris Huson
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Christopher,
I don't have the exact number at hand, but I believe it's less than 10K.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Christopher,
BTW, I tried to use the debug version of tbb libraries. Here is the core dump of another crash (this time is in find()) Wish it provide more information.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `build/segment_service -w segment_service/ -P 8003'.
Program terminated with signal 6, Aborted.
#0 0x00007f310ee37425 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) where
#0 0x00007f310ee37425 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007f310ee3ab8b in __GI_abort () at abort.c:91
#2 0x00007f310a763234 in rml::internal::assertion_failure (filename=0x7f310a76f948 "../../src/tbbmalloc/frontend.cpp", line=1805, expression=0x7f310a76fc68 "allocatedCount < (slabSize-sizeof(Block))/objectSize", comment=0x0) at ../../src/tbbmalloc/../tbb/tbb_assert_impl.h:84
#3 0x00007f310a76c9da in rml::internal::Block::allocateFromFreeList (this=0x7f310a4a0000) at ../../src/tbbmalloc/frontend.cpp:1805
#4 0x00007f310a76cb09 in rml::internal::Block::allocate (this=0x7f310a4a0000) at ../../src/tbbmalloc/frontend.cpp:1832
#5 0x00007f310a76a76a in rml::internal::internalPoolMalloc (memPool=0x7f310a978ea0 <rml::internal::defaultMemPool_space>, size=56) at ../../src/tbbmalloc/frontend.cpp:2239
#6 0x00007f310a76aae5 in rml::internal::internalMalloc (size=56) at ../../src/tbbmalloc/frontend.cpp:2324
#7 0x00007f310a76b076 in scalable_malloc (size=56) at ../../src/tbbmalloc/frontend.cpp:2566
#8 0x00007f310f9f1adb in tbb::internal::allocate_via_handler_v3 (n=56) at ../../src/tbb/cache_aligned_allocator.cpp:237
#9 0x000000000040a671 in allocate (this=<optimized out>, n=<optimized out>) at /usr/include/tbb/internal/../tbb_allocator.h:109
#10 create_node (order_key=<optimized out>, this=<optimized out>) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:276
#11 insert_dummy (order_key=<optimized out>, it=..., this=0x10c3290) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:510
#12 init_bucket (bucket=1812, this=0x10c3280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1289
#13 tbb::interface5::internal::concurrent_unordered_base<tbb::interface5::concurrent_unordered_map_traits<std::string, std::pair<float, std::vector<int, std::allocator<int> > >, tbb::interface5::internal::hash_compare<std::string, tbb::tbb_hash<std::string>, std::equal_to<std::string> >, tbb::tbb_allocator<std::pair<std::string const, std::pair<float, std::vector<int, std::allocator<int> > > > >, false> >::init_bucket (this=0x10c3280, bucket=3860) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1284
#14 0x000000000040b5ba in init_bucket (bucket=7956, this=0x10c3280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1284
#15 internal_find (key=..., this=0x10c3280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1181
#16 find (key=..., this=0x10c3280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:921
#17 segmentation_in_cache (result=..., sentence=..., this=0x10c3270) at /usr/local/include/segment/segment.h:249
The code is
auto it = dp_cache_.find(sentence);
if (it == dp_cache_.end()) {
return false;
}
result = it->second;
return true;
}
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks, Benyu, for the detailed traceback. It looks like the TBB allocator is failing in an attempt to get a block from the free list. I will talk to the person responsible for the allocator.
Regards,
Chris
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks! Here is the memory configuration.
$ free
total used free shared buffers cached
Mem: 32865628 22334056 10531572 0 433452 12874616
-/+ buffers/cache: 9025988 23839640
Swap: 33443836 0 33443836
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi, Benyu,
Could you try something for me? In the directory with the TBB shared library (tbb_debug.dll or libtbb_debug.so) there is also a library for the allocator (tbbmalloc_debug.dll or libtbbmalloc_debug.so.) Could you rename that file to something else so it is not loaded when you run your test and try it one more time?
Thanks for the help, and regards,
Chris
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Cannot run my test:
error while loading shared libraries: libtbb_debug.so.2: cannot open shared object file: No such file or directory
(I am linking my binary with TBB_DEBUG_Libraries, is it the reason?)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello, Benyu,
It looks like you may have renamed the TBB library instead of (or in addition to) the tbbmalloc library. Only rename the allocator library (The TBB library should then use the regular malloc/free, if I remember right.)
Thanks for the quick reply, and regards,
Chris
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Sorry, you are right. Now running.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It's still running. I guess it's the reason as it crashed pretty fast when running the debug libraries.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Unfortunately, crashed:
(gdb) where
#0 0x000000000040a4e5 in insert_dummy (order_key=<optimized out>, it=..., this=0x154e290) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:518#1 tbb::interface5::internal::concurrent_unordered_base<tbb::interface5::concurrent_unordered_map_traits<std::string, std::pair<float, std::vector<int, std::allocator<int> > >, tbb::interface5::internal::hash_compare<std::string, tbb::tbb_hash<std::string>, std::equal_to<std::string> >, tbb::tbb_allocator<std::pair<std::string const, std::pair<float, std::vector<int, std::allocator<int> > > > >, false> >::init_bucket (this=0x154e280, bucket=472) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1289
#2 0x000000000040b5ba in init_bucket (bucket=2520, this=0x154e280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1284
#3 internal_find (key=..., this=0x154e280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1181
#4 find (key=..., this=0x154e280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:921
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello, Benyu,
I'm sorry I haven't got back to you before now. If it took longer to fail, it was probably because of the global lock in the system malloc/free. In any case, it looks like there is something going on in concurrent_unordered, whether or not there is a problem in the TBB malloc.
One more question. You are only adding and fetching items from the table, not erasing, right?
You've given me a snippet of the code you are using to insert items in the table. I will try to write a small program that duplicates the problem.
I'll let you know how it goes. Thanks for your patience.
Regards,
Chris
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Christopher,
I do call clear(). Is it unsafe for multithread?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello Chris,
I do call clear(). Is it unsafe?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi, Benyu,
Yes, clear() should not be called in a concurrent context. I am not sure what your application is doing, so I don't have a suggestion how to protect the call.
I've been having a problem with the small test case I wrote. I insert an element to the concurrent_unordered_map, and then try to find it immediately. The test fails, so i am still working on it.
Regards,
Chris
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
clear() is not thread-safe.
(Added after seeing previous posting) Inserting an answer into this forum isn't, either... :-)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Chris,
So it seems there are two issues on-going. I will add a lock to guard clear(), and you will work on the insert/find test case?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi, Benyu,
One additional comment I would make on the design side. The objects in the concurrent_unordered_map are heap-allocated (both the std::string key and the vector<int> component of the value). The use of malloc/free on these structures will involve a global lock that will serialize their creation, as well as resizing of the vectors.
I would also ask if the accesses of the items in the concurrent_unordered_map are all distinct (per thread)? The structure allows the insertion of items without locking, but accessing the same item by multiple threads will also require a per-item lock, especially in the vector case if resizes occur, unless we guarantee algorithmically only one thread at a time accesses each item.
If you know the likely largest size of the vector, you can reserve() that size and get rid of most reallocations.
I haven't tried using a heap-based object as a key before; I don't know if that is what is causing the insertions to fail in my test or (more likely) there is some oversight in my code.
Regards,
Chris
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Chris,
You raised several very good points. I am using the concurrent_unordered_map as a cache of dynamic programming. The sub-problem is the key (string), and the optimal solution for that sub-problem is the value (vector). The sub-problem could be reused within a problem, or across multiple problems, and each thread is working on a different problem. So multiple threads may access the same item at the same time.
And the value (vector) will not resize. So we don't need the reserve?
Re: heap or stack, I feel it's non-trival to host the sub-problems and optimal solutions in local variables. I could use a memory pool as the class member, but it still need lock on it.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page