Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

concurrent_unordered_map crashes in insert() or find()

Benyu_Z_
Beginner
4,770 Views

I have a multithreaded program that uses concurrent_unordered_map (latest TBB version, built from source). It crashes after running for several hours. The backtrace from the core dump is:

(gdb) where
#0 flist_iterator (other=..., this=<synthetic pointer>) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:90
#1 get_bucket (bucket=<optimized out>, this=<optimized out>) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1333
#2 internal_insert (value=..., this=0x7f75a404e500) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1127
#3 insert (value=..., this=0x7f75a404e500) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:860
#4 ngram_segment<20>::segment (this=this@entry=0x7f75a404e4f0, sentence=..., result=..., debug=debug@entry=false) at /usr/local/include/segment/segment.h:319

The code around line 319 is:

{
  auto it = dp_cache_.find(sentence);
  if (it == dp_cache_.end()) {
    dp_cache_.insert(make_pair(sentence, result));
  }
}

And the types:

typedef pair<float, vector<int>> segmentation;

concurrent_unordered_map<string, segmentation> dp_cache_;

I also tried concurrent_hash_map, different versions of TBB, and the pre-built binaries. They all crash.

Has anyone seen similar cases?

Any help is appreciated!
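
A side note on the snippet above: insert() on concurrent_unordered_map returns a std::pair<iterator, bool> and leaves the map unchanged when the key is already present, the same as std::unordered_map::insert, so the preceding find() is not needed for correctness. A minimal sketch follows. The helper name cache_if_absent is hypothetical, and std::unordered_map can stand in for the TBB container so the example compiles without TBB. This only simplifies the call site; it is not a fix for the crash.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Same value type as in the post.
typedef std::pair<float, std::vector<int>> segmentation;

// Hypothetical helper: works with any map whose insert() returns
// std::pair<iterator, bool>, e.g. std::unordered_map or
// tbb::concurrent_unordered_map.
template <class Map>
bool cache_if_absent(Map& cache, const std::string& sentence,
                     const segmentation& result) {
    // insert() does nothing if the key already exists;
    // .second reports whether a new element was actually added.
    return cache.insert(std::make_pair(sentence, result)).second;
}
```

Called as cache_if_absent(dp_cache_, sentence, result), it returns true only for the call that actually added the entry.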

0 Kudos
29 Replies
Benyu_Z_
Beginner
3,264 Views

I just tried the debug version of the TBB libraries. It shows more information now, but I still don't understand the cause.

(gdb) where
#0 0x00007f310ee37425 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007f310ee3ab8b in __GI_abort () at abort.c:91
#2 0x00007f310a763234 in rml::internal::assertion_failure (filename=0x7f310a76f948 "../../src/tbbmalloc/frontend.cpp", line=1805, expression=0x7f310a76fc68 "allocatedCount < (slabSize-sizeof(Block))/objectSize", comment=0x0)
at ../../src/tbbmalloc/../tbb/tbb_assert_impl.h:84
#3 0x00007f310a76c9da in rml::internal::Block::allocateFromFreeList (this=0x7f310a4a0000) at ../../src/tbbmalloc/frontend.cpp:1805
#4 0x00007f310a76cb09 in rml::internal::Block::allocate (this=0x7f310a4a0000) at ../../src/tbbmalloc/frontend.cpp:1832
#5 0x00007f310a76a76a in rml::internal::internalPoolMalloc (memPool=0x7f310a978ea0 <rml::internal::defaultMemPool_space>, size=56) at ../../src/tbbmalloc/frontend.cpp:2239
#6 0x00007f310a76aae5 in rml::internal::internalMalloc (size=56) at ../../src/tbbmalloc/frontend.cpp:2324
#7 0x00007f310a76b076 in scalable_malloc (size=56) at ../../src/tbbmalloc/frontend.cpp:2566
#8 0x00007f310f9f1adb in tbb::internal::allocate_via_handler_v3 (n=56) at ../../src/tbb/cache_aligned_allocator.cpp:237
#9 0x000000000040a671 in allocate (this=<optimized out>, n=<optimized out>) at /usr/include/tbb/internal/../tbb_allocator.h:109
#10 create_node (order_key=<optimized out>, this=<optimized out>) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:276
#11 insert_dummy (order_key=<optimized out>, it=..., this=0x10c3290) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:510
#12 init_bucket (bucket=1812, this=0x10c3280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1289
#13 tbb::interface5::internal::concurrent_unordered_base<tbb::interface5::concurrent_unordered_map_traits<std::string, std::pair<float, std::vector<int, std::allocator<int> > >, tbb::interface5::internal::hash_compare<std::string, tbb::tbb_hash<std::string>, std::equal_to<std::string> >, tbb::tbb_allocator<std::pair<std::string const, std::pair<float, std::vector<int, std::allocator<int> > > > >, false> >::init_bucket (this=0x10c3280, bucket=3860) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1284
#14 0x000000000040b5ba in init_bucket (bucket=7956, this=0x10c3280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1284
#15 internal_find (key=..., this=0x10c3280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1181
#16 find (key=..., this=0x10c3280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:921

0 Kudos
Christophe_H_Intel
3,264 Views

Hello, Benyu,

Thanks for the report.  Can you tell me what kind of failure it is?  The traceback is at the copy-construction of an flist_iterator as a return value, which is constructed on the stack, so it doesn't seem to be a heap failure.  And the initialization of an element when inserting into the map should get a segfault if there is no memory available, but it should be in the split_ordered_list.

Do you have an idea of how many items are inserted into the map before the failure?

Regards,
Chris Huson

0 Kudos
Benyu_Z_
Beginner
3,264 Views

Hi Christopher,

I don't have the exact number at hand, but I believe it's less than 10K. 

0 Kudos
Benyu_Z_
Beginner
3,264 Views

Hi Christopher,

BTW, I tried the debug version of the TBB libraries. Here is the core dump of another crash (this time in find()). I hope it provides more information.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `build/segment_service -w segment_service/ -P 8003'.
Program terminated with signal 6, Aborted.
#0 0x00007f310ee37425 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) where
#0 0x00007f310ee37425 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007f310ee3ab8b in __GI_abort () at abort.c:91
#2 0x00007f310a763234 in rml::internal::assertion_failure (filename=0x7f310a76f948 "../../src/tbbmalloc/frontend.cpp", line=1805, expression=0x7f310a76fc68 "allocatedCount < (slabSize-sizeof(Block))/objectSize", comment=0x0) at ../../src/tbbmalloc/../tbb/tbb_assert_impl.h:84
#3 0x00007f310a76c9da in rml::internal::Block::allocateFromFreeList (this=0x7f310a4a0000) at ../../src/tbbmalloc/frontend.cpp:1805
#4 0x00007f310a76cb09 in rml::internal::Block::allocate (this=0x7f310a4a0000) at ../../src/tbbmalloc/frontend.cpp:1832
#5 0x00007f310a76a76a in rml::internal::internalPoolMalloc (memPool=0x7f310a978ea0 <rml::internal::defaultMemPool_space>, size=56) at ../../src/tbbmalloc/frontend.cpp:2239
#6 0x00007f310a76aae5 in rml::internal::internalMalloc (size=56) at ../../src/tbbmalloc/frontend.cpp:2324
#7 0x00007f310a76b076 in scalable_malloc (size=56) at ../../src/tbbmalloc/frontend.cpp:2566
#8 0x00007f310f9f1adb in tbb::internal::allocate_via_handler_v3 (n=56) at ../../src/tbb/cache_aligned_allocator.cpp:237
#9 0x000000000040a671 in allocate (this=<optimized out>, n=<optimized out>) at /usr/include/tbb/internal/../tbb_allocator.h:109
#10 create_node (order_key=<optimized out>, this=<optimized out>) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:276
#11 insert_dummy (order_key=<optimized out>, it=..., this=0x10c3290) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:510
#12 init_bucket (bucket=1812, this=0x10c3280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1289
#13 tbb::interface5::internal::concurrent_unordered_base<tbb::interface5::concurrent_unordered_map_traits<std::string, std::pair<float, std::vector<int, std::allocator<int> > >, tbb::interface5::internal::hash_compare<std::string, tbb::tbb_hash<std::string>, std::equal_to<std::string> >, tbb::tbb_allocator<std::pair<std::string const, std::pair<float, std::vector<int, std::allocator<int> > > > >, false> >::init_bucket (this=0x10c3280, bucket=3860) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1284
#14 0x000000000040b5ba in init_bucket (bucket=7956, this=0x10c3280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1284
#15 internal_find (key=..., this=0x10c3280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1181
#16 find (key=..., this=0x10c3280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:921
#17 segmentation_in_cache (result=..., sentence=..., this=0x10c3270) at /usr/local/include/segment/segment.h:249

The code is:

      auto it = dp_cache_.find(sentence);
      if (it == dp_cache_.end()) {
        return false;
      }
      result = it->second;
      return true;
}

0 Kudos
Christophe_H_Intel
3,264 Views

Thanks, Benyu, for the detailed traceback.  It looks like the TBB allocator is failing in an attempt to get a block from the free list.  I will talk to the person responsible for the allocator.

Regards,
Chris

0 Kudos
Benyu_Z_
Beginner
3,264 Views

Thanks! Here is the memory configuration.

$ free
total used free shared buffers cached
Mem: 32865628 22334056 10531572 0 433452 12874616
-/+ buffers/cache: 9025988 23839640
Swap: 33443836 0 33443836

0 Kudos
Christophe_H_Intel
3,264 Views

Hi, Benyu,

Could you try something for me?  In the directory with the TBB shared library (tbb_debug.dll or libtbb_debug.so) there is also a library for the allocator (tbbmalloc_debug.dll or libtbbmalloc_debug.so).  Could you rename that file so that it is not loaded when you run your test, and then try one more time?

Thanks for the help, and regards,
Chris

0 Kudos
Benyu_Z_
Beginner
3,264 Views

I cannot run my test:

error while loading shared libraries: libtbb_debug.so.2: cannot open shared object file: No such file or directory

(I am linking my binary with TBB_DEBUG_Libraries. Is that the reason?)

0 Kudos
Christophe_H_Intel
3,264 Views

Hello, Benyu,

It looks like you may have renamed the TBB library instead of (or in addition to) the tbbmalloc library.  Rename only the allocator library. (The TBB library should then use the regular malloc/free, if I remember right.)

Thanks for the quick reply, and regards,
Chris

0 Kudos
Benyu_Z_
Beginner
3,264 Views

Sorry, you are right. Now running.

0 Kudos
Benyu_Z_
Beginner
3,264 Views

It's still running. I guess the allocator was the cause, since it crashed pretty quickly with the debug libraries.

0 Kudos
Benyu_Z_
Beginner
3,264 Views

Unfortunately, it crashed:

(gdb) where
#0 0x000000000040a4e5 in insert_dummy (order_key=<optimized out>, it=..., this=0x154e290) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:518
#1 tbb::interface5::internal::concurrent_unordered_base<tbb::interface5::concurrent_unordered_map_traits<std::string, std::pair<float, std::vector<int, std::allocator<int> > >, tbb::interface5::internal::hash_compare<std::string, tbb::tbb_hash<std::string>, std::equal_to<std::string> >, tbb::tbb_allocator<std::pair<std::string const, std::pair<float, std::vector<int, std::allocator<int> > > > >, false> >::init_bucket (this=0x154e280, bucket=472) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1289
#2 0x000000000040b5ba in init_bucket (bucket=2520, this=0x154e280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1284
#3 internal_find (key=..., this=0x154e280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:1181
#4 find (key=..., this=0x154e280) at /usr/include/tbb/internal/_concurrent_unordered_impl.h:921

0 Kudos
Christophe_H_Intel
3,264 Views

Hello, Benyu,

I'm sorry I haven't got back to you before now.  If it took longer to fail, it was probably because of the global lock in the system malloc/free.  In any case, it looks like there is something going on in concurrent_unordered, whether or not there is a problem in the TBB malloc.

One more question.  You are only adding and fetching items from the table, not erasing, right? 

You've given me a snippet of the code you are using to insert items in the table.  I will try to write a small program that duplicates the problem.

I'll let you know how it goes.  Thanks for your patience.

Regards,
Chris

0 Kudos
Benyu_Z_
Beginner
3,264 Views

Hi Christopher,

I do call clear(). Is it unsafe to call from multiple threads?

0 Kudos
Benyu_Z_
Beginner
3,264 Views

Hello Chris,

I do call clear(). Is it unsafe?

0 Kudos
Christophe_H_Intel
3,264 Views

Hi, Benyu,

Yes, clear() should not be called in a concurrent context.  I am not sure what your application is doing, so I don't have a suggestion how to protect the call.

I've been having a problem with the small test case I wrote.  I insert an element into the concurrent_unordered_map and then try to find it immediately.  The test fails, so I am still working on it.

Regards,
Chris

0 Kudos
RafSchietekat
Valued Contributor III
3,264 Views

clear() is not thread-safe.

(Added after seeing previous posting) Inserting an answer into this forum isn't, either... :-)

0 Kudos
Benyu_Z_
Beginner
3,264 Views

Hi Chris,

So it seems there are two separate issues. I will add a lock to guard clear(), and you will work on the insert/find test case?
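
One possible shape for such a guard is a reader-writer lock: lookups take a shared lock and clear() takes an exclusive one, so clear() never runs while another thread is inside the map. The sketch below is only an illustration under stated assumptions. GuardedCache and its method names are hypothetical, it needs C++17 for std::shared_mutex, and std::unordered_map stands in for the TBB map so the example builds without TBB. Because of that stand-in, insert() has to take the exclusive lock here; with concurrent_unordered_map, find() and insert() could both run under the shared lock.

```cpp
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

typedef std::pair<float, std::vector<int>> segmentation;

// Hypothetical wrapper around the cache. std::unordered_map is a
// stand-in for the TBB container, so insert() is exclusive here.
class GuardedCache {
public:
    // Copies the value out while the lock is held, so the caller never
    // keeps an iterator or reference that a later clear() could invalidate.
    bool lookup(const std::string& sentence, segmentation& result) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        auto it = map_.find(sentence);
        if (it == map_.end()) return false;
        result = it->second;
        return true;
    }

    // Returns true if the entry was added, false if the key already existed.
    bool insert(const std::string& sentence, const segmentation& result) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        return map_.insert(std::make_pair(sentence, result)).second;
    }

    // Exclusive access: waits until no lookup or insert is in progress.
    void clear() {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        map_.clear();
    }

private:
    mutable std::shared_mutex mutex_;
    std::unordered_map<std::string, segmentation> map_;
};
```

The important detail is that the copy in lookup() happens before the lock is released; returning an iterator instead would reintroduce the race with clear().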

0 Kudos
Christophe_H_Intel
3,264 Views

Hi, Benyu,

One additional comment I would make on the design side.  The objects in the concurrent_unordered_map are heap-allocated (both the std::string key and the vector<int> component of the value).  The use of malloc/free on these structures will involve a global lock that will serialize their creation, as well as resizing of the vectors.

I would also ask if the accesses of the items in the concurrent_unordered_map are all distinct (per thread)?  The structure allows the insertion of items without locking, but accessing the same item by multiple threads will also require a per-item lock, especially in the vector case if resizes occur, unless we guarantee algorithmically only one thread at a time accesses each item.

If you know the likely largest size of the vector, you can reserve() that size and get rid of most reallocations.
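
As a small illustration of the reserve() suggestion, the sketch below reserves capacity once and then checks that the buffer address did not change while the vector was filled. The function name fill_without_realloc and the parameter max_len are hypothetical, and the sketch assumes n <= max_len.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: fills v with n values after reserving max_len
// slots up front (assumes n <= max_len).
// Returns true if the buffer was never reallocated while filling.
bool fill_without_realloc(std::vector<int>& v, std::size_t n,
                          std::size_t max_len) {
    v.clear();
    v.reserve(max_len);             // at most one allocation, done up front
    const int* before = v.data();   // buffer address after reserve()
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
    }
    return v.data() == before;      // unchanged address: no reallocation
}
```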

I haven't tried using a heap-based object as a key before; I don't know if that is what is causing the insertions to fail in my test or (more likely) there is some oversight in my code.

Regards,
Chris

0 Kudos
Benyu_Z_
Beginner
3,110 Views

Hi Chris,

You raised several very good points. I am using the concurrent_unordered_map as a cache for dynamic programming. The sub-problem is the key (string), and the optimal solution for that sub-problem is the value (vector). A sub-problem can be reused within a problem or across multiple problems, and each thread works on a different problem. So multiple threads may access the same item at the same time.

The value (vector) is never resized, so we don't need reserve()?

Re: heap or stack, I feel it's non-trivial to hold the sub-problems and optimal solutions in local variables. I could use a memory pool as a class member, but it would still need a lock.
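
If the cached solutions really are never modified after insertion, one way to make that explicit is to build each solution completely and then share it as a pointer to const. Concurrent reads of an object that no thread modifies do not need a per-item lock. The sketch below assumes this read-only usage; segmentation_ptr and make_entry are hypothetical names, not part of the code in this thread.

```cpp
#include <memory>
#include <utility>
#include <vector>

typedef std::pair<float, std::vector<int>> segmentation;

// Hypothetical alias: a shared, read-only view of a finished solution.
typedef std::shared_ptr<const segmentation> segmentation_ptr;

// Hypothetical helper: builds the solution completely, then freezes it
// by handing it out only through a pointer-to-const.
segmentation_ptr make_entry(float score, std::vector<int> cuts) {
    return std::make_shared<segmentation>(score, std::move(cuts));
}
```

A reader that holds a segmentation_ptr also keeps the value alive on its own, which may matter if the cache is cleared while the reader is still using the result.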

0 Kudos