Intel® oneAPI Threading Building Blocks

TBB: Valgrind reports thread errors with the DRD or Helgrind tool

FlorentD
Beginner

Hi,

I use Valgrind with the DRD or Helgrind tool to find data races and threading errors in my program.

However, I use tbb::concurrent_set, and Valgrind reports a ton of errors about this container and libtbbmalloc. The attached file lists all the errors Valgrind detected.

Example:

==12974== Thread 1:
==12974== Conflicting load by thread 1 at 0x054ffb00 size 8
==12974==    at 0x48AD286: tbb::detail::r1::allocate(tbb::detail::d1::small_object_pool*&, unsigned long, tbb::detail::d1::execution_data const&) (in /usr/lib/x86_64-linux-gnu/libtbb.so.12.5)
==12974==    by 0x18F2C8: new_object<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<long unsigned int>, testTBBConcurrentSet()::<lambda(const tbb::detail::d1::blocked_range<long unsigned int>&)>, const tbb::detail::d1::auto_partitioner>, tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<long unsigned int>, testTBBConcurrentSet()::<lambda(const tbb::detail::d1::blocked_range<long unsigned int>&)>, const tbb::detail::d1::auto_partitioner>&, tbb::detail::d0::split&, tbb::detail::d1::small_object_allocator&> (_small_object_pool.h:53)
==12974==    by 0x18F2C8: offer_work_impl<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<long unsigned int>, testTBBConcurrentSet()::<lambda(const tbb::detail::d1::blocked_range<long unsigned int>&)>, const tbb::detail::d1::auto_partitioner>&, tbb::detail::d0::split&> (parallel_for.h:137)
==12974==    by 0x18F2C8: offer_work (parallel_for.h:124)
==12974==    by 0x18F2C8: execute<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<long unsigned int>, testTBBConcurrentSet()::<lambda(const tbb::detail::d1::blocked_range<long unsigned int>&)>, const tbb::detail::d1::auto_partitioner>, tbb::detail::d1::blocked_range<long unsigned int> > (partitioner.h:284)
==12974==    by 0x18F2C8: tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<unsigned long>, testTBBConcurrentSet()::{lambda(tbb::detail::d1::blocked_range<unsigned long> const&)#1}, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) (parallel_for.h:172)
==12974==    by 0x48B66C0: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.12.5)
==12974==    by 0x18E212: execute_and_wait (_task.h:191)
==12974==    by 0x18E212: run (parallel_for.h:114)
==12974==    by 0x18E212: run (parallel_for.h:103)
==12974==    by 0x18E212: parallel_for<tbb::detail::d1::blocked_range<long unsigned int>, testTBBConcurrentSet()::<lambda(const tbb::detail::d1::blocked_range<long unsigned int>&)> > (parallel_for.h:231)
==12974==    by 0x18E212: testTBBConcurrentSet() (TestThreads.cpp:48)
==12974==    by 0x11A10C: main (main.cpp:58)

 

And the source code:

#include <tbb/tbb.h>
#include <iostream>

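int testTBBConcurrentSet()
{
    // concurrent_set supports concurrent insertion without external locking.
    tbb::concurrent_set<int> data;
    tbb::parallel_for(tbb::blocked_range<size_t>(0, 100000),
        [&](const tbb::blocked_range<size_t>& r) {
            // Use size_t for the index so it matches the range's value type.
            for (size_t i = r.begin(); i != r.end(); ++i) {
                data.insert(static_cast<int>(i));
            }
        }
    );
    std::cout << "Size " << data.size() << std::endl;
    return 0;
}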

int main()
{
    testTBBConcurrentSet(); 
    return 0;
}

How can I fix that? Or how can I disable the TBB errors in Valgrind?

 

Thanks

SeshaP_Intel
Moderator

Hi,


Thank you for posting in Intel Communities.

TBB uses its own memory allocator, libtbbmalloc, which caches memory until the process terminates, so that memory can appear as a leak.

It is likely that worker threads are still running after main() returns, because TBB threads run and terminate asynchronously. This can give Valgrind's DRD tool the same impression.


To suppress the TBB memory-leak reports, you can try the following:

1. Remove the libtbbmalloc.so.2 or tbbmalloc.dll file, so that running the application with the environment variable TBB_VERSION=1 prints "TBB: ALLOCATOR malloc" rather than "TBB: ALLOCATOR scalable_malloc".

2. Make sure all the TBB threads are terminated.


Thanks and Regards,

Pendyala Sesha Srinivas


FlorentD
Beginner

Hi,

 

How can I check whether all TBB threads are terminated? In any case, the Valgrind DRD errors are displayed during execution, not after the program exits, so I don't think that will solve anything.

Note that if I use a TBB container and its concurrent functions (push_back from concurrent_vector, for instance), I also get plenty of errors with the DRD or Helgrind tools. This makes it almost impossible to detect real data races and threading errors in my application.

Thanks for the tip about the memory check.

SeshaP_Intel
Moderator

Hi,


We have reported this issue to the concerned development team. They are looking into your issue.


Thanks and Regards,

Pendyala Sesha Srinivas


FlorentD
Beginner

Hi,

 

Any news about this issue?

 

Thanks,

Pavel_K_Intel1
Employee

Hi,
I can reproduce a similar issue, reported by

valgrind --tool=helgrind

But according to the Helgrind manual:

  • Make sure your application, and all the libraries it uses, use the POSIX threading primitives. Helgrind needs to be able to see all events pertaining to thread creation, exit, locking and other synchronisation events. To do so it intercepts many POSIX pthreads functions.
  • Do not roll your own threading primitives (mutexes, etc) from combinations of the Linux futex syscall, atomic counters, etc. These throw Helgrind's internal what's-going-on models way off course and will give bogus results.

I also checked a small std::atomic example with Helgrind:

#include <atomic>
#include <thread>
#include <iostream>

int main() {
    std::atomic<int> counter{0};

    auto thread_func = [&counter] {
        for (int i = 0; i < 100000; ++i) {
            int value = counter;
            counter.store(value + 10);
        }
    };

    std::thread thr(thread_func);

    thread_func();

    thr.join();
    std::cout << counter << std::endl;
}

And here is the output:

==12109== ----------------------------------------------------------------
==12109==
==12109== Possible data race during read of size 4 at 0x1FFEFFFBC4 by thread #1
==12109== Locks held: none
==12109==    at 0x109A61: load (atomic_base.h:419)
==12109==    by 0x109A61: std::__atomic_base<int>::operator int() const (atomic_base.h:282)
==12109==    by 0x1092D4: main::{lambda()#1}::operator()() const (reproducer.cpp:10)
==12109==    by 0x109367: main (reproducer.cpp:17)
==12109==
==12109== This conflicts with a previous write of size 4 by thread #2
==12109== Locks held: none
==12109==    at 0x10930F: main::{lambda()#1}::operator()() const (atomic_base.h:397)
==12109==    by 0x109922: void std::__invoke_impl<void, main::{lambda()#1}>(std::__invoke_other, main::{lambda()#1}&&) (invoke.h:60)
==12109==    by 0x1098C3: std::__invoke_result<main::{lambda()#1}>::type std::__invoke<main::{lambda()#1}>(std::__invoke_result&&, (main::{lambda()#1}&&)...) (invoke.h:95)
==12109==    by 0x109861: void std::thread::_Invoker<std::tuple<main::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (thread:244)
==12109==    by 0x109822: std::thread::_Invoker<std::tuple<main::{lambda()#1}> >::operator()() (thread:251)
==12109==    by 0x1097F7: std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::{lambda()#1}> > >::_M_run() (thread:195)
==12109==    by 0x4952DE3: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==12109==    by 0x4842B1A: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_helgrind-amd64-linux.so)
==12109==  Address 0x1ffefffbc4 is on thread #1's stack
==12109==  in frame #2, created by main (reproducer.cpp:5)
==12109==
==12109== ----------------------------------------------------------------

But this is a false positive: every access to counter goes through std::atomic, so there is no data race.

Since oneTBB uses not only atomics (there are also hand-written mutexes, concurrent_monitor, etc.), there will be a lot of false positives.
This doesn't mean that oneTBB is not tested against data races at all: we use Thread Sanitizer (a low-level library), which respects atomics, hand-written mutexes, etc., and we currently have a 100% test pass rate with this tool.

SeshaP_Intel
Moderator

Hi,


We haven't heard back from you. Could you please provide an update on your issue?


Thanks and Regards,

Pendyala Sesha Srinivas


FlorentD
Beginner

Hi,

 

As far as I can tell, there is no way to use Valgrind with TBB without false positives...

The only solution is to remove all calls to the TBB library when running under Valgrind, which is a shame!

I suggest solving this issue in a future release.

Thanks

Pavel_K_Intel1
Employee

Hi @FlorentD, as I mentioned before, this is not really a oneTBB issue; it is a consequence of how Helgrind works. An application that wants to use Helgrind should use only POSIX APIs for synchronization.
oneTBB builds its synchronization primitives on standard C++11 utilities, but the problem is that Helgrind does not cover some of the C++ synchronization primitives.

FlorentD
Beginner

Hi,

 

It is strange, because if I replace all TBB functions with C++11 primitives, Helgrind works fine.

So if TBB uses C++11 primitives, it should work?

I don't understand.

Pavel_K_Intel1
Employee

@FlorentD, in my first answer I included an example with C++11 atomics, and unfortunately Helgrind reports a false positive for it. In oneTBB, a great deal of synchronization is built on std::atomic.

SeshaP_Intel
Moderator

Hi,


We haven't heard back from you. Could you please provide an update on your issue?


Thanks and Regards,

Pendyala Sesha Srinivas


SeshaP_Intel
Moderator

Hi,


We assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.


Thanks and Regards,

Pendyala Sesha Srinivas

