- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
ping
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Dmitry,
Sorry for the late response. You can enable compiler specific builtins with TBB_USE_GCC_BUILTINS or TBB_USE_ICC_BUILTINS define macros. Try this and tell what you got, because something that you need may be missed.
Thanks,
Nikita
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Great! Thanks, will try.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Number of races greatly reduced with TBB_USE_GCC_BUILTINS, however we still see few like:
WARNING: ThreadSanitizer: data race (pid=608931)
Read of size 8 at 0x7b3000009430 by thread T11:
#0 load_with_acquire tbb_machine.h:613:23
...
Previous write of size 8 at 0x7b3000009430 by thread T10:
#0 store_with_release tbb_machine.h:619:18
It seems we are hitting this case in machine/gcc_generic.h:
// TODO: implement with __atomic_* builtins where available
#define __TBB_USE_GENERIC_HALF_FENCED_LOAD_STORE 1
#define __TBB_USE_GENERIC_RELAXED_LOAD_STORE 1
#define __TBB_USE_GENERIC_SEQUENTIAL_CONSISTENCY_LOAD_STORE 1
We are also using clang, and it seems the header file only checks gcc version when selecting between old and new atomics. My clang 5.0 pretends that it is a very old gcc, so it probably gets the old builtins which can't express relaxed stores, etc. I think we also need something along these lines:
#ifndef __has_builtin
# define __has_builtin(x) 0
#endif#if __TBB_GCC_VERSION < XXXXX || __has_builtin(__atomic_load_n)
... use new builtins ...
#endif
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
To make it clear, we only care about the case when compiler support the new atomics and the new atomics can directly express relaxed/acquire/release store/loads. Compilers that don't have new atomics, don't support ThreadSanitizer either (and are just too old to bother).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Dmitry, we will take a look. And if you know some commonly used macro that can be checked for presence of new atomic built-ins, I will appreciate if you share with us. The __has_builtin() is an option, but maybe there is some feature macro available.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I don't know what's the right __has_feature value for this. After searching internet I see that some code just checks __has_builtin for all atomic function, e.g.:
https://code.woboq.org/llvm/libcxx/src/include/atomic_support.h.html
and some code checks __has_feature(c_atomic), e.g.:
https://gist.github.com/galek/68480cb04b432443cbad
However, I think strictly saying __has_feature(c_atomic) tells about presence of __c11_atomic_* functions, which is a parallel set of functions with the different prefix.
I would say __has_builtin(__atomic_load_n) should work well in practice.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Is there any progress on this? We really want to use TBB for our application, but we can't because of the errors thrown by the ThreadSanitizer.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
up
still relevant
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Dmitry,
I have recently evaluated if TBB can be analyzed with TSan. Indeed there is a lot of diagnostics, and some should be addressed on our side. But, as a part of it, I tried to apply TSan to a C++ implementation of Peterson's lock algorithm, and to my surprise TSan did not pass this test and reported a race, i.e. a false positive diagnostic.
I started with your own implementation I took from https://www.justsoftwaresolutions.co.uk/threading/petersons_lock_with_C++0x_atomics.html (as far as I understand, Anthony took it from some other place and adjusted to use C++11 atomics); then I modified it as in the attached file (without losing correctness, I believe) to be a closer match (operation- and fence-wise) to the THE work stealing implementation in TBB. For both implementations, TSan reported races for the variable protected by the lock. I used GCC 7.3 and Clang 6 for testing; both gave the same result.
So, unfortunately, for now it is not possible to analyze TBB programs with ThreadSanitizer.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yes. This is true.
This code should work under tsan if we replace:
while (flag1.load(std::memory_order_relaxed)
std::atomic_thread_fence(std::memory_order_acquire);
with:
while (flag1.load(std::memory_order_acquire)
This should have the same performance on x86 (faster on Itanium?), but can slow down ARM for unsuccessful steal operation.
The question is: how many such places are in TBB code? If we are talking about few, then we could do some #ifdef's. E.g. a weird:
#ifdef THREAD_SANITIZER
flag1.load(std::memory_order_acquire);
#else
std::atomic_thread_fence(std::memory_order_acquire);
#endif
should work too.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
But if you are up to not relying on standalone acquire/release fences then, of course, we can both make it work with tsan and not sprinkle the code with ifdef's.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
`while (flag1.load(std::memory_order_acquire)` was in your original code, and as far as I recall it still has false positives.
The work stealing protocol is the main place where we try to keep atomic operations as lightweight as possible. There are also a few cases where we use epoch-based lazy synchronization - but overall, not so many perhaps.
Using ifdef's is in principle possible, though improving the tool to better handle standalone fences has its own value I believe. If not that, I think we would need to better understand what exactly cannot be handled well by the tool, in order to recognize these patterns in the code and find out how to change it.
Also, I think the model where one need to recompile TBB in order to analyze their own code for thread safety is, let's say, imperfect. First, it's extra burden on application developers; second, and perhaps more important, is that they do not test what they ship. I believe most of our customers who want to use ThreadSanitizer do not want to analyze TBB but interested in their own code. From that perspective, having some way to teach the tool about semantics of TBB operations - something like a config file specified via command line - could be useful. It would be pretty much the same as handling e.g. standard memory allocation routines, which "synchronize-with" each other in a certain way. Perhaps if the tool has some way to learn and take into account such relations for 3rd libraries like TBB, we could provide some configuration file (and also formally document these relations along the way).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
> Using ifdef's is in principle possible, though improving the tool to better handle standalone fences has its own value I believe.
Not handling standalone acquire/release fences is somewhat intentional. The problem is that they have global effect on thread synchronization. Namely, a release fence effectively turns all subsequent relaxed stores into release stores, and synchronization operations are costly to model as compared to relaxed atomic operations. An acquire fence is even worse because it turns all preceding relaxed loads into acquire loads, which means that when we handle all relaxed loads as not-yet-materialized acquire loads just in case the thread will ever execute an acquire fence later.
Even if we implement handling of acquire/release fences in tsan as an optional feature, chances are that we won't be able to enable this mode on our projects because they are large in all respects (to put it mildly).
Combining memory ordering constraints with memory access also leads to more readable code. Which is usually a better trade-off for everything except the most performance-critical code (e.g. TBB).
But, yes, it's incomplete modelling of C/C++ memory model. We just happened to get away that far (checking 100+MLOC) without stand-alone fences.
> If not that, I think we would need to better understand what exactly cannot be handled well by the tool, in order to recognize these patterns in the code and find out how to change it.
As far as I can tell that's only stand-alone acquire/release fences. seq_cst fences should work as long as they are not used also as acquire/release fences, e.g. in the test.cpp above we use seq_cst fence but also annotate memory accesses with acquire/release as necessary.
Of course, it's possible that TBB will uncover some other interesting corner cases. But we do verify things like epoch-based GCs under tsan and it all works fine.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
> Also, I think the model where one need to recompile TBB in order to analyze their own code for thread safety is, let's say, imperfect.
FWIW this will work for us as we tend to compile everything from source (which has other advantages as compared to fixed, non-transparent binary blobs).
But we have options for annotations too. Tsan has a set of annotations for "user code" for acquire/release (synchronizes-with) and mutexes. E.g. ABSL mutex uses these:
https://github.com/abseil/abseil-cpp/blob/master/absl/synchronization/mutex.cc#L1430-L1444
These annotations tell tsan about user semantics of the mutex but otherwise make it ignore everything that happens inside of mutex impl (both synchronization and memory accesses). This has advantages for both checking precision, performance and reports quality. But mutex impl itself is obviously left unchecked.
Tsan also has interceptors for common library routines. E.g. pthreads:
C++ ABI __cxa_guard_acquire:
Some ObjectiveC builtin synchronization primitives:
And actually the whole of libdispatch/GCD, which is quite similar to TBB in nature:
But interceptors work well for well-defined, stable C APIs. I am not sure it will be a good fit TBB C++ APIs. we do can intercept mangled C++ names, but I guess you may still not want to commit to exact names and signatures.
IIRC TBB has some substantial parts of code in header files. This means that even if we don't instrument cpp files, the header parts will end up instrumented as part of user code. This can potentially lead to false positives depending on what route we take.
Also annotations enabled at runtime with some kind of config will have runtime overhead even when not enabled. This is suboptimal.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
What do you think about the following?
All TBB entry functions are placed in header files and annotated with macros:
// in header file void queue::push(void* obj) { TBB_TSAN_ENTER(); ... TBB_TSAN_RELEASE(obj); push_impl(obj); // in cpp file ... TBB_TSAN_EXIT(); }
TBB_TSAN_ENTER/EXIT will disable tsan checking for both synchronization operations and memory accesses within the dynamic scope. And TBB_TSAN_RELEASE/ACQUIRE will communicate the user-visible semantics.
Since this all is in headers, we can enable/disable annotations at compile time depending on if the user code is compiler with tsan or not:
#ifdef THREAD_SANITIZER # define TBB_TSAN_ACQUIRE(p) __tsan_acquire(p) # define TBB_TSAN_RELEASE(p) __tsan_release(p) #else # define TBB_TSAN_ACQUIRE(p) # define TBB_TSAN_RELEASE(p) #endif
But there will be a bunch of interesting questions. E.g. what to use as synchronization address if queue element is not a pointer. Or how to wrap user tasks into annotated tasks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Just so that this option is not lost from consideration, making tsan understand TBB atomic operations would solve our problem at hand too.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
Has there been any progress since the previous messages to integrate std::atomics into TBB or to make it compatible with thread sanitizer? Would help out our work immensely too.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page