Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

get current thread ID doesn't work in TBB?

missing__zlw
Beginner
924 Views
I am using the Fib tutorial example that comes with the TBB download.
I added the following line to check thread ID :

class parallel_whileFibBody
{
    QueueStream &my_stream;
    parallel_while<parallel_whileFibBody> &my_while;
public:
    typedef pair<Matrix2x2, Matrix2x2> argument_type;
    //! fill functor arguments
    parallel_whileFibBody(parallel_while<parallel_whileFibBody> &w, QueueStream &s)
        : my_while(w), my_stream(s) { }
    //! process pair of matrices
    void operator() (argument_type mm) const {
        mm.first = mm.first * mm.second;
        // note: it can run concurrently with QueueStream::pop_if_present()
        printf("WhileFibBody %u \n", (unsigned) pthread_self() );
        if(my_stream.Queue.try_pop(mm.second))
            my_while.add( mm ); // now, two matrices available. Add next iteration.
        else my_stream.Queue.push( mm.first ); // or push back calculated value if queue is empty
    }
};

The only change is that printf line. But when I run this example, the thread ID is always the same. Are multiple threads really being created?

I added the same printf line in the parallel_for body, same result.
9 Replies
Vladimir_P_1234567890
hello,
how many cores does the machine you run the example on have?
--Vladimir
missing__zlw
Beginner
This is a 12-core Linux machine.
For example, the 12th core has :

processor : 11
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel Xeon CPU X5675 @ 3.07GHz

jimdempseyatthecove
Honored Contributor III

1) Did you run your test _after_ initializing TBB?

2) Try something equivalent to

parallel_invoke(
    [&]() {
        printf("%u\n", (unsigned)pthread_self());
        Sleep(100);
    },
    [&]() {
        Sleep(10);
        printf("%u\n", (unsigned)pthread_self());
    });

I used Sleep in place of a mutex around printf.

The above should show two different IDs, assuming TBB is creating multiple threads.
If you see one thread, check the TBB initialization.

Jim Dempsey
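
The check suggested above can also be done without Sleep and without interleaved printf output by recording thread IDs under a mutex and counting the distinct ones afterwards. The following is only a sketch: ThreadIdLog and count_threads_in_two_branches are hypothetical names, not part of TBB, and plain std::thread stands in for the two parallel_invoke branches so that the code builds without TBB. In a TBB program, record() would be called from inside the parallel body instead.

```cpp
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>

// Hypothetical helper (not a TBB class): collects the IDs of all threads
// that call record(), so no Sleep is needed to keep output readable.
class ThreadIdLog {
    mutable std::mutex guard_;
    std::set<std::thread::id> ids_;
public:
    // Call from inside a parallel body, e.g. a parallel_invoke lambda.
    void record() {
        std::lock_guard<std::mutex> lock(guard_);
        ids_.insert(std::this_thread::get_id());
    }
    // Number of distinct threads that have called record() so far.
    std::size_t distinct() const {
        std::lock_guard<std::mutex> lock(guard_);
        return ids_.size();
    }
};

// Stand-in for the two branches, using std::thread instead of TBB.
// Neither thread is joined until both exist, so their IDs cannot be reused
// and are guaranteed to differ.
inline std::size_t count_threads_in_two_branches() {
    ThreadIdLog log;
    std::thread a([&log] { log.record(); });
    std::thread b([&log] { log.record(); });
    a.join();
    b.join();
    return log.distinct();
}
```

With real TBB tasks the count may legitimately come out as 1 when the work is tiny, which is the situation this thread is about; the helper only makes that observation easier to read than raw printf output.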

missing__zlw
Beginner
Thanks for your response. I added the sleep statement and it shows different thread ID now.
TBB is initialized as this is the tutorial program and the only change I made is to print thread ID.

So why, without the sleep, is there always only one thread, even when I init with 2 or 4 threads?
RafSchietekat
Valued Contributor III
Are you sure of that, even when you unconditionally set Verbose to true in main() to see boundaries between individual tests and pipe output through "uniq -c" by modifying Makefile? I'm seeing lots of different values, on a 2-core Intel/Linux machine. Is the machine doing anything else at the same time?
Anton_M_Intel
Employee
In TBB, parallelism is optional. That is, if the work is small enough, a worker thread can arrive too late, after the main thread has already finished all the tasks. This applies especially to the first invocation of a TBB parallel algorithm or of tasks, because worker threads are created lazily, on demand. So, even if the granularity of your tasks is usually enough for TBB to work efficiently, the first tasks can still be executed sequentially because the workers are not ready yet.
A side note to other readers: it is inappropriate to measure the latency or scalability of TBB algorithms by the first run when the work is done in less than a fraction of a second. Parallelizing such workloads makes sense only if they are executed millions of times, so the first runs are just not representative.
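
One way to act on the side note about first runs is to execute the workload a few times untimed before measuring. This is a minimal sketch in standard C++ only; mean_seconds_after_warmup is a hypothetical helper name, and the `work` argument is assumed to be whatever callable wraps the TBB algorithm under test.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` `warmup_runs` times without timing it,
// then returns the mean wall-clock seconds per call over `timed_runs` calls.
template <typename Work>
double mean_seconds_after_warmup(Work work, int warmup_runs, int timed_runs) {
    // Untimed runs give lazily created worker threads a chance to start.
    for (int i = 0; i < warmup_runs; ++i) work();

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < timed_runs; ++i) work();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    return timed_runs > 0 ? elapsed.count() / timed_runs : 0.0;
}
```

How many warm-up runs are enough depends on the machine and the workload, so treat the counts as parameters to experiment with rather than fixed values.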
RafSchietekat
Valued Contributor III
"So, even if granularity of your task is usually enough for TBB to work efficiently, the first tasks can still be executed sequentially because workers are not ready yet."
Still, the observed behaviour, if confirmed (see above), differs substantially from what I'm seeing for the same example on another Linux machine, which led me to eliminate startup latency as the most likely explanation...
jimdempseyatthecove
Honored Contributor III
Suppose you perform a parallel_invoke with two paths (equivalent to a parallel_for splitting its iteration space in two). If the current thread (the thread issuing the parallel_... call) makes it through the internals of TBB and back to and through the first branch of the invoke (or the first slice of the for), then the current thread is at least as likely to snatch the waiting task (the second branch or second slice) as any other thread in the TBB thread pool. (In TBB, the invoking thread has a higher probability of taking it than any task-stealing thread.)

On your dual-core notebook, the second core may have been running a thread from a different application. In that case, the latency between the thread wakeup (by your app) and the thread's continuance may exceed the execution time of the first path of the parallel_invoke (or the first slice of the parallel_for), causing the primary thread to take the second path (or second slice) as well.

BTW, this is the reason for the Sleep(n) calls in the code snippets (not a suggestion that you include Sleep(n) calls in your programs). They are there for the sole purpose of exposing the crux of the situation.

Jim Dempsey
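
The race described above can be imitated with a toy model. This is not TBB's scheduler, only an illustration in standard C++: the calling thread and one late-waking worker both claim tasks from a shared counter. The function name tasks_run_by_caller and its parameters are made up for this sketch, and the wake-up latency is modelled with a sleep.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Toy model only, NOT TBB's scheduler. Returns how many of the
// `task_count` tasks ended up running on the calling thread.
inline int tasks_run_by_caller(int task_count,
                               std::chrono::milliseconds worker_wakeup,
                               std::chrono::milliseconds task_cost) {
    std::atomic<int> next_task{0};

    // Claims tasks until none are left; returns how many this thread ran.
    auto drain = [&]() {
        int done = 0;
        while (next_task.fetch_add(1) < task_count) {
            std::this_thread::sleep_for(task_cost);  // pretend to do the work
            ++done;
        }
        return done;
    };

    std::thread worker([&] {
        std::this_thread::sleep_for(worker_wakeup);  // modelled wake-up latency
        drain();
    });
    const int by_caller = drain();  // the caller starts on the tasks at once
    worker.join();
    return by_caller;
}
```

When the modelled wake-up latency is much longer than the tasks, the caller runs everything and only one thread ID would ever be printed. When each task costs more than the wake-up latency, the worker typically picks up the second task, which matches the effect of adding Sleep to the test code.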
RafSchietekat
Valued Contributor III
"On your dual core notebook, the second core may have been running a thread from a different application."
But I'm seeing execution on lots of different threads on my otherwise idle dual-core notebook, the opposite of the reported behaviour on the 12-core machine. That's why I didn't consider (unprovoked) latency and offered the suggestion of a heavily loaded system, so it seems we're thinking along the same lines there at least. Perhaps we'll get some more information after #4; otherwise it's of little use to speculate further.