1. No. omp_get_thread_num() is an OpenMP call and may not even be available if the module has not been compiled with the appropriate OpenMP switch.
2. Mmmmmm, no. There is a method to this. One of the goals of TBB is to minimize, as much as possible, the programmer's need to be concerned about thread scheduling: in the landscape moving forward, you can't predict how many threads will be available, so letting thread scheduling fall to an automatic process that can adapt makes more sense. That said, you can always hack around this to get the underlying thread ID (OS-dependent), and there has been some talk of providing an initialization function for pool threads for a variety of purposes. You could use this to generate thread-local storage and/or assign an ID.
Are you interested in better understanding the scheduling of threads for your own education, or are you trying to institutionalize a thread ID for some other purpose?
A bit of background: Early drafts of the TBB specification had a way to get a thread ID, but our OpenMP group vehemently argued to remove it, based on experience with OpenMP programs. So we removed it. The root problem is that omp_get_thread_num is intrinsically tied to the notion of a thread team, an inheritance from flat SPMD (Single Program Multiple Data) programming.
TBB targets non-flat nested and recursive parallelism, and very dynamic parallelism where the "number of threads" might change while a program runs.
So I'd be interested too in hearing your use case for omp_get_thread_num. We tried in TBB to provide the power to avoid it (e.g., some places where it is used in OpenMP can be handled more elegantly with parallel_reduce over a user-defined type). But I realize we did not address all use cases, and so your cases would be an important contribution to identifying what gaps we need to cover.
In my application, I have a large sparse (Jacobian) matrix of doubles that is updated concurrently by multiple threads.
The pattern of the updates is determined by the structure of the particular
test case, but has no regularity that I can take advantage of.
The actual update step is very simple:
*p += value
where p points to a particular entry in the matrix. (I wish I could
use atomic operations.)
To make the update thread-safe, I see three options:
A) one lock per row
B) one lock per element
C) each thread has a copy of the matrix
In case C), I wouldn't need any locks. But it would add overhead to
zero out the extra copies at the beginning of the update, and to add
the copies together at the end.
I haven't done enough experiments yet to measure the costs of
these three techniques.
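Option A) can be sketched as follows. This is a minimal illustration under assumed names: std::mutex stands in for tbb::spin_mutex, and the flat values array with an explicit row index per update is a stand-in for whatever sparse layout the real matrix uses.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Option A: one lock per row of the sparse matrix.
struct LockedSparseMatrix {
    std::vector<double> values;        // nonzero entries
    std::vector<std::mutex> row_locks; // one mutex per row

    LockedSparseMatrix(std::size_t nnz, std::size_t rows)
        : values(nnz, 0.0), row_locks(rows) {}

    // The update "*p += value", guarded by the owning row's lock.
    void add(std::size_t row, std::size_t entry, double value) {
        std::lock_guard<std::mutex> guard(row_locks[row]);
        values[entry] += value;
    }
};
```

Option B) is the same shape with one mutex per nonzero entry, trading memory for reduced contention.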
To even consider case C), I would have to have the minimum possible
number of thread local copies of the matrix. It would be prohibitively
expensive to copy/join the whole matrix in the constructor/
destructor of the parallel object.
To summarize, I have a large data structure that I'm accumulating
values into, in an irregular pattern. If I could keep only the minimum
possible number of copies of this structure, it could be the
most efficient implementation.
I guess it isn't so much that I need a thread ID as that I need
thread-local storage that is guaranteed private to each runnable thread,
where I can manage its creation and joining.
I understand TBB can't be everything to everyone, but I put
this up as an example that needs a thread ID.