Hello, I'm a newbie experimenting with TBB for the first time. I want to set up an array in TLS for each thread in the TBB task scheduler so that I can later execute parallel_for across an array of objects that write events to the pre-allocated TLS. I've copied the approach to setting up the TLS from a demo application released by Intel some time ago (Intel Smoke demo), but I must be missing something because it works fine in their demo but not for me.
The issue, as far as I can tell, is that when I run the synchronization tasks, TBB seems to execute some of them on the same thread, so the TLS never gets initialized for some of the threads in the scheduler. Later, when I call parallel_for, it uses all of the threads, and some of them try to access TLS that was never initialized. I'm including my code and example output from the compiled program below.
class SynchronizeTask : public tbb::task
{
public:
    SynchronizeTask() {}

    tbb::task *execute()
    {
        m_fCallback(m_pCallbackParam);
        if (InterlockedDecrement(&m_lCallbacksCount) == 0)
        {
            // set all of the SynchronizeTasks free
            SetEvent(m_hAllCallbacksInvokedEvent);
        }
        else
        {
            WaitForSingleObject(m_hAllCallbacksInvokedEvent, INFINITE);
        }
        return NULL;
    }

    static void PrepareCallback( fFunc pfunc, void* pParam, unsigned int uCount )
    {
        m_fCallback = pfunc;
        m_pCallbackParam = pParam;
        m_lCallbacksCount = uCount;
        ResetEvent(m_hAllCallbacksInvokedEvent);
    }

protected:
    friend class TaskManagerTBB;
    static void* m_hAllCallbacksInvokedEvent;
    static fFunc m_fCallback;
    static void* m_pCallbackParam;
    static volatile long m_lCallbacksCount;
}; // class SynchronizeTask

void* SynchronizeTask::m_hAllCallbacksInvokedEvent = NULL;
fFunc SynchronizeTask::m_fCallback = NULL;
void* SynchronizeTask::m_pCallbackParam = NULL;
volatile long SynchronizeTask::m_lCallbacksCount = 0;

///////////////////////////////////////////////////////////////////////////////
// InitThreadLocalData - Init thread specific data
void InitThreadLocalData( void* arg )
{
    printf("thread id %d\n", tbb::this_tbb_thread::get_id());

    // The notify list is kept in tls (thread local storage).
    if (NULL == ::TlsGetValue(tlsIndex))
    {
        eventVector* ev = new eventVector();
        ev->reserve(512);
        ::TlsSetValue(tlsIndex, ev);
        printf("prepared TLS thread id %d\n", tbb::this_tbb_thread::get_id());

        EnterCriticalSection(&crit);
        eventVectorGrouping.push_back(ev);
        LeaveCriticalSection(&crit);
    }
    else
    {
        printf("TLS INDEX WAS ALREADY SET, ABORTED\n");
    }
}

int main(int argc, char* argv[])
{
    InitializeCriticalSection(&crit);

    auto m_uPrimaryThreadID = tbb::this_tbb_thread::get_id();
    auto m_uRequestedNumberOfThreads = tbb::task_scheduler_init::default_num_threads();
    //m_uRequestedNumberOfThreads = 8;
    auto m_pTbbScheduler = new tbb::task_scheduler_init(m_uRequestedNumberOfThreads);
    printf("TBB started with %d threads.\n", m_uRequestedNumberOfThreads);

    void* pData = nullptr;
    tlsIndex = TlsAlloc();
    printf("tls Index: %d\n", tlsIndex);

    SynchronizeTask::PrepareCallback(InitThreadLocalData, pData, m_uRequestedNumberOfThreads);

    tbb::task* pBroadcastParent = new(tbb::task::allocate_root()) tbb::empty_task;
    pBroadcastParent->set_ref_count(m_uRequestedNumberOfThreads + 1);

    tbb::task_list tList;
    for (unsigned int i = 0; i < m_uRequestedNumberOfThreads; i++)
    {
        tbb::task *pNewTask = new(pBroadcastParent->allocate_child()) SynchronizeTask;
        tList.push_back(*pNewTask);
    }
    pBroadcastParent->spawn_and_wait_for_all(tList);
    pBroadcastParent->destroy(*pBroadcastParent);

    DeleteCriticalSection(&crit);
    getchar();
    // parallel_for later that accesses invalid TLS
    return 0;
}
Output (triggered spam filter so I had to pastebin it)
It seems that you expect m_uRequestedNumberOfThreads SynchronizeTask tasks to be executed on as many threads and for these to be the only threads to ever access the TLS, but TBB intentionally does not work that way: all tasks might be executed by the main thread itself (especially if initialisation is cheap), and some by threads that don't even participate in executing the parallel_for (probably not in this program, but you're setting up for failure if you rely on a particular execution). Instead, let the TLS initialise itself, e.g., by using the "enumerable_thread_specific(Finit)" constructor, or whatever is the equivalent with the other API you are using.
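To illustrate the "let the TLS initialise itself" idea, here is a minimal sketch in standard C++ rather than TBB or the Win32 TLS calls from the question. It swaps in a `thread_local` pointer that is filled on first access, so no broadcast step is needed and it does not matter which threads end up doing the work. The names `EventVector`, `LocalEvents`, `RunWorkers` and `g_registry` are hypothetical, and plain `std::thread` workers stand in for the scheduler's threads; in TBB itself, `enumerable_thread_specific` is the facility that plays this role.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the eventVector type in the question.
using EventVector = std::vector<int>;

// Registry that owns every per-thread vector (the role eventVectorGrouping
// plays in the question), guarded by a mutex instead of a critical section.
static std::mutex g_registryMutex;
static std::vector<std::unique_ptr<EventVector>> g_registry;

// Returns the calling thread's vector. The first call on a given thread
// creates the vector and registers it; later calls just return it.
EventVector& LocalEvents() {
    thread_local EventVector* tls = nullptr;
    if (tls == nullptr) {
        auto owned = std::make_unique<EventVector>();
        owned->reserve(512);
        tls = owned.get();
        std::lock_guard<std::mutex> lock(g_registryMutex);
        g_registry.push_back(std::move(owned));
    }
    return *tls;
}

// Starts `threads` workers that each record `perThread` events, joins them,
// and returns the total number of events across all registered vectors.
std::size_t RunWorkers(unsigned threads, int perThread) {
    assert(perThread >= 0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([perThread] {
            for (int i = 0; i < perThread; ++i) {
                LocalEvents().push_back(i);
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }

    std::lock_guard<std::mutex> lock(g_registryMutex);
    std::size_t total = 0;
    for (const auto& ev : g_registry) {
        total += ev->size();
    }
    return total;
}
```

Because the vector is created where it is first used, a thread that never runs any work never allocates one, and a thread that joins late still gets a valid vector. The registry keeps the vectors alive after the worker threads exit, so they can be merged afterwards.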
If it's suitable for you, I think using the constructor to initialize the TLS is probably the best idea. However, you could also have a look at the task_scheduler_observer, which will let you know about all the threads used by the scheduler.