Intel® oneAPI Threading Building Blocks

Initializing TLS for all threads in the scheduler

Sanee_B_

Hello, I'm new to TBB and experimenting with it for the first time. I want to set up an array in TLS (thread-local storage) for each thread in the TBB task scheduler, so that I can later run a parallel_for over an array of objects that write events into the pre-allocated TLS. I copied the TLS setup approach from a demo application Intel released some time ago (the Intel Smoke demo), but I must be missing something, because it works fine in their demo but not for me.

The issue, as far as I can tell, is that when I run the synchronization tasks, TBB executes some of them on the same thread, so the TLS never gets initialized for some of the threads in the scheduler. Later, when I call parallel_for, it uses all of the threads, and some of them try to access TLS that was never initialized. My code and example output from the compiled program are below.

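#include <windows.h>
#include <cstdio>
#include <vector>
#include "tbb/task.h"
#include "tbb/task_scheduler_init.h"
#include "tbb/tbb_thread.h"

// Declarations not shown in the original post; these are assumptions.
typedef void (*fFunc)(void*);                  // callback signature matching InitThreadLocalData
struct Event { /* event fields */ };           // hypothetical event type
typedef std::vector<Event> eventVector;        // per-thread event buffer
std::vector<eventVector*> eventVectorGrouping; // every per-thread buffer, for later merging
DWORD tlsIndex;                                // slot index returned by TlsAlloc
CRITICAL_SECTION crit;                         // guards eventVectorGrouping

class SynchronizeTask : public tbb::task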
{
public:
	SynchronizeTask() {}

	tbb::task *execute()
	{
		m_fCallback(m_pCallbackParam);

		if (InterlockedDecrement(&m_lCallbacksCount) == 0)
		{
			// set all of the SynchronizeTasks free
			SetEvent(m_hAllCallbacksInvokedEvent);
		}
		else
		{
			WaitForSingleObject(m_hAllCallbacksInvokedEvent, INFINITE);
		}

		return NULL;
	}
	
	static void PrepareCallback(
		fFunc pfunc,
		void* pParam,
		unsigned int uCount
		)
	{
		m_fCallback = pfunc;
		m_pCallbackParam = pParam;
		m_lCallbacksCount = uCount;
		ResetEvent(m_hAllCallbacksInvokedEvent);
	}

protected:
	friend class TaskManagerTBB;
	static void* m_hAllCallbacksInvokedEvent;
	static fFunc m_fCallback;
	static void* m_pCallbackParam;
	static volatile long m_lCallbacksCount;
}; // class SynchronizeTask

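// NOTE: as posted, nothing calls CreateEvent for this handle, so it stays NULL.
// ResetEvent/SetEvent then fail, and WaitForSingleObject returns WAIT_FAILED
// immediately instead of blocking, so the tasks are never held apart.
void* SynchronizeTask::m_hAllCallbacksInvokedEvent = NULL;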
fFunc SynchronizeTask::m_fCallback = NULL;
void* SynchronizeTask::m_pCallbackParam = NULL;
volatile long SynchronizeTask::m_lCallbacksCount = 0;



///////////////////////////////////////////////////////////////////////////////
// InitThreadLocalData - Init thread specific data
void
InitThreadLocalData(
void* arg
)
{
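	// GetCurrentThreadId() returns a DWORD; tbb_thread::id is not an integer and should not be printed with %d.
	printf("thread id %lu\n", ::GetCurrentThreadId());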

	// The notify list is kept in tls (thread local storage).
	if (NULL == ::TlsGetValue(tlsIndex))
	{
		eventVector* ev = new eventVector();
		ev->reserve(512);

		::TlsSetValue(tlsIndex, ev);

		printf("prepared TLS thread id %lu\n", ::GetCurrentThreadId());

		EnterCriticalSection(&crit);

		eventVectorGrouping.push_back(ev);

		LeaveCriticalSection(&crit);
	}
	else
	{
		printf("TLS INDEX WAS ALREADY SET, ABORTED\n");
	}
}

int main(int argc, char* argv[])
{

	InitializeCriticalSection(&crit);

	auto m_uPrimaryThreadID = tbb::this_tbb_thread::get_id();

	auto m_uRequestedNumberOfThreads = tbb::task_scheduler_init::default_num_threads();

	//m_uRequestedNumberOfThreads = 8;

	auto m_pTbbScheduler = new tbb::task_scheduler_init(m_uRequestedNumberOfThreads);

	printf("TBB started with %d threads.\n", m_uRequestedNumberOfThreads);

	void* pData = nullptr;

	tlsIndex = TlsAlloc();

	printf("tls Index: %lu\n", tlsIndex);  // TlsAlloc returns a DWORD

	SynchronizeTask::PrepareCallback(InitThreadLocalData, pData, m_uRequestedNumberOfThreads);

	tbb::task* pBroadcastParent = new(tbb::task::allocate_root()) tbb::empty_task;

	pBroadcastParent->set_ref_count(m_uRequestedNumberOfThreads + 1);

	tbb::task_list tList;
	for (int i = 0; i < m_uRequestedNumberOfThreads; i++)  // default_num_threads() returns int
	{
		tbb::task *pNewTask = new(pBroadcastParent->allocate_child()) SynchronizeTask;

		tList.push_back(*pNewTask);
	}

	pBroadcastParent->spawn_and_wait_for_all(tList);
	pBroadcastParent->destroy(*pBroadcastParent);

	DeleteCriticalSection(&crit);

	getchar();

	// (a later parallel_for, omitted here, reads the TLS on threads where it was never set up)

	return 0;
}

 

Output (it triggered the spam filter, so I had to put it on pastebin):

http://pastebin.com/U8p0MjSP
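The symptom above (several SynchronizeTask instances running on one thread) is what any work-stealing or work-queue scheduler is free to do when tasks are cheap and nothing actually forces them apart. A minimal sketch of that effect, using a plain std::thread work queue as a stand-in for TBB's scheduler (it is not TBB, and the name `distinct_threads_used` is hypothetical): each task only records which thread ran it, and workers simply pull whatever is left in the queue.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <set>
#include <thread>
#include <vector>

// Drains `num_tasks` trivial tasks from a shared queue using `num_workers`
// threads and returns how many distinct threads actually ran a task.
std::size_t distinct_threads_used(std::size_t num_workers, std::size_t num_tasks) {
    assert(num_workers > 0);
    std::mutex queue_mutex, seen_mutex;
    std::queue<std::function<void()>> tasks;
    std::set<std::thread::id> seen;

    // Each task only records the executing thread (cheap, like InitThreadLocalData).
    for (std::size_t i = 0; i < num_tasks; ++i) {
        tasks.push([&seen, &seen_mutex] {
            std::lock_guard<std::mutex> lock(seen_mutex);
            seen.insert(std::this_thread::get_id());
        });
    }

    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < num_workers; ++w) {
        workers.emplace_back([&tasks, &queue_mutex] {
            for (;;) {
                std::function<void()> task;
                {
                    std::lock_guard<std::mutex> lock(queue_mutex);
                    if (tasks.empty()) return;  // this worker may never have run a task
                    task = std::move(tasks.front());
                    tasks.pop();
                }
                task();  // run outside the queue lock
            }
        });
    }
    for (auto& t : workers) t.join();
    return seen.size();
}
```

For example, `distinct_threads_used(1, 8)` always returns 1, while `distinct_threads_used(4, 4)` may return anything from 1 to 4 depending on timing, so "N tasks" never guarantees "N initialized threads".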

RafSchietekat

It seems that you expect the m_uRequestedNumberOfThreads SynchronizeTask tasks to be executed on as many distinct threads, and those threads to be the only ones that ever access the TLS. TBB intentionally does not work that way: all of the tasks might be executed by the main thread itself (especially if the initialisation is cheap), and some might be executed by threads that don't even participate in the parallel_for (probably not in this program, but you're setting yourself up for failure if you rely on a particular mapping of tasks to threads). Instead, let the TLS initialise itself, e.g., by using the "enumerable_thread_specific(Finit)" constructor, or whatever the equivalent is in the other API you are using.
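The idea of letting the TLS initialise itself is that each thread creates its storage the first time it asks for it, so no up-front pass over "all threads" is needed. In TBB this is what `tbb::enumerable_thread_specific` does when constructed with an initializer functor and accessed through `local()`. The sketch below shows the same lazy-initialization pattern in standard C++ with `thread_local` instead of TBB; `Event`, `local_events`, `g_registry` and `record_and_merge` are hypothetical names, the registry stands in for eventVectorGrouping, and `reserve(512)` mirrors InitThreadLocalData.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

struct Event { int value; };             // hypothetical event record
using EventVector = std::vector<Event>;  // per-thread buffer, like eventVector

// Owns every per-thread buffer so they can be merged after the threads finish.
std::mutex g_registry_mutex;
std::vector<std::unique_ptr<EventVector>> g_registry;

// Returns the calling thread's buffer, creating and registering it on first use.
// Whichever thread actually runs work sets up its own storage.
EventVector& local_events() {
    thread_local EventVector* buffer = nullptr;
    if (buffer == nullptr) {
        auto owned = std::make_unique<EventVector>();
        owned->reserve(512);  // same reservation as InitThreadLocalData
        buffer = owned.get();
        std::lock_guard<std::mutex> lock(g_registry_mutex);
        g_registry.push_back(std::move(owned));
    }
    return *buffer;
}

// Hypothetical driver: `num_threads` threads each record `per_thread` events,
// then the calling thread merges all buffers and returns the total count.
std::size_t record_and_merge(int num_threads, int per_thread) {
    assert(num_threads > 0);
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                local_events().push_back(Event{i});
            }
        });
    }
    for (auto& th : threads) th.join();

    std::lock_guard<std::mutex> lock(g_registry_mutex);
    std::size_t total = 0;
    for (const auto& buf : g_registry) total += buf->size();
    return total;
}
```

Because the buffers are owned by the registry rather than by the threads, they outlive the worker threads and can still be walked and merged afterwards, which is the same role iteration over an enumerable_thread_specific plays.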

jiri

If it's suitable for you, I think initializing the TLS through the enumerable_thread_specific constructor is probably the best idea. However, you could also look at task_scheduler_observer, which notifies you about every thread the scheduler uses.

https://www.threadingbuildingblocks.org/docs/help/reference/task_scheduler/task_scheduler_observer.htm
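The observer approach works because task_scheduler_observer calls a hook (`on_scheduler_entry`) on each thread as it starts taking part in the scheduler, so per-thread setup runs on exactly the threads that will execute tasks, however the tasks end up distributed. The sketch below imitates that idea with a hypothetical std::thread pool; `run_pool`, `demo`, `Stats` and `t_initialized` are illustrative names, not TBB APIs. Every worker calls `on_entry` on itself before taking any task, and each task checks whether its thread was initialized.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a scheduler with an entry observer: each worker
// calls `on_entry` on its own thread before it executes any task.
void run_pool(std::size_t num_workers, std::size_t num_tasks,
              const std::function<void()>& on_entry,
              const std::function<void()>& task) {
    assert(num_workers > 0);
    std::mutex m;
    std::size_t next = 0;  // number of tasks handed out so far
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < num_workers; ++w) {
        workers.emplace_back([&] {
            on_entry();  // per-thread setup, analogous to on_scheduler_entry
            for (;;) {
                {
                    std::lock_guard<std::mutex> lock(m);
                    if (next == num_tasks) return;
                    ++next;  // claim one task
                }
                task();
            }
        });
    }
    for (auto& t : workers) t.join();
}

// Stands in for the TLS slot: false on every thread until its entry hook runs.
thread_local bool t_initialized = false;

struct Stats {
    std::size_t entries = 0;             // threads that ran the entry hook
    std::size_t uninitialized_runs = 0;  // tasks that found their thread uninitialized
};

Stats demo(std::size_t num_workers, std::size_t num_tasks) {
    Stats s;
    std::mutex stats_mutex;
    run_pool(num_workers, num_tasks,
             [&] {
                 t_initialized = true;
                 std::lock_guard<std::mutex> lock(stats_mutex);
                 ++s.entries;
             },
             [&] {
                 if (!t_initialized) {
                     std::lock_guard<std::mutex> lock(stats_mutex);
                     ++s.uninitialized_runs;
                 }
             });
    return s;
}
```

With this structure, `demo(4, 1000)` reports 4 entries and 0 uninitialized runs regardless of which worker picks up which task, because initialization is tied to a thread joining the pool rather than to a particular task.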
