TBB's UnFair Task Scheduling - Page 3

Shankar1 · ‎10-28-2009

I have a couple of questions regarding TBB's unfair scheduling.

1. I read through the user manual that describes Task Scheduler and this what it says

" Tasks in Intel Threading Building Blocks are efficient too because the scheduler is unfair. Thread schedulers typically distribute time slices in a round-robin fashion. This distribution is called fair, because each logical thread gets its fair share of time. Thread schedulers are typically fair because it is the safest strategy to undertake without understanding the higher-level organization of a program. In task-based programming, the task scheduler does have some higher-level information, and so can sacrifice fairness for efficiency. Indeed, it often delays starting a task until it can make useful progress."

And as I understand the above TBB scheduler makes sure that the thread executing a task runs until the task completes. Is this correct? If yes how is this done?

2. "Indeed, it often delays starting a task until it can make useful progress."
What does the above line mean?

3. Im planning to use the TBB scheduler in my application wherein I have a thread(boost thread) which waits on data packets to arrive. The boost thread creates a tbb::task and spawns it to the TaskScheduler and waits for taking the next data packet. The frequency at which the data packets arrive is not determinate( they may arrive at high frequency and sometimes at a low frequency). Hence the decision to use boost thread to wait on the data packets and spawn tasks for processing the packet.

And lets say I initialize the scheduler with number of physical threads( lets say 4). Also lets assume the boost thread reads four data packets and spawns tasks for them to the task scheduler( using task::spawn() ) and waits for further data packets to arrive. All the four worker threads in the scheduler steals a task each and start executing the stolen task. Now lets say a data packet arrives when all the worker threads are busy executing their task.

Now the question is when will the boost thread get its turn for collecting the data packet and spawning a task. Will it get its turn only when one of the worker thread finishes executing the task? How will the overall scheduling work now? Will it be fair or unfair?

4. Now in the above case the boost thread will always spawn a task into its own readypool and the worker threads will try to always steal from the ready pool of the boost thread. Is it possible that I target the spawn into the ready pools of other worker threads in the scheduler( may be randomly)? Because it makes sense for me to that because the boost thread will never process tasks in its own ready pool( they are always stolen by worker threads in the scheduler).

--------
Sankar

smasherprog · ‎11-05-2009

Quoting - Shankar

I have a couple of questions regarding TBB's unfair scheduling.

1. I read through the user manual that describes Task Scheduler and this what it says

" Tasks in Intel Threading Building Blocks are efficient too because the scheduler is unfair. Thread schedulers typically distribute time slices in a round-robin fashion. This distribution is called fair, because each logical thread gets its fair share of time. Thread schedulers are typically fair because it is the safest strategy to undertake without understanding the higher-level organization of a program. In task-based programming, the task scheduler does have some higher-level information, and so can sacrifice fairness for efficiency. Indeed, it often delays starting a task until it can make useful progress."

And as I understand the above TBB scheduler makes sure that the thread executing a task runs until the task completes. Is this correct? If yes how is this done?

2. "Indeed, it often delays starting a task until it can make useful progress."
What does the above line mean?

3. Im planning to use the TBB scheduler in my application wherein I have a thread(boost thread) which waits on data packets to arrive. The boost thread creates a tbb::task and spawns it to the TaskScheduler and waits for taking the next data packet. The frequency at which the data packets arrive is not determinate( they may arrive at high frequency and sometimes at a low frequency). Hence the decision to use boost thread to wait on the data packets and spawn tasks for processing the packet.

And lets say I initialize the scheduler with number of physical threads( lets say 4). Also lets assume the boost thread reads four data packets and spawns tasks for them to the task scheduler( using task::spawn() ) and waits for further data packets to arrive. All the four worker threads in the scheduler steals a task each and start executing the stolen task. Now lets say a data packet arrives when all the worker threads are busy executing their task.

Now the question is when will the boost thread get its turn for collecting the data packet and spawning a task. Will it get its turn only when one of the worker thread finishes executing the task? How will the overall scheduling work now? Will it be fair or unfair?

4. Now in the above case the boost thread will always spawn a task into its own readypool and the worker threads will try to always steal from the ready pool of the boost thread. Is it possible that I target the spawn into the ready pools of other worker threads in the scheduler( may be randomly)? Because it makes sense for me to that because the boost thread will never process tasks in its own ready pool( they are always stolen by worker threads in the scheduler).

--------
Sankar

Sankar, I am not a pro like some others here, but I ask, why would you dedicate a whole thread to solely wait on data from the network?

Wouldn't it be faster and more efficient to override the task scheduler for something like this:

task::execute(){
spawn( Read data from the network)
spawn(do something else concurrently using the other threads that are doing nothing)
}
task::execute(){
spawn(process the data just read in parallel)
}
task::execute(){
spawn(send out data to the network)
spawn(do something else concurrently using the other threads that are doing nothing)
}

I am getting tired so i just broke down the general program. Your program needs to be broke into three parts. Unfortunately, when network data processing comes in, this is your best bet. In your program, one entire thread is essentially, dead. It's sole existence is to wait on data. Sure, there might be some sleep time in there, but from what I have read, I don't think that was your plan. And, that would actually slow your whole program down too. So, if you are running a quad core, you are down 25%. The real goal is to find work for your threads. Always keep the thread busy and use implicit locking and you will be golden.

I belive the above outline is a faster way to process network data than to have on thread running concurrently with the rest of the program checking for data and filling some kind of a queue with data received off the network. The queue would be under heavy load from these worker threads checking for data continuously.

Now, you may have to restructure your program to find work for your threads while you are reading and writing to the network, and I know you can do it, because I did. If you minimize the amount of dead time your threads spend doing nothing, you will speed up your program...... isn't that obvious? so.... find work for your threads!

RafSchietekat · ‎11-06-2009

"why would you dedicate a whole thread to solely wait on data from the network?"
Because TBB assumes that hardware threads executing tasks shouldn't be bothered with anything else, so as a general rule any potentially significant waiting should be done in an external thread.

(Added) That one external thread might also handle all communications at once, but if you spawn tasks to wait on individual data channels, each of those tasks blocks a whole software thread (by default one per hardware thread), and you may easily starve the whole process even with many hyperthreaded cores available.

Shankar1 · ‎11-09-2009

Sorry for the delayed reply.

Quoting - smasherprog

In your program, one entire thread is essentially, dead. It's sole existence is to wait on data. Sure, there might be some sleep time in there, but from what I have read, I don't think that was your plan. And, that would actually slow your whole program down too. So, if you are running a quad core, you are down 25%

What I would do is deffer the creation of threads in the scheduler as there are cores and not as default(wherein the scheduler creates one thread lesser than cores). And I would also have an extra IO thread which would wait on data and just spawn tasks. This way Im not down by 25%.