I use tbb::pipeline to run statistics processing in parallel.
The code looks like:
pipeline.add_filter(st1); // pick task day to process
pipeline.add_filter(st2); // actual processing, parallel
pipeline.add_filter(st3); // reduce, seq processing
st1 - very fast stage
st2 - really slow, about 1 second per run on my hardware
st3 - 0.2...0.3 seconds per run
I added debug points to every filter to collect timestamps while tasks execute.
In the test run there were 3 days to process,
with a maximum of 4 tokens in flight in the pipeline,
on a Pentium 4 D with HT (2 threads),
Linux 2.6.18 kernel, Debian etch.
calcs is my 2nd stage,
and reduce the 3rd.
A value of 0.2 means the stage was performed by 2 threads, 0.1 means only one thread.
So I wonder why TBB does not run reduce for the 2nd day immediately after reduce for the 1st day is done.
It waits until all filters are done, and only then picks a task to execute.
That seems strange to me.
Is there any way to run a stage as soon as the previous stage is done, to maximize CPU usage?
I would guess that at the second stage, day 3 was processed before day 2. But the third stage, which is ordered, can't start "reducing" day 3 before day 2. I think you could easily check this guess by extending the information collected at the debug points with some data-specific info (e.g. the number of the day being processed).
As far as I understand, hyperthreads aren't quite "even"; one of the threads only gets a chance to execute when some processor units aren't used by the other one. So I wouldn't be surprised if in your case the main thread started to process day 1, the TBB worker thread took day 2 but made slow progress (due to HT), then the main thread completed day 1, took day 3, and, having a kind of priority on the processor resources, completed day 3 before the worker thread finished day 2.
If that's the case, I wonder whether adding a pause/yield point right before taking a new token from the pipeline would help the second thread gain priority on processor resources and complete its job earlier.
I believe that adding yield or pause operations to the main thread won't help. The OS does not distinguish between logical CPUs in HT systems (at least that was the case some time ago). Therefore, when the main thread relinquishes its time quantum, the system will see that another thread is already working, and so it will resume the main thread. During all this time the processing will be happening in the same (main) pipeline of the CPU, and so the second thread will remain in the secondary (low-priority) CPU pipeline.
I think the problem could be solved by increasing the maximal number of tokens in flight. E.g. if the hyper-thread works at 15% of the main one's speed, then 7 or 8 tokens will assure an acceptable balance. You could play with the number of tokens in the range 6-15 and find the value resulting in the maximal throughput.