Intel® oneAPI Threading Building Blocks

Schedule affinity on TBB threads: slower than sequential

linvincent
Beginner
Dear all,
I am new to TBB. I recently implemented a TBB pipeline for parallel image processing and tried to limit the TBB threads to specific cores with the Linux system call sched_setaffinity. With 4 cores, the encoding rate reaches 16 fps, which is better than sequential processing. However, when I limit the number of cores to 3, the encoding rate drops dramatically to ~6 fps, which is worse than the sequential version at 12 fps. The test machine is an Intel Core2 Quad CPU Q6600 at 2.4 GHz with 2 GB RAM. It seems unreasonable to me that performance with 3 cores drops to less than half of that with 4 cores. I wonder whether the scheduling affinity may interfere with TBB task scheduling. Could anyone please help answer this question? This problem has bothered me for a long time. Thanks for any help.
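For readers following along, here is a minimal sketch of the kind of restriction described above, not the code actually used in this post. It assumes Linux with glibc; the helper name `limit_to_cpus` is hypothetical. It narrows the calling thread's affinity mask to at most `max_cpus` of the CPUs it is already allowed to use. Threads created afterwards inherit the creating thread's mask, so calling it early in `main` would typically apply to worker threads started later.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // needed for the CPU_* macros; g++ usually defines it already
#endif
#include <sched.h>    // sched_getaffinity, sched_setaffinity, cpu_set_t
#include <cassert>

// Hypothetical helper (not part of TBB): restrict the calling thread to at
// most max_cpus of the CPUs it may currently run on.
// Returns the number of CPUs in the new mask, or -1 on error.
int limit_to_cpus(int max_cpus) {
    assert(max_cpus > 0);

    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    // pid 0 means "the calling thread".
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;

    // Keep the first max_cpus CPUs that are set in the current mask.
    cpu_set_t wanted;
    CPU_ZERO(&wanted);
    int picked = 0;
    for (int cpu = 0; cpu < CPU_SETSIZE && picked < max_cpus; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_SET(cpu, &wanted);
            ++picked;
        }
    }
    if (picked == 0) return -1;

    if (sched_setaffinity(0, sizeof(wanted), &wanted) != 0) return -1;
    return picked;
}
```

Calling `limit_to_cpus(3)` before the pipeline starts would correspond to the 3-core experiment. Note that this only limits where threads may run; it does not change how many threads are created.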
3 Replies
Anton_Pegushin
New Contributor II
Hello,

Since your test platform is not a NUMA architecture, I would expect thread placement generally not to matter much. When you used the affinity API, did you have some specific logic for pinning the threads the way you did, or were you just making sure that two of your worker threads would not end up on the same hardware core? I would suggest profiling the 3-thread and 4-thread configurations with, for instance, Intel VTune Performance Analyzer, comparing the hot spots, and working out the reason for the difference from there.
Dmitry_Vyukov
Valued Contributor I
Quoting - linvincent
tried to limit the TBB threads to some specific cores by the Linux system call, sched_setaffinity.

Did you also reduce the number of worker threads at the same time (the parameter to the tbb::task_scheduler_init object)?
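One way to keep the two in step, as a rough sketch assuming Linux with glibc: read the affinity mask back and cap the requested worker count at the number of CPUs it contains. The helper names `allowed_cpu_count` and `pick_worker_count` are made up for illustration; the TBB call appears only as a comment so the snippet compiles without TBB.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // needed for the CPU_* macros; g++ usually defines it already
#endif
#include <sched.h>    // sched_getaffinity, cpu_set_t, CPU_COUNT

// Hypothetical helper: number of CPUs the calling thread may run on,
// or -1 if the mask cannot be read.
int allowed_cpu_count() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return -1;
    return CPU_COUNT(&mask);
}

// Hypothetical helper: cap the requested worker count at the number of
// allowed CPUs, so that threads do not outnumber the cores they share.
int pick_worker_count(int requested) {
    int allowed = allowed_cpu_count();
    if (allowed <= 0) return requested;  // mask unreadable: keep the request
    return requested < allowed ? requested : allowed;
}

// Possible use with the classic TBB API mentioned above (assumption: the
// affinity mask has already been set before this point):
//
//   tbb::task_scheduler_init init(pick_worker_count(4));
```

If the mask allows 3 cores, `pick_worker_count(4)` yields 3, which would avoid running four busy threads on three cores. Whether that oversubscription is the actual cause of the slowdown here is not established; profiling would have to confirm it.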

linvincent
Beginner
Dear Anton and Dmitry,
First of all, thank you sincerely for your great help.
I tried different numbers of TBB threads, including 8, 16, 32, and the default, but did not try 3 or 4 threads.
Yes, I think it is a good idea to reduce the number of worker threads and check whether there is anything I missed.
I will do more experiments and report back to you. Thank you so much.
