Intel® oneAPI Threading Building Blocks

Question about the speed of a parallel_for loop


Hello, we are trying to use parallel_for for a loop that gets called many, many times. We implemented it, and we are now pegging all 4 cores of a quad-core PC (an Intel S5000VSA motherboard, of course), but it's 10 times slower!

Using VTune, I see the following under Processes:

Process              Timer %
pid_0x0              83.14%
OurExecutable.exe    15.35%

Threads (inside OurExecutable.exe):

Thread      Process              Timer %
Thread131   OurExecutable.exe    52.93%
Thread125   OurExecutable.exe    46.52%

Modules (inside either of the threads above; one of them is shown below):

Module       Process              Timer %
tbb.dll      OurExecutable.exe    65.31%
OurDLL.dll   OurExecutable.exe    17.65%

Inside tbb.dll:

Name          Timer %
wait_for_all  49.60%
spawn         15.92%
steal_task    15.63%
allocate      4.59%
allocate      3.46%

It seems that most of the time is spent inside tbb.dll. Any thoughts? Maybe we are trying to parallelize a loop that is already very tight, yet it's called many, many times. We were hoping to optimize it with parallel_for, but maybe we are not using it correctly.

Any help on this issue would be greatly appreciated.

From the stats you list, it looks to me like your code is mostly spinning (idling) with too little work to do. Can you share some code to give us a better idea of what you're trying to do?
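The diagnosis above (threads mostly spinning because each parallel call carries too little work) can be illustrated with a minimal sketch. It deliberately uses plain std::thread instead of TBB so that it builds without the TBB library; the function sum_of_squares and the threshold kSerialCutoff are hypothetical stand-ins for the poster's loop, not anything from the thread. The point is that a parallel region only pays off when each call does enough work to amortize the cost of distributing it. In TBB the corresponding levers are the grainsize argument of blocked_range and, often more effective, moving parallel_for up to the outer loop that makes the many calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical threshold: below this many elements we assume the cost of
// starting and joining threads outweighs the work, so we stay serial.
// The right value depends on the per-element cost and must be measured.
constexpr std::size_t kSerialCutoff = 100000;

// Sum of squares of `data`, split into one contiguous chunk per thread,
// but only when the input is large enough to make threading worthwhile.
double sum_of_squares(const std::vector<double>& data, unsigned num_threads) {
    // Serial kernel over the half-open range [begin, end).
    auto chunk_sum = [&](std::size_t begin, std::size_t end) {
        double s = 0.0;
        for (std::size_t i = begin; i < end; ++i) s += data[i] * data[i];
        return s;
    };

    const std::size_t n = data.size();
    if (num_threads <= 1 || n < kSerialCutoff) {
        return chunk_sum(0, n);  // small input: no threading overhead at all
    }

    // Each thread writes only its own slot, so no locking is needed.
    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&, t, begin, end] { partial[t] = chunk_sum(begin, end); });
    }
    for (auto& w : workers) w.join();  // wait for every chunk before combining
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

With a tiny input the function never creates threads; with a large one it pays the threading cost once for a big block of work. The same reasoning applies to the loop in the question: if it is tight and called many times, parallelizing each individual call mostly buys scheduling overhead, which matches the wait_for_all, spawn and steal_task time in the profile.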

This is a duplicate of a question already being discussed in another thread. I suggest closing this thread and continuing in the earlier one.
