Intel® oneAPI Threading Building Blocks

Question about speed of a parallel_for loop

postaquestion
Beginner

Hello, we are trying to use parallel_for for a loop that gets called many, many times. We implemented it and we are now pegging all 4 cores of a quad-core PC (Intel S5000VSA motherboard, of course), but it's 10 times slower!

Using VTune I see under Processes:

Process             Timer%
pid_0x0             83.14%
OurExecutable.exe   15.35%

Threads (Inside OurExecutable.exe)

Thread      Process             Timer%
Thread131   OurExecutable.exe   52.93%
Thread125   OurExecutable.exe   46.52%

Modules (in either of the above threads; below is one of them)

Module       Process             Timer%
tbb.dll      OurExecutable.exe   65.31%
OurDLL.dll   OurExecutable.exe   17.65%

Inside tbb.dll

Name           Timer%
wait_for_all   49.60%
spawn          15.92%
steal_task     15.63%
allocate        4.59%
allocate        3.46%

It seems like most of the time is spent inside tbb.dll. Any thoughts? Maybe we are trying to parallelize a loop that is already very tight, yet is called many, many times. We were hoping to optimize it using parallel_for, but maybe we are not using it correctly.

Any help on this issue would be greatly appreciated.

2 Replies
robert-reed
Valued Contributor II
From the stats you list, it looks to me like your code is mostly spinning (idling), with too little work to do. Can you share some code to give us a better idea of what you're trying to do?
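
To illustrate the "too little work" point, here is a minimal sketch in plain standard C++ (not TBB, and not the poster's code, which was never shared). It assumes a hypothetical helper, chunked_for, with a hypothetical tuning parameter, grain: the smallest number of iterations considered worth handing to another thread. When a call has less work than that, the loop runs inline and pays no scheduling cost at all. In TBB itself, the analogous knob is the optional grainsize argument of tbb::blocked_range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not a TBB API): runs body(i) for every i in [begin, end)
// and returns how many chunks the range was split into.
// `grain` is an assumed tuning knob: the minimum number of iterations that
// justifies the cost of involving another thread.
template <typename Body>
std::size_t chunked_for(std::size_t begin, std::size_t end,
                        std::size_t grain, const Body& body) {
    assert(grain > 0);
    const std::size_t n = end > begin ? end - begin : 0;

    unsigned hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 1;  // the standard allows 0 when the count is unknown

    // Never use more chunks than there are grains of work or hardware threads.
    const std::size_t chunks =
        std::max<std::size_t>(1, std::min<std::size_t>(n / grain, hw));

    if (chunks == 1) {
        // Too little work to amortize thread overhead: run it inline.
        for (std::size_t i = begin; i < end; ++i) body(i);
        return 1;
    }

    std::vector<std::thread> workers;
    workers.reserve(chunks - 1);
    const std::size_t step = n / chunks;
    std::size_t lo = begin;
    for (std::size_t c = 0; c + 1 < chunks; ++c) {
        const std::size_t hi = lo + step;
        workers.emplace_back([lo, hi, &body] {
            for (std::size_t i = lo; i < hi; ++i) body(i);
        });
        lo = hi;
    }
    // The calling thread handles the last chunk, including any remainder.
    for (std::size_t i = lo; i < end; ++i) body(i);
    for (auto& w : workers) w.join();
    return chunks;
}
```

With a grain of 1000, a call such as chunked_for(0, 100, 1000, body) stays on the calling thread, while a call over 100000 elements is split across the available cores. If the real loop is tight and short, the more promising fix is usually to parallelize an outer loop (the code that calls it "many, many times") rather than the inner one.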
Alexey-Kukanov
Employee

This is just a duplicate of a question being discussed in another thread. I suggest closing this thread and moving to the earlier one.
