Hello, we are trying to use parallel_for for a loop that gets called many, many times. We implemented it and we are now pegging all 4 cores of a quad-core PC (Intel S5000VSA motherboard, of course), but it's 10 times slower!
Using VTune, I see the following under Processes:

Process              Timer%
pid_0x0              83.14%
OurExecutable.exe    15.35%

Threads (inside OurExecutable.exe):

Thread       Process              Timer%
Thread131    OurExecutable.exe    52.93%
Thread125    OurExecutable.exe    46.52%

Modules (in either of the above threads; one of them is shown below):

Module        Process              Timer%
tbb.dll       OurExecutable.exe    65.31%
OurDLL.dll    OurExecutable.exe    17.65%

Inside tbb.dll:

Name            Timer%
wait_for_all    49.60%
spawn           15.92%
steal_task      15.63%
allocate         4.59%
allocate         3.46%
It seems that most of the time is spent inside tbb.dll. Any thoughts? Maybe we are trying to parallelize a loop that is already very tight, yet is called a great many times. We were hoping to optimize it with parallel_for, but maybe we are not using it correctly.
Any help on this issue would be greatly appreciated.
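
The suspicion above (a very tight loop that is called a great many times) can be illustrated with a minimal sketch of a serial cutoff. It uses plain std::thread rather than TBB, and the helper names chunked_for and scale_add, as well as the grain parameter, are hypothetical and not taken from the original code. In TBB itself, the analogous knob is the grainsize argument of blocked_range. Note that spawning threads on every call, as this sketch does, is itself costly; it is only meant to show the cutoff idea, not how TBB schedules work.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not part of TBB): calls body(begin, end) on
// sub-ranges that together cover [first, last).
// Ranges of at most `grain` elements run serially on the calling thread,
// so a small loop pays no threading overhead at all.
template <typename Body>
void chunked_for(std::size_t first, std::size_t last, std::size_t grain, Body body) {
    assert(grain > 0);
    if (first >= last) return;
    const std::size_t n = last - first;
    if (n <= grain) {  // serial cutoff: too little work to be worth splitting
        body(first, last);
        return;
    }
    unsigned hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 1;  // the standard allows 0 when the count is unknown
    // Use at most one chunk per hardware thread, and few enough chunks
    // that the chunk size does not drop below `grain`.
    const std::size_t chunks = std::min<std::size_t>(hw, n / grain);
    const std::size_t step = (n + chunks - 1) / chunks;

    std::vector<std::thread> workers;
    // The calling thread handles the first chunk; the rest go to workers.
    for (std::size_t begin = first + step; begin < last; begin += step) {
        const std::size_t end = std::min(begin + step, last);
        workers.emplace_back([&body, begin, end] { body(begin, end); });
    }
    body(first, std::min(first + step, last));
    for (auto& t : workers) t.join();
}

// Example of a "tight" loop body: y[i] += a * x[i].
void scale_add(std::vector<double>& y, const std::vector<double>& x,
               double a, std::size_t grain) {
    assert(x.size() == y.size());
    chunked_for(0, y.size(), grain, [&](std::size_t b, std::size_t e) {
        for (std::size_t i = b; i < e; ++i) y[i] += a * x[i];
    });
}
```

With a grain of, say, 1000, a call such as scale_add(y, x, 3.0, 1000) on a 10-element vector stays entirely on the calling thread, while a 100000-element vector is split across threads. A suitable grain value depends on the cost of one iteration and has to be measured.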
This is a duplicate of a question being discussed in another thread. I suggest closing this thread and moving to the earlier one.