Hi,
I'm trying to use tbb::parallel_for for an image denoising application. My problem is with passing the necessary data structures into the parallel loop class.
For efficiency reasons, all the data structures are C-style arrays, e.g.
[cpp]
float* imageData;
Position* imagePos;
float** neighbourhoods;
bool* imageBorder;
int maxBlockSizeSqr, maxWindowSize;
float* gaussKern;
float noisyImageDeviation, filterParam;
Image* outImage;
[/cpp]
while the rest of the data are of basic types (float, int, ...). The sizes of the structures depend on the input image parameters, so they can range from a few bytes to dozens of megabytes.
I tried passing them by reference through the constructor into member variables, as well as using a local object in the anonymous namespace to hold the data until it is read by or copied into the loop object. Both methods performed badly, running much slower than the version without TBB. I briefly checked the available documents (Getting Started, Tutorials, Design Patterns) but could not find useful information about passing large amounts of data.
Would someone please point out where to look for information on this issue? Most likely I have missed something important while reading the documents. Also, what are the "best practices" (at least by name, so I can look them up) concerning this issue?
Thanks in advance,
Amnu
PS: Sorry if this is the wrong forum, I only saw this one forum for TBB.
2 Replies
Compilers tend to do much better with non-address-taken local variables and formal parameters than with structure fields. The reason is that the compiler can analyze such variables much more precisely than it can address-taken variables or fields, though advanced compilers can sometimes optimize address-taken variables if they can track all the places the address might go.
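To make the point above concrete, here is a minimal sketch of the two access patterns. The `FilterParams` struct and both function names are hypothetical, made up for this illustration only. Both functions compute the same result; the difference is in what the compiler is allowed to assume inside the loop.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical parameter block, used only for this illustration.
struct FilterParams {
    float gain;
};

// Reads p->gain through the pointer on every iteration. Because data[i] and
// p->gain are both float, a store to data[i] could in principle modify
// p->gain, so the compiler may have to reload the field each time unless it
// can prove the two never overlap.
void scaleViaField(float* data, std::size_t n, const FilterParams* p) {
    assert(data != nullptr && p != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= p->gain;
    }
}

// Copies the field into a local first. The local's address is never taken,
// so no store through data can change it and it can stay in a register.
void scaleViaLocal(float* data, std::size_t n, const FilterParams* p) {
    assert(data != nullptr && p != nullptr);
    const float gain = p->gain;
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= gain;
    }
}
```

Whether a given compiler actually generates different code for the two versions depends on the compiler and optimization level, so it is worth checking the generated assembly or timing both.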
So what I would do inside the functor is load all the values into local variables before executing the serial for loop over a subrange. Below is a sketch of how to go about this. The constructor for the loop body captures the pointer imageData in a member m_imageData. Then operator() loads the member back into a local pointer.
[cpp]
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>

struct body {
    float* m_imageData;
    body( float* imageData ) : m_imageData(imageData) {}
    void operator()( const tbb::blocked_range<int>& r ) const;
};

void body::operator()( const tbb::blocked_range<int>& r ) const {
    // Load pointer into local temporary
    float* imageData = m_imageData;
    int end = r.end();
    for( int i=r.begin(); i!=end; ++i ) {
        //...
    }
}

void callsite( int n, float* imageData ) {
    tbb::parallel_for( tbb::blocked_range<int>(0,n), body(imageData) );
}
[/cpp]
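The same pattern extends to several arrays and scalars at once. The sketch below is an illustration, not the original poster's code: `ScaleBody`, its members, and `runInChunks` are hypothetical names, and the per-element work is a placeholder multiplication. To keep it buildable with only the standard library, `runInChunks` is a plain sequential stand-in for `tbb::parallel_for` that hands a copy of the body to each subrange, much as TBB copies the body object for its tasks. The body holds only pointers and scalars, so copying it stays cheap no matter how large the arrays are.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical loop body: holds pointers to the large arrays, never the
// arrays themselves, so a copy of the body costs two pointers and a float.
struct ScaleBody {
    const float* m_in;
    float* m_out;
    float m_filterParam;

    ScaleBody(const float* in, float* out, float filterParam)
        : m_in(in), m_out(out), m_filterParam(filterParam) {}

    // Processes the subrange [begin, end).
    void operator()(std::size_t begin, std::size_t end) const {
        // Load every member into a local before the serial loop.
        const float* in = m_in;
        float* out = m_out;
        const float filterParam = m_filterParam;
        for (std::size_t i = begin; i != end; ++i) {
            out[i] = in[i] * filterParam;  // placeholder for the real filter
        }
    }
};

// Sequential stand-in for tbb::parallel_for (an assumption made so the
// sketch needs no TBB): splits [0, n) into at most `chunks` contiguous
// subranges and calls a fresh copy of the body on each one.
void runInChunks(std::size_t n, std::size_t chunks, const ScaleBody& body) {
    assert(chunks > 0);
    const std::size_t step = (n + chunks - 1) / chunks;  // ceil(n / chunks)
    for (std::size_t begin = 0; begin < n; begin += step) {
        const std::size_t end = (begin + step < n) ? begin + step : n;
        ScaleBody copy = body;  // copies pointers and scalars only
        copy(begin, end);
    }
}
```

With real TBB, `operator()` would take a `const tbb::blocked_range<int>&` as in the sketch above, and `runInChunks` would be replaced by the `tbb::parallel_for` call.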
You might also consider using something that does not require encapsulation objects and operator() functions, such as Cilk++ (available in the new Parallel Studio), the old stalwart OpenMP, or QuickThread (my little project).
A rework of Arch's example using QuickThread:
[cpp]
void doWork( int iBegin, int iEnd, float* imageData ) {
    for( int i = iBegin; i < iEnd; ++i ) {
        ...
    }
}

void callsite( int n, float* imageData ) {
    qt::parallel_for( doWork, 0, n, imageData );
}
[/cpp]
Jim Dempsey