I had difficulty with my first TBB program. I basically copied the parallel_reduce example from the TBB tutorial:
class SumFoo {
    float* my_a;
public:
    float sum;
    void operator()( const blocked_range<size_t>& r ) {
        float *a = my_a;
        for( size_t i=r.begin(); i!=r.end(); ++i )
            sum += Foo(a[i]);
    }
    SumFoo( SumFoo& x, split ) : my_a(x.my_a), sum(0) {}
    void join( const SumFoo& y ) {sum+=y.sum;}
    SumFoo(float a[] ) :
        my_a(a), sum(0)
    {}
};
My test just summed the contents of an array of doubles. parallel_for was twice as slow as the serial test because the variable sum was being stored to and read from memory on each iteration, whereas in the serial case it was optimized as a register variable. I'm using Visual Studio 2005.
My understanding was that the body object is only accessed by one thread (except in the splitting constructor). Is there a memory barrier being added by TBB?
Wow, that's our Tutorial that promotes this "favorite" array summation example, whose bad performance I analyzed in my blog... Seems we should fix the document.
Your understanding is right: the body object is updated by just one thread. But for parallel_reduce, it is passed by reference to many task objects, and of course there are memory barriers to process these tasks correctly. I believe referencing the same body in several tasks is enough to prevent compilers from using a register for the sum; or perhaps just making it a class member is enough.
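One common way to sidestep the issue, whatever its exact cause, is to accumulate into a local variable and touch the member only once per call. The following is a minimal sketch of that idea, not the tutorial's or the blog's actual code. To keep it compilable without TBB, `Range` is a hypothetical stand-in for `blocked_range<size_t>`, the `int` tag replaces TBB's `split`, `Foo` is a placeholder, and `SumFooLocal` and `sum_in_two_halves` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for tbb::blocked_range<size_t>, so the sketch
// compiles without TBB installed.
struct Range {
    std::size_t first, last;
    std::size_t begin() const { return first; }
    std::size_t end() const { return last; }
};

// Hypothetical placeholder for the tutorial's Foo().
inline float Foo(float x) { return x; }

class SumFooLocal {
    const float* my_a;
public:
    float sum;
    explicit SumFooLocal(const float* a) : my_a(a), sum(0) {}
    // Splitting constructor; the real TBB version takes a tbb::split tag.
    SumFooLocal(SumFooLocal& x, int /*split tag*/) : my_a(x.my_a), sum(0) {}

    void operator()(const Range& r) {
        assert(r.begin() <= r.end());
        const float* a = my_a;
        float local = sum;                // read the member once
        const std::size_t end = r.end();  // hoist the loop bound
        for (std::size_t i = r.begin(); i != end; ++i)
            local += Foo(a[i]);           // accumulate in a local variable
        sum = local;                      // write the member once
    }
    void join(const SumFooLocal& y) { sum += y.sum; }
};

// Mimics what a reduction does for two subranges: split, run each half, join.
inline float sum_in_two_halves(const std::vector<float>& v) {
    SumFooLocal left(v.data());
    SumFooLocal right(left, 0);
    const std::size_t mid = v.size() / 2;
    left(Range{0, mid});
    right(Range{mid, v.size()});
    left.join(right);
    return left.sum;
}
```

Because `local` never has its address taken, the compiler is free to keep it in a register regardless of what it can prove about the body object. Starting from `sum` rather than from zero matters, since the same body may be applied to several subranges before it is joined.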
I see how the function can be improved, but I don't understand why the Visual C++ compiler produces the slower code.
It can't unroll the loop because the condition is i != end instead of i < end, but why can't it use a register for the sum?
I've posted a sample program to Microsoft to see if anyone can answer: http://forums.microsoft.com/msdn/ShowPost.aspx?siteid=1&postid=3417908 I've been able to take TBB out of the picture; it appears that just allocating the object on the heap is enough to cause this behavior.
How does i!=end vs. i
It has less to do with treating all variables as atomics than with not knowing whether aliasing occurs during pointer dereferences, and so not being assured of private access, which is a requirement for promoting a variable to a register. But I admit to being blithe in my previous response, looking only at Alexey's example rather than at the code posted at MSDN. I've since rectified that and see two calls to the same summing class, one invoked through a pointer and the other through a reference (ptc->add(...) vs. tc.add(...)). The compiler evidently can assume the latter is alias-free, because it is able to inline the add call and treat tc.sum as a variable that can be promoted to a register, whereas ptc->sum is held at arm's length and not promoted to a register by this compiler. It depends on how much the compiler knows about the pointer. I wonder whether adding a -noalias (/Qnoalias maybe?) switch to the compile would be enough to allow the promotion in the former case as well as the latter.
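The two call forms can be compared with a small timing harness. This is only a sketch under assumptions: the MSDN sample is not reproduced here, so `test_class` is a guess at its shape, and `Timing`, `timed`, `run_on_stack`, and `run_on_heap` are illustrative names. Whether `sum` ends up in a register in either case depends on the compiler, its version, and its flags, so the timings are meant to be read alongside the generated assembly.

```cpp
#include <cassert>
#include <chrono>
#include <memory>
#include <vector>

// Hypothetical reconstruction of the kind of summing class discussed above.
class test_class {
public:
    int sum = 0;
    void add(const std::vector<int>& v) {
        for (std::vector<int>::const_iterator i = v.begin(); i != v.end(); ++i)
            sum += *i;  // accumulates directly into the member
    }
};

struct Timing { int sum; double seconds; };

// Runs f once and reports its result together with the elapsed wall time.
template <class F>
Timing timed(F f) {
    auto t0 = std::chrono::steady_clock::now();
    int s = f();
    auto t1 = std::chrono::steady_clock::now();
    return Timing{s, std::chrono::duration<double>(t1 - t0).count()};
}

// tc.add(...): the object lives on the stack of the calling code.
inline Timing run_on_stack(const std::vector<int>& v, int reps) {
    assert(reps > 0);
    return timed([&]() -> int {
        test_class tc;
        for (int k = 0; k < reps; ++k) tc.add(v);
        return tc.sum;
    });
}

// ptc->add(...): the object is heap-allocated and reached through a pointer.
inline Timing run_on_heap(const std::vector<int>& v, int reps) {
    assert(reps > 0);
    return timed([&]() -> int {
        std::unique_ptr<test_class> ptc(new test_class);
        for (int k = 0; k < reps; ++k) ptc->add(v);
        return ptc->sum;
    });
}
```

Calling both functions on the same large vector in an optimized build and printing the `seconds` fields gives a rough comparison. Keep `reps * v.size()` small enough that the `int` sum cannot overflow.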
I'm actually still wondering myself how we know we can use our existing compilers for multithreading without the benefit of an existing specification as thorough as what I first saw for Java.
(Removed)
Wouldn't the side effects be limited to the loop itself? Without multithreading issues, the compiler just has to focus on optimizing the loop which contains no side effects, unless dereferencing the iterator i introduces one.
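class test_class
{
public:
    int sum;
    // Element type and parameter name are assumed; they were lost in the original post.
    void add(vector<int>& v)
    {
        for (vector<int>::iterator i = v.begin(); i != v.end(); ++i)
            sum += *i;
    }
};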
Unless there's an issue with side effects that I'm not understanding, this must be a multithreading consideration of the compiler: it assumes that another thread could be accessing the same test_class instance.
I suppose you could look at innermost cache hits/misses to detect activity not going to a register, but your life would be complicated by a variety of issues, including the range and trip count of each loop (code optimization throws the loop bounds into question). (You might also go to whatif.intel.com and check out PTU, which will run on your VTune analyzer license and can separate the cache access events by basic block.) Then you'd want to count the number of cache accesses per trip count and compare that to the number of accesses you'd expect with register storage of the intermediates (if you're doing a simple sum, one cache access per array element would be reasonable). With big trip counts it'll be easier to discern than with smaller ones, and you'll also have to be aware of where the compiler unrolls a loop for performance, which will change the expected ratios. It sounds like a lot of work, though.
