Dose parallel_for support map iterator?

leemeng · ‎07-04-2011

Hi all,

I am a newbie. I follow the tutorial and create a small program using parallel_for. I think I have done everything correctly, but the code dose not compile at all:( Can anyone kindly point out what is going wrong?

#include

#include "tbb/parallel_for.h"

#include "tbb/blocked_range.h"

#include "tbb/concurrent_hash_map.h"

typedef tbb::concurrent_hash_map MemberTable;

class IntIterator {

public:

void operator() ( const blocked_range<:ITERATOR>& r ) const {

for ( MemberTable::iterator i = r.begin(); i != r.end(); ++i ) {

printf("Key=%d, Value=%d \\n", i->first, i->second);

}

IntIterator() { }

};

int main()

{

MemberTable m;

for(int i=1; i<=10; i++)

m.insert(pair(i, -i));

//if comment out this line, it compiled!

parallel_for(blocked_range<:ITERATOR>(m.begin(), m.end()), IntIterator());

}

leemeng · ‎07-04-2011

I think I figure out the answer on my own. According to the source code in

blocked_range.h(http://threadingbuildingblocks.org/files/documentation/a00266.html) and parallel_for.h (http://threadingbuildingblocks.org/files/documentation/a00266.html)

The answer is no. I think this is a nice feature that TBB missing, Would TBB development team add this in the future?

RafSchietekat · ‎07-04-2011

The current situation has the advantage of preventing unprotected concurrent use of printf()...

The URL you give for parallel_for.h is the same as for blocked_range (you could also hide the URL and make the file name a hyperlink), although I don't see how the implementation of parallel_for matters here.

What could be done is letting the concurrent map return a range of unspecified type (not blocked_range, and with a different or no notion of grainsize), departing from the limiting traditional-STL begin/end interface, but you may need additional rationale beyond how nice it would be to parallelise... a minute fraction of the lifetime of the map (see Amdahl's law). Even frivolous features have to be maintained, and their expectation may needlessly limit the implementation of this and other container types.

Your program must be in pretty good shape if this is your only remaining worry! Or very basic. :-)

Kirill_R_Intel · ‎07-05-2011

Initial problem is because blocked_range needs random-access iterator, and concurrent_hash_map supports only forward iterators. See sections 5.3.6 and 4.2.1 of theReference document. This limitation of concurrent_hash_map iterator is dictated by the container structure, there are no plans to change it.

Also concurrent_hash_map class has it's own range that doesn't have requirement for random access iterator, just in case.

Regards,
Kirill

RafSchietekat · ‎07-05-2011

"Also concurrent_hash_map class has it's own range that doesn't have requirement for random access iterator, just in case."

Exactly what I meant, how did I miss that? But surely the grainsize won't behave exactly like a blocked_range's grainsize?

Kirill_R_Intel · ‎07-06-2011

Well, the concurrent_hash_map can be imagined like a one-dimentional array of buckets. Each bucket is a list of elements. begin() and end() point to elements. In concurrent_hash_map::range grainsize is applied to buckets, not individual element, thus it doesn't correlate with begin() and end(). If grainsize is minimal (1), concurrent_hash_map can be divided up to one bucket. Grainsize indicates granularity of the bucket array.It doesn't matter what values begin() and end() have because they point to elements.

blocked_range is independant from container structure. Grainsize in blocked_range indicates granularity of list of elements. Here both grainsize and begin()/end()pointers apply to the same list of elements.
So if end() - begin() > grainsize, we can divide further.

RafSchietekat · ‎07-06-2011

"Well, the concurrent_hash_map can be imagined like a one-dimentional array of buckets."
It didn't use to be that way (there were segments).

"Grainsize indicates granularity of the bucket array."
That means that for small grainsize values there will tend to be tasks with nothing useful to do, and for large grainsize values a factor 1/4 fewer elements than with a blocked_range on average (unless I've got my math wrong), for the current implementation that is. As long as this is not too much of a surprise...

Has this been demonstrated to improve performance over the lifetime of real programs, I wonder?

Kirill_R_Intel · ‎07-14-2011

Raf,

Sorry, I'm not aware about such specificperfrormance measurements.