Re: help using blocked_range2d

mblizz · ‎02-03-2009

Can anyone provide a simple example on how to use a blocked_range2d in a parallel_for.

Another is that I have a problem.

I have this 2d vector of type int.

Let's say

9 23 5 3 4 55 6
3 12 456 56
34 7 2
1 4 6

and i want to sort each row. I have implemented using parallel for it but it turned out to be slower than the serial implementation. Maybe the problem is my STL. Can anyone help?

here is class template

class ApplySort {
vector >::iterator bucket;
public:
void operator( )(const blocked_range& r ) const{
for( long row=r.begin(); row long i;
int key;
for(int j=1;j {
key=bucket[row];
i=j-1;
while(i>=0 && bucket[row]>key )
{
bucket[row][i+1]=bucket[row];
i--;
}
bucket[row][i+1]=key;
}
}
}

ApplySort(vector >::iterator &buck) :
bucket(buck)
{}
};

//here is my template function

ApplySort myClass(bucket.begin());
parallel_for(blocked_range(0,buck_num,(buck_num-1)/2), myClass);

where buck_num is the number of rows...

Would I try using blocked_range2d and chage my implementation in the template class??

RafSchietekat · ‎02-04-2009

I would recommend touse a library sort function in each bucket, and to make your grain size in terms of number of buckets or of total number of elements in those buckets. Currently you're putting a fixed number of tasks at work whatever the problem size, which is not the purpose of specifying a grain size. Try different values to find an optimum, and don't be shy exploring them exponentially (your problem will need to be many times bigger before you'll see any benefit).

RafSchietekat · ‎02-04-2009

I have a question myself: is vector thread safe on all current platforms for this kind of access? Did I overlook any promises to that effect in C++0x? Or am I just too paranoid?

Anton_Pegushin · ‎02-04-2009

Quoting - Raf Schietekat

I would recommend touse a library sort function in each bucket, and to make your grain size in terms of number of buckets or of total number of elements in those buckets. Currently you're putting a fixed number of tasks at work whatever the problem size, which is not the purpose of specifying a grain size. Try different values to find an optimum, and don't be shy exploring them exponentially (your problem will need to be many times bigger before you'll see any benefit).

Hi, given the problem I think I'd try parallel_for over a blocked_range with a nested parallel_sort (code bellow). This would let you sort rows of different lengths and stay as much parallel as you can. parallel_sort has a threshold for serial execution, so you won't have to worry if your vectors are too small for parallel sorting. And I'd stick with auto_partitioner for the parallel_for unless you absolutely certain, that you know how to find an optimal grainsize.

What do you think of this solution?

[cpp]class MySort {
    vector< vector > &vec;
public:
    MySort(vector< vector >& vec_) : vec(vec_) {}
    void operator() (const tbb::blocked_range &r) const {
        for(size_t i = r.begin(); i < r.end(); ++i) {
            tbb::parallel_sort(vec.begin(), vec.end());
        }
    }
};

void Sort2D(vector &vec) {
    tbb::parallel_for(tbb::blocked_range(0, vec.size()), MySort(vec), tbb::auto_partitioner() );
}

[/cpp]