Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Parallel for giving inconsistent results

kumar__hemant

I am using parallel_for to measure the performance gain over a plain for loop. However, I get the correct result only when I use simple_partitioner with a grain size of 1, and that takes twice as long.

When I don't explicitly specify a partitioner and grain size, I get the expected value of count up to n = 70; beyond that, the value varies randomly from run to run. I also tried removing the inner loop, but that didn't help. Can anyone tell me what I am missing here?

#include "tbb/tbb.h"
#include <iostream>
#include <string>
//#include <chrono>
#include <sstream>
#include <cstdio>
#include <ctime>
#include <atomic>
#include <utility>

using namespace tbb;
using namespace std;

std::atomic<int> count(0);

void foo(const tbb::blocked_range<int>& range) {
    for (int i = 0; i < 10000; ++i) {
        string l_czTempStr;
        std::ostringstream oss;
        oss << "Test data1";
        oss << "Test data2";
        oss << "Test data3";
        l_czTempStr = oss.str();
        ::count++;
        // ::count.fetch_add(1, memory_order_release);
    }
}

int main() {
    cout << "hello" << std::endl;
    int n = 1000;
    clock_t tStart = clock(); // clock start time
    tick_count t0 = tick_count::now();
    for (int j = 1; j <= n; j++) {
        ::count = 0;
        tick_count t2 = tick_count::now();
        tbb::parallel_for(tbb::blocked_range<int>(0, n, j),
            [&](const tbb::blocked_range<int>& range) {
                foo(range);
            }, tbb::simple_partitioner());
        tick_count t3 = tick_count::now();
        cout << "grainsize: " << j << " count:" << ::count << " time: " << (t3 - t2).seconds() << endl;
    }
    cout << "gs done" << endl;
    // parallel_for<size_t>(1, 10, 1, foo);

    tick_count t1 = tick_count::now();
    printf("work took %g seconds\n", (t1 - t0).seconds());
    cout << (double)(clock() - tStart) / CLOCKS_PER_SEC * 1000 << endl; // total processor time in ms (clock() is not wall time)
    cout << "count - " << ::count << endl;
    cout << "is lock free - " << ::count.is_lock_free() << endl;

    return 0;
}
 

kumar__hemant

Can anyone please help me with this?

Vladimir_P_1234567890

Hello,

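Because of dynamic load balancing, you do not know how many times foo() is called: the partitioner decides how many subranges the range is split into, and foo() runs once per subrange. With the default partitioner this number can change from run to run. Since each call runs the full 10000-iteration loop regardless of the subrange it receives, N calls of foo() give count = N*10000.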
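
To make the arithmetic concrete, here is a small sequential sketch (not TBB itself) that mimics how a simple_partitioner recursively halves a range until a piece is no larger than the grain size, and counts the resulting subranges. The helper names count_subranges and count_with_fixed_loop are made up for illustration, and the split rule (divisible while size > grain size, split at the midpoint) is an assumption modelled on blocked_range.

```cpp
#include <cstddef>

// Hypothetical helper: counts the leaf subranges produced by repeatedly
// halving [begin, end) while a piece is larger than `grain`.
// Sequential stand-in for the splitting; no TBB involved.
// Assumes grain >= 1.
std::size_t count_subranges(int begin, int end, int grain) {
    if (end - begin <= grain) {
        return 1;  // not divisible any further: one call of the body
    }
    int middle = begin + (end - begin) / 2;
    return count_subranges(begin, middle, grain) +
           count_subranges(middle, end, grain);
}

// Value of the counter if every body call runs a fixed 10000-iteration loop,
// as the original foo() does.
long long count_with_fixed_loop(int n, int grain) {
    return static_cast<long long>(count_subranges(0, n, grain)) * 10000;
}
```

Under these assumptions, n = 1000 with a grain size of 1 yields 1000 subranges (count = 10000000), a grain size of 500 yields 2 subranges (count = 20000), and a grain size of 1000 yields a single subrange (count = 10000).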

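In general, if you need the same result regardless of how the range is partitioned, you have to actually iterate over the blocked_range passed to the body, not just declare it as a parameter. Instead of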

void foo(const tbb::blocked_range<int>& range) {
    for (int i = 0; i < 10000; ++i) {
        ...
    }
}

try

void foo(const tbb::blocked_range<int>& range) {
    for (int i = range.begin(); i < range.end(); ++i) {
        ...
    }
}
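
The effect of this change can be checked without TBB using a sequential sketch. Here for_each_subrange is a hypothetical stand-in for parallel_for with a simple_partitioner (assuming midpoint splitting while a piece is larger than the grain size), Range is a made-up substitute for blocked_range<int>, and the two bodies correspond to the two versions of foo() above.

```cpp
#include <functional>

// Hypothetical stand-in for blocked_range<int>: the half-open range [begin, end).
struct Range {
    int begin;
    int end;
};

// Sequential sketch of recursive splitting: halve the range while it is
// larger than `grain`, then call `body` once per leaf subrange.
// Assumes grain >= 1.
void for_each_subrange(Range r, int grain,
                       const std::function<void(Range)>& body) {
    if (r.end - r.begin <= grain) {
        body(r);
        return;
    }
    int middle = r.begin + (r.end - r.begin) / 2;
    for_each_subrange(Range{r.begin, middle}, grain, body);
    for_each_subrange(Range{middle, r.end}, grain, body);
}

// Body that ignores its subrange: the total depends on the number of calls.
long long total_ignoring_range(int n, int grain) {
    long long count = 0;
    for_each_subrange(Range{0, n}, grain, [&](Range) {
        for (int i = 0; i < 10000; ++i) ++count;
    });
    return count;
}

// Body that iterates over its subrange: the total is n for any grain size.
long long total_using_range(int n, int grain) {
    long long count = 0;
    for_each_subrange(Range{0, n}, grain, [&](Range r) {
        for (int i = r.begin; i < r.end; ++i) ++count;
    });
    return count;
}
```

For n = 1000, total_using_range returns 1000 whatever the grain size, whereas total_ignoring_range returns 10000 times the number of subranges. If 10000 units of work are wanted per index, that inner loop can be nested inside the loop over the range.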

Vladimir
