Intel® oneAPI Threading Building Blocks

Parallel for giving inconsistent results

kumar__hemant

I am using parallel_for to measure the performance gain relative to a plain for loop, but I get the correct result only when I use simple_partitioner with a grainsize of 1, and that takes twice as long.

When I don't explicitly provide a partitioner and grainsize, I get the expected value of count up to n = 70; beyond that, the value differs from run to run. I also tried removing the inner loop, but that didn't help either. Can anyone tell me what I am missing here?

#include "tbb/tbb.h"
#include <atomic>
#include <cstdio>
#include <ctime>
#include <iostream>
#include <sstream>
#include <string>

using namespace tbb;
using namespace std;

// Shared counter, incremented from all worker threads.
std::atomic<int> count(0);

void foo(const tbb::blocked_range<int>& range) {
    for (int i = 0; i < 10000; ++i) {
        string l_czTempStr;
        std::ostringstream oss;
        oss << "Test data1";
        oss << "Test data2";
        oss << "Test data3";
        l_czTempStr = oss.str();
        ::count++;
        // ::count.fetch_add(1, memory_order_release);
    }
}

int main() {
    cout << "hello" << std::endl;
    int n = 1000;
    clock_t tStart = clock();  // clock() value at start
    tick_count t0 = tick_count::now();
    // Try every grainsize from 1 to n and time each run.
    for (int j = 1; j <= n; j++) {
        ::count = 0;
        tick_count t2 = tick_count::now();
        tbb::parallel_for(tbb::blocked_range<int>(0, n, j),
            [&](const tbb::blocked_range<int>& range) {
                foo(range);
            }, tbb::simple_partitioner());
        tick_count t3 = tick_count::now();
        cout << "grainsize: " << j << " count:" << ::count << " time: " << (t3 - t2).seconds() << endl;
    }
    cout << "gs done" << endl;
    // parallel_for<size_t>( 1, 10, 1, foo );

    tick_count t1 = tick_count::now();
    printf("work took %g seconds\n", (t1 - t0).seconds());
    cout << (double)(clock() - tStart) / CLOCKS_PER_SEC * 1000 << endl;  // total clock() time in ms
    cout << "count - " << ::count << endl;
    cout << "is lock free - " << ::count.is_lock_free() << endl;

    return 0;
}

kumar__hemant

Can anyone please help me with this?

Vladimir_P_Intel2

Hello,

You do not know how many times foo() is called, because dynamic load balancing decides how the range is split. For N calls of foo() you get count = N*10000.

If you need the same result on every run, you have to actually iterate over the blocked_range inside the body, not just declare it as a parameter. Instead of

void foo(const tbb::blocked_range<int>& range) {
    for (int i = 0; i < 10000; ++i)
    {
        ...
    }
}

try

void foo(const tbb::blocked_range<int>& range) {
    for (int i = range.begin(); i < range.end(); ++i)
    {
        ...
    }
}

Vladimir
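
To make the point above concrete, here is a minimal sketch that does not use TBB at all. It imitates, under my own assumptions, how a range gets halved recursively until it is no larger than the grainsize, which is roughly what simple_partitioner does with a blocked_range. The names Range, run_chunks, body_ignoring_range and body_using_range are hypothetical and exist only for this illustration. With the default partitioner the number of chunks also depends on runtime scheduling, which this sequential sketch does not model.

```cpp
#include <cassert>

// Hypothetical stand-in for tbb::blocked_range<int>: a half-open interval
// [begin, end) that may be split while it is larger than grainsize.
struct Range {
    int begin;
    int end;
    int grainsize;
    int size() const { return end - begin; }
    bool is_divisible() const { return grainsize < size(); }
};

// Body that ignores its range, like the original foo():
// a fixed 10000 increments per call, however large the chunk is.
long body_ignoring_range(const Range&) { return 10000; }

// Body that iterates over its range, like the suggested fix:
// one increment per index it was actually given.
long body_using_range(const Range& r) {
    long work = 0;
    for (int i = r.begin; i < r.end; ++i) ++work;
    return work;
}

// Halve the range recursively until it is no longer divisible, call the
// body once per leaf chunk, and return the sum of all increments.
template <typename Body>
long run_chunks(Range r, Body body) {
    assert(r.grainsize > 0);
    if (r.size() <= 0) return 0;  // empty range: body is never called
    if (!r.is_divisible()) return body(r);
    int mid = r.begin + r.size() / 2;
    return run_chunks(Range{r.begin, mid, r.grainsize}, body) +
           run_chunks(Range{mid, r.end, r.grainsize}, body);
}
```

In this sketch, run_chunks(Range{0, 1000, 1}, body_ignoring_range) returns 10000000 (1000 chunks), while a grainsize of 500 gives 20000 (2 chunks) and a grainsize of 1000 gives 10000 (1 chunk). With body_using_range the total is 1000 for every grainsize, because each index is counted exactly once no matter how the range is chunked.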
