Help for int_sin.c with TBB

Akio_Yasu__Intel_ · ‎02-17-2009

Hello,

I am trying to convert the sample code "int_sin.c" to use TBB, but it is not going well.
The program prints the wrong answer: the output should be 4, but it isn't.
I have attached the source code for this issue.
I would really appreciate it if you could check it and correct it.

Thanks,
Akio

Alexey-Kukanov · ‎02-18-2009

You did not initialize IntSin::step in the splitting constructor. I guess this is the cause of the problem.

Akio_Yasu__Intel_ · ‎02-18-2009

Quoting - Alexey Kukanov (Intel)

You did not initialize IntSin::step in the splitting constructor. I guess this is the cause of the problem.

Hello Alexey,

Thanks for your advice.
But I am very much a beginner in C++ and am learning TBB from the Intel TBB book from O'Reilly.
It would take me a long time to work out how to initialize "IntSin::step".
I would really appreciate it if you could simply give me the answer by correcting the source code I attached.
I am sorry to bother you with this.

Best,
Akio

robert-reed · ‎02-18-2009

Quoting - Akio Yasu (Intel)

Thanks for your advice.
But I am very much a beginner in C++ and am learning TBB from the Intel TBB book from O'Reilly.
It would take me a long time to work out how to initialize "IntSin::step".

Take a look at the TBB Tutorial, the advanced example of parallel reduce. Note the splitting constructor:

[cpp]MinIndexFoo( MinIndexFoo& x, split ) :
   my_a(x.my_a),
   value_of_min(FLT_MAX), // FLT_MAX is defined in <cfloat>
   index_of_min(-1)
{}[/cpp]

This is another case where the splitting constructor needs to initialize more than one thing. See how my_a is being mapped from the functor object labeled x to the new object. You could do the same thing with step.
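
To make the advice concrete, here is a minimal sketch of a reduction body whose splitting constructor copies step. It is not the attached source, which is not shown in this thread. It assumes the integrand is |sin(x)| over [0, 2*pi] (which would give the expected answer of 4), uses a stand-in `split` tag instead of the TBB one so that it compiles without TBB, and drives a single split and join by hand through a hypothetical helper named `integrate_with_one_split`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Stand-in for the TBB split tag so this sketch builds without TBB (assumption).
struct split {};

// Assumed integrand: |sin(x)|, whose integral over [0, 2*pi] is 4.
static double integrand(double x) { return std::fabs(std::sin(x)); }

class IntSin {
    const double step;   // read-only input: width of one rectangle
public:
    double sum;          // accumulator owned by this body

    explicit IntSin(double step_) : step(step_), sum(0.0) {}

    // Splitting constructor: copy the read-only input (step) from x and
    // reset the accumulator (sum) to the identity of the reduction.
    IntSin(IntSin& x, split) : step(x.step), sum(0.0) {}

    // Accumulate rectangles with indices in [begin, end).
    // With TBB this would take a blocked_range instead of two indices.
    void operator()(std::size_t begin, std::size_t end) {
        double local_sum = 0.0;
        for (std::size_t i = begin; i != end; ++i)
            local_sum += integrand(i * step) * step;
        sum += local_sum;
    }

    // Fold the partial result of another body into this one.
    void join(const IntSin& y) { sum += y.sum; }
};

// Hypothetical driver: performs one split and one join by hand,
// roughly what a parallel reduction does many times internally.
double integrate_with_one_split(std::size_t n) {
    assert(n >= 2);
    const double two_pi = 2.0 * std::acos(-1.0);
    IntSin left(two_pi / static_cast<double>(n));
    IntSin right(left, split{});  // right.step is valid only because it was copied
    left(0, n / 2);
    right(n / 2, n);
    left.join(right);
    return left.sum;
}
```

Under the assumed integrand, calling `integrate_with_one_split(1000000)` returns a value very close to 4. If the splitting constructor left step out of its initializer list, the second body would work with an indeterminate step and the joined sum would be wrong.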

Akio_Yasu__Intel_ · ‎02-18-2009

Hi Robert and Alexey

I think I have corrected the problem by fixing the initialization that both of you pointed out.
Thank you so much; I have made one step of progress.

Now I have another problem, this time with performance.
I cannot get performance as good as the OpenMP version of the same code.

My command line operation is like below:
> icl int_sin_tbb.c tbb.lib /MD
> int_sin_tbb.exe
Application Clocks = 2.875000e+003

> icl int_sin_omp.c /Qopenmp
> int_sin_omp.exe
Application Clocks = 9.210000e+002

The OpenMP version is around three times faster than the TBB version.
I am using Intel C++ Compiler 11.0.072, and the compiler reports vectorization and parallelization messages for the OpenMP build.

I have attached both samples. Please take a look at them and let me know any possible reason for this performance difference.

Thank you,
Akio

Alexey-Kukanov · ‎02-19-2009

Take a look at a blog entry I wrote earlier about a similar problem:
http://software.intel.com/en-us/blogs/2008/03/04/why-a-simple-test-can-get-parallel-slowdown/

I guess you might find some answers there.

Akio_Yasu__Intel_ · ‎02-19-2009

Quoting - Alexey Kukanov (Intel)

Take a look at a blog entry I wrote earlier about a similar problem:
http://software.intel.com/en-us/blogs/2008/03/04/why-a-simple-test-can-get-parallel-slowdown/

I guess you might find some answers there.

Hello Alexey,

I have read your article and changed the code to use a local variable in operator(), as shown below, but it did not help.

class IntSin {
    const double step;
public:
    double sum;
    void operator()( const blocked_range<size_t>& r ) {
        double x_i;
        double local_sum = 0;
        double step = IntSin::step;
        for( size_t i=r.begin(); i!=r.end(); ++i ) {
            x_i = i * step;
            local_sum += INTEG_FUNC(x_i) * step;
        }
        sum += local_sum;
    }
    // IntSin (IntSin& x, split) : x_i(0), step(x.step), sum(0) {}
    IntSin (IntSin& x, split) : step(x.step), sum(0) {}
    void join( const IntSin& y ) { sum += y.sum; }
    IntSin (double _step) : step(_step), sum(0) {}
};

Do you have any idea or insight into why the code does not run faster?
I would appreciate your help.

Regards,
Akio

Alexey-Kukanov · ‎02-24-2009

I have experimented with your code and found that the biggest performance impact comes from the /MD option required by TBB. For some unknown reason (which I would call a bug in the Intel Compiler's math library), just switching from /MT to /MD made your test three times slower, no matter whether TBB was used, or OpenMP, or no threading at all.
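
One way to check an observation like this is to time the integration loop with no threading at all, building the same file once with /MT and once with /MD and comparing the two runs. The sketch below is only an illustration of that idea, not the code from the attachments: it assumes the integrand is |sin(x)| over [0, 2*pi], and the names `TimedResult` and `time_serial_integration` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>

// Result of one timed run: the computed integral and the elapsed wall time.
struct TimedResult {
    double sum;
    double seconds;
};

// Serial rectangle-rule integration of |sin(x)| over [0, 2*pi] (assumed
// integrand), timed with a monotonic clock. No TBB or OpenMP is involved,
// so any difference between builds comes from the compiler and runtime
// library settings rather than from the threading layer.
TimedResult time_serial_integration(std::size_t n) {
    assert(n > 0);
    const double two_pi = 2.0 * std::acos(-1.0);
    const double step = two_pi / static_cast<double>(n);

    const auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (std::size_t i = 0; i != n; ++i)
        sum += std::fabs(std::sin(i * step)) * step;
    const auto stop = std::chrono::steady_clock::now();

    const double seconds = std::chrono::duration<double>(stop - start).count();
    return TimedResult{sum, seconds};
}
```

If the elapsed time of this serial loop already differs by a similar factor between the two builds, the slowdown is unlikely to come from TBB itself.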