
Using TBB decreases speed (performance). Why?

lovebestintel
To try multicore programming with TBB, I tested the following code in this environment (TBB 2.1, Intel C++ Compiler 11.x, Visual Studio 2008, Intel Core 2 Quad 8200, Windows XP):

[CODE C]
#include <iostream>
#include <cstdlib>   // srand, rand
#include <ctime>     // time
#include "tbb/parallel_for.h"
#include "tbb/blocked_range2d.h"
#include "tbb/task_scheduler_init.h"
#include "tbb/tick_count.h"
#include "tbb/partitioner.h"

using namespace tbb;

const size_t L = 200;
const size_t M = 200;
const size_t N = 200;

// c (M x N) = a (M x L) * b (L x N), computed on a single thread.
void SerialMatrtixMultiply(float c[M][N], float a[M][L], float b[L][N]){
    for(size_t i = 0; i < M; ++i){
        for(size_t j = 0; j < N; ++j){
            float sum = 0;
            for(size_t k = 0; k < L; ++k)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    }
}

// Body object for parallel_for: multiplies one 2D block of the result matrix.
class MatrixMultiply2D{
    float (*my_a)[L];
    float (*my_b)[N];
    float (*my_c)[N];

public:
    void operator()(const blocked_range2d<size_t>& r) const {
        float (*a)[L] = my_a;
        float (*b)[N] = my_b;
        float (*c)[N] = my_c;

        for(size_t i = r.rows().begin(); i != r.rows().end(); ++i){
            for(size_t j = r.cols().begin(); j != r.cols().end(); ++j){
                float sum = 0;
                for(size_t k = 0; k < L; ++k)
                    sum += a[i][k] * b[k][j];
                c[i][j] = sum;
            }
        }
    }
    MatrixMultiply2D(float c[M][N], float a[M][L], float b[L][N]):my_a(a), my_b(b), my_c(c)
    {}
};

void ParallelMatrixMultiply(float c[M][N], float a[M][L], float b[L][N]){
    parallel_for(blocked_range2d<size_t>(0, M, 0, N), MatrixMultiply2D(c,a,b), auto_partitioner());
}

int main(void){
    task_scheduler_init init;

    float a[M][L];
    float b[L][N];
    float c[M][N];

    // Fill both inputs with small random values.
    srand(time(NULL));
    for(int i = 0;i < M;i++){
        for(int j = 0;j < L;j++)
            a[i][j] = rand() % 30;
    }

    for(int i = 0;i < L;i++){
        for(int j = 0;j < N;j++)
            b[i][j] = rand() % 30;
    }

    // Time the serial version.
    tick_count t0 = tick_count::now();
    SerialMatrtixMultiply(c,a,b);
    tick_count t1 = tick_count::now();
    std::cout << "seq elapsed time : " << (t1 - t0).seconds() << std::endl;

    // Time the TBB version.
    t0 = tick_count::now();
    ParallelMatrixMultiply(c,a,b);
    t1 = tick_count::now();
    std::cout << "parallel elapsed time : " << (t1 - t0).seconds() << std::endl;

    return 0;
}
[/CODE]

The elapsed times:

seq elapsed time : 0.04437542 (s)
parallel elapsed time : 0.0111000 (s)

The parallel case is faster than the sequential case.

But when I removed "task_scheduler_init init;" and "ParallelMatrixMultiply(c,a,b);", so that only SerialMatrtixMultiply(c,a,b) runs, the elapsed time is 0.0000636 (s).

This is very much faster than the TBB case.

Why does this strange result appear?

Any answer will be appreciated!

1 Reply
timintel
I guess the compiler has been able to eliminate the loops which produce unused results.
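
One way to check this guess is to make the timed result observable, so the optimizer has to keep the multiply loops. Below is a minimal sketch of that idea, not the poster's code: it assumes a C++11 compiler and uses std::chrono in place of tbb::tick_count so it builds without TBB. The names DIM, serial_multiply, checksum, Timing and time_serial_multiply are hypothetical and chosen only for illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical size, mirroring the 200 x 200 case from the question.
const std::size_t DIM = 200;

// Serial multiply of two row-major DIM x DIM matrices: c = a * b.
void serial_multiply(std::vector<float>& c,
                     const std::vector<float>& a,
                     const std::vector<float>& b) {
    for (std::size_t i = 0; i < DIM; ++i) {
        for (std::size_t j = 0; j < DIM; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < DIM; ++k)
                sum += a[i * DIM + k] * b[k * DIM + j];
            c[i * DIM + j] = sum;
        }
    }
}

// Adds up every element of c. Returning (and later printing) this value
// makes the multiply's output something the program depends on.
double checksum(const std::vector<float>& c) {
    double total = 0.0;
    for (std::size_t i = 0; i < c.size(); ++i)
        total += c[i];
    return total;
}

// Elapsed time of one multiply plus the checksum of its result.
struct Timing {
    double seconds;
    double sum_of_c;
};

Timing time_serial_multiply(const std::vector<float>& a,
                            const std::vector<float>& b) {
    std::vector<float> c(DIM * DIM);

    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    serial_multiply(c, a, b);
    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();

    Timing result;
    result.seconds = std::chrono::duration<double>(t1 - t0).count();
    // Computed outside the timed region, but it still consumes every element of c.
    result.sum_of_c = checksum(c);
    return result;
}
```

Printing both seconds and sum_of_c from main gives the serial timing a result the program actually uses. In the original program, printing an element of c after each timed call should have a similar effect; if the serial-only time then rises back toward the first measurement, that would support the dead-code explanation.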