akhal

Beginner

07-01-2011
04:33 AM

Nested For Loop: blocked_range 1D or 2D

I am fairly new to Intel TBB and am trying to parallelize a problem that worked well with OpenMP but doesn't show any speedup with TBB, even though the loop iterations are independent. I thought a 2D blocked_range might help; it does show a speedup, but the calculation results are wrong. My code is as follows:

/*-----Serial Version-----*/
for(k=0; k<size; k++)
{
    for(i=k+1; i<size; i++)
    {
        s[i][k] = s[i][k]/s[k][k];
        for(j=k+1; j<size; j++)
            s[i][j] -= s[i][k]*s[k][j];
    }
}

/* OpenMP version (which shows considerable speedup) */
#pragma omp parallel default(shared) private(k)
for(k=0; k<size; k++)
{
    #pragma omp for private(i,j) schedule(static)
    for(i=k+1; i<size; i++)
    {
        a1[i][k] = a1[i][k]/a1[k][k];
        for(j=k+1; j<size; j++)
            a1[i][j] = a1[i][j] - a1[i][k]*a1[k][j];
    }
}

/* TBB version (1D blocked_range) */
task_scheduler_init TBBinit(nthreads);
for(int k=0; k<size; k++)
    parallel_for(blocked_range<int>(k, size, (size-k)/nthreads), my_class(a2));
/* Setting the grainsize to that value reduced the time, but it is still a multiple of the serial execution time :( */

class my_class
{
    double** my_a;
public:
    my_class(a[size][size]):my_a(a){}
    void operator() (const blocked_range<int>& r) const
    {
        double** a2 = my_a;
        int k = r.begin();
        for(int i=k+1; i!=size; i++)
        {
            a2[i][k] = a2[i][k]/a2[k][k];
            for(j=k+1; j!=size; j++)
                a2[i][j] = a2[i][j] - a2[i][k]*a2[k][j];
        }
    }
}; // This 1-D version gives very poor performance
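
A possible restructuring of the 1D version, offered as a sketch under assumptions rather than a confirmed fix: for a fixed k, the rows i = k+1 .. size-1 do not depend on each other, so the range handed to the body can cover the rows, with k stored in the body instead of being read from r.begin(). The example below uses a hypothetical RowRange struct as a stand-in for blocked_range<int> so that it compiles without TBB, and it runs the chunks one after another only to check that chunked elimination matches the serial loop. The names EliminateRows, lu_chunked, lu_serial, make_test_matrix and max_abs_diff are illustrative and not part of TBB.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical stand-in for tbb::blocked_range<int>: a half-open row interval.
struct RowRange {
    int b, e;
    int begin() const { return b; }
    int end() const { return e; }
};

// Body for one elimination step. k is a member; the range only selects rows.
struct EliminateRows {
    Matrix& a;
    int k;
    int size;
    void operator()(const RowRange& r) const {
        for (int i = r.begin(); i != r.end(); ++i) {
            a[i][k] /= a[k][k];                  // multiplier for row i
            for (int j = k + 1; j < size; ++j)
                a[i][j] -= a[i][k] * a[k][j];    // row k is read-only here
        }
    }
};

// Reference: the serial triple loop from the question.
void lu_serial(Matrix& a) {
    const int size = static_cast<int>(a.size());
    for (int k = 0; k < size; ++k)
        for (int i = k + 1; i < size; ++i) {
            a[i][k] /= a[k][k];
            for (int j = k + 1; j < size; ++j)
                a[i][j] -= a[i][k] * a[k][j];
        }
}

// Same computation, with the rows of each step split into chunks.
// With TBB, the inner loop over chunks is what parallel_for would replace.
void lu_chunked(Matrix& a, int chunk) {
    const int size = static_cast<int>(a.size());
    for (int k = 0; k < size; ++k) {
        EliminateRows body{a, k, size};
        for (int lo = k + 1; lo < size; lo += chunk)
            body(RowRange{lo, std::min(lo + chunk, size)});
    }
}

// Diagonally dominant test matrix, so no pivot is zero.
Matrix make_test_matrix(int n) {
    Matrix a(n, std::vector<double>(n));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            a[i][j] = 1.0 / (1 + i + j) + (i == j ? n : 0.0);
    return a;
}

double max_abs_diff(const Matrix& x, const Matrix& y) {
    double d = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < x[i].size(); ++j)
            d = std::max(d, std::fabs(x[i][j] - y[i][j]));
    return d;
}
```

With TBB available, the chunk loop inside lu_chunked would become a single call such as parallel_for(blocked_range<int>(k+1, size), body), leaving the grainsize to the default partitioner instead of forcing (size-k)/nthreads. The outer loop over k stays sequential, because step k+1 needs the results of step k.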

/*----- I tried a 2-D range as follows -----*/
for(int k=0; k<size; k++)
    parallel_for(blocked_range2d<int>(k,size,(size-k)/nthreads,k,size,(size-k)/nthreads), my_class2d(a3));

// Class body
class my_class2d
{
    double** my_a;
public:
    my_class2d(a[size][size]):my_a(a){}
    void operator() (const blocked_range2d<int>& r) const
    {
        double** a3 = my_a;
        int k = r.rows().begin();
        int end = r.rows().end(); // or r.cols().end()
        for(int i=k+1; i!=end; i++)
        {
            a3[i][k] = a3[i][k]/a3[k][k];
            for(j=k+1; j!=end; j++)
                a3[i][j] = a3[i][j] - a3[i][k]*a3[k][j];
        }
    }
};
// But this 2D attempt gives wrong results

Is this structure even parallelizable with TBB? If so, should I use a 1D range or a 2D range? My 1D range example gives correct results but is far slower than even the serial version, while the 2D version is fast but gives wrong results. Any help?
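
On the 2D question, one plausible reading of the code above is that each tile takes k from r.rows().begin() and uses a single end for both loops, so tiles away from the top-left corner eliminate with a different pivot than the outer loop intends. A minimal sketch of a 2D decomposition, assuming k is kept fixed for the whole step: first scale column k below the pivot, then update the trailing submatrix tile by tile. The Tile struct is a hypothetical stand-in for blocked_range2d<int>, and scale_column, update_tile, lu_tiled, lu_reference and make_matrix are illustrative names; the tiles are visited sequentially here only to check the arithmetic.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical stand-in for tbb::blocked_range2d<int>: half-open row and column intervals.
struct Tile {
    int row_begin, row_end;
    int col_begin, col_end;
};

// Phase 1 of step k: compute the multipliers in column k.
void scale_column(Matrix& a, int k) {
    const int size = static_cast<int>(a.size());
    for (int i = k + 1; i < size; ++i)
        a[i][k] /= a[k][k];
}

// Phase 2 of step k: update one tile of the trailing submatrix.
// A tile writes only its own entries and reads column k and row k,
// neither of which belongs to any tile, so tiles do not conflict.
void update_tile(Matrix& a, int k, const Tile& t) {
    for (int i = t.row_begin; i < t.row_end; ++i)
        for (int j = t.col_begin; j < t.col_end; ++j)
            a[i][j] -= a[i][k] * a[k][j];
}

void lu_tiled(Matrix& a, int tile) {
    const int size = static_cast<int>(a.size());
    for (int k = 0; k < size; ++k) {
        scale_column(a, k);
        // With TBB, these two loops are what a parallel_for over a 2D range would replace.
        for (int r0 = k + 1; r0 < size; r0 += tile)
            for (int c0 = k + 1; c0 < size; c0 += tile)
                update_tile(a, k, Tile{r0, std::min(r0 + tile, size),
                                       c0, std::min(c0 + tile, size)});
    }
}

// Reference: the serial triple loop from the question.
void lu_reference(Matrix& a) {
    const int size = static_cast<int>(a.size());
    for (int k = 0; k < size; ++k)
        for (int i = k + 1; i < size; ++i) {
            a[i][k] /= a[k][k];
            for (int j = k + 1; j < size; ++j)
                a[i][j] -= a[i][k] * a[k][j];
        }
}

// Diagonally dominant test matrix, so no pivot is zero.
Matrix make_matrix(int n) {
    Matrix a(n, std::vector<double>(n));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            a[i][j] = 1.0 / (1 + i + j) + (i == j ? n : 0.0);
    return a;
}
```

Splitting the step into two phases matters for the 2D form: if each tile also divided a[i][k] by the pivot, several tiles sharing the same rows would write the same multiplier, which is a data race once the tiles run concurrently.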

1 Reply

Kirill_R_Intel

Employee

07-06-2011
03:33 AM
