Intel Community › Software › Software Development SDKs and Libraries › Intel® oneAPI Threading Building Blocks

azmodai

Beginner


05-14-2012
04:52 AM

95 Views

parallel_reduce missing join ?

I'm currently working on a parallel marching cubes implementation. Everything works nicely except, with certain grainsizes, the computed mesh comes out wrong (I am using a 46-core machine). I've put some std::cout calls in my join method to understand why I get badly built meshes with certain grainsizes.

The size of the data to split is 64. With a grainsize of 32 or 16 I have no problem (using 2 and 4 cores respectively). But when I use a grainsize of 8, strange problems occur: the join from 0 to 8 is missing!

Here is what I get for a grainsize of 8:

Join : 48 / 64 nrml/vertexNbr : 1685 / 1757(3442) trgNumber : 3332 / 3050(6382)

Join : 32 / 48 nrml/vertexNbr : 1003 / 1424(2427) trgNumber : 2052 / 2842(4894)

Join : 16 / 32 nrml/vertexNbr : 1670 / 1163(2833) trgNumber : 3300 / 2260(5560)

Join : 32 / 64 nrml/vertexNbr : 2427 / 3442(5869) trgNumber : 4894 / 6382(11276)

Join : 8 / 32 nrml/vertexNbr : 946 / 2833(3779) trgNumber : 2016 / 5560(7576)

Join : 8 / 64 nrml/vertexNbr : 3779 / 5869(9648) trgNumber : 7576 / 11276(18852)

As you can see, there is no join from 0 to 8!

What I get with a grainsize of 16:

Join : 0 / 32 nrml/vertexNbr : 946 / 2833(3779) trgNumber : 2016 / 5560(7576)

Join : 32 / 64 nrml/vertexNbr : 2427 / 3442(5869) trgNumber : 4894 / 6382(11276)

Join : 0 / 64 nrml/vertexNbr : 3779 / 5869(9648) trgNumber : 7576 / 11276(18852)

Have you ever experienced something similar? If so, what did you do to fix this behavior?

Thanks a lot!


7 Replies

RafSchietekat

Black Belt


05-14-2012
11:50 AM


As the effective granularity gets coarser, the opportunity for this optimisation diminishes, because the likelihood that adjacent chunks are executed by different threads increases.

Correct the Body's operator() to aggregate onto the current state.
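To illustrate Raf's point, here is a minimal sketch of the parallel_reduce Body protocol with a hypothetical SumBody (not the poster's marching-cubes code, and using a stand-in Range instead of tbb::blocked_range so it is self-contained): the splitting constructor starts the split-off body from the identity value, operator() folds each subrange onto whatever the body has already accumulated, and join folds a finished right-hand body back in.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Stand-in for tbb::blocked_range<size_t>, to keep the sketch self-contained.
struct Range {
    std::size_t first, last;
};

// A reduction Body following the parallel_reduce protocol.
struct SumBody {
    const std::vector<int>& data;
    long sum;

    explicit SumBody(const std::vector<int>& d) : data(d), sum(0) {}

    // Splitting constructor: the new body starts from the identity (0),
    // NOT from a copy of the partial sum -- that would double-count.
    SumBody(SumBody& other, int /*split tag*/) : data(other.data), sum(0) {}

    // Must AGGREGATE onto the current state: "sum +=", never "sum =".
    // The scheduler may hand several subranges to the same body instance.
    void operator()(const Range& r) {
        for (std::size_t i = r.first; i < r.last; ++i)
            sum += data[i];
    }

    void join(const SumBody& rhs) { sum += rhs.sum; }
};
```

If operator() overwrote the state instead of accumulating, the result would silently drop every subrange but the last one executed by each body.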

RafSchietekat

Black Belt


05-16-2012
01:06 AM


azmodai

Beginner


05-22-2012
06:29 AM


azmodai

Beginner


05-24-2012
09:15 AM


Here is the code of my join method:

void join(const pMarchingCubes &smc)
{
    std::cout << "####################" << std::endl;
    std::cout << "####### JOIN #######" << std::endl;
    std::cout << "kfirst : " << kfirst << " smc.kfirst : " << smc.kfirst << std::endl;
    std::cout << "klast : " << klast << " smc.klast : " << smc.klast << std::endl;
    std::cout << "offset : " << offset << " smc.offset : " << smc.offset << std::endl;
    std::cout << "triangleNumber : " << triangleNumber << " smc.triangleNumber : " << smc.triangleNumber << std::endl;
    std::cout << "vertexNumber : " << vertexNumber << " smc.vertexNumber : " << smc.vertexNumber << std::endl;
    std::cout << "####################" << std::endl;

    for(int i = 0 ; i < smc.vertexNumber ; i++)
    {
        // interesting lines 1 & 2
        vertices[i+vertexNumber+offset] = vertices[i+smc.offset];
        normals[i+vertexNumber+offset] = normals[i+smc.offset];
    }

    //! get the right indices of triangles crossing the border
    for(int i = trianglesToCheck+offset ; i < triangleNumber+offset ; i++)
    {
        if(triangles
        if(triangles
        if(triangles
    }

    for(int i = 0 ; i < smc.triangleNumber ; i++)
    {
        triangles[i+smc.offset].v[0] += vertexNumber;
        triangles[i+smc.offset].v[1] += vertexNumber;
        triangles[i+smc.offset].v[2] += vertexNumber;
        // interesting line 3
        triangles[i+triangleNumber+offset] = triangles[i+smc.offset];
    }

    triangleNumber += smc.triangleNumber;
    vertexNumber += smc.vertexNumber;
}

"kfirst" equal to r.begin() and "klast" to r.end()

"offset" permits to know where to get the triangles (or the vertices, or the normals).

Indeed I'm using one big array to store all the computed triangles, one big array to store all the computed vertices and one big array to store all the computed normals in order not to allocate any memory in the join method() because it would slow down a lot everything.

The three interesting lines are the ones commented as following : "intersing line X".

I am using an "offset" to be sure I do not overwrite a triangle (or a vertex, or a normal) computed in one thread by other triangles computed by other(s) threads. That's why I take care of moving the vertices the normals and the triangles so that there is no more "holes" between values.

I've got these constraints :

--------------------------------

- Because each triangle contains the indices of its vertices, I need the vertices to be stored in the right order.

- Furthermore, I've got dependencies between triangles computed in one chunk and the neighboring chunk (some triangles "cross the border" between chunks).

Results so far :

-------------------

(I am running this program on 2 cores.)

I've got no problem when running the program with 2 threads, but with more than 2 threads, some data goes missing...

The output I get with 2 threads:

####################
####### JOIN #######
kfirst : 0 smc.kfirst : 32
klast : 32 smc.klast : 64
offset : 0 smc.offset : 73728
triangleNumber : 7576 smc.triangleNumber : 11276
vertexNumber : 3779 smc.vertexNumber : 5869
####################
triangles = 18852
vertices / normals = 9648
average duration = 0.00638916
Bad trgs : 0

With 4 threads, I've got:

kfirst : 16 smc.kfirst : 32
klast : 32 smc.klast : 64

A chunk is missing... the chunk from 0 to 16.

I'm sure this is easy to solve, but I haven't found the cause yet because until now I had only been testing with 2 threads. (Do I need to do something special in the splitting constructor, maybe?)
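One thing worth noting about the "missing" chunk: parallel_reduce is allowed to run adjacent chunks on the same body instance without splitting, in which case no join is reported for them; the body simply accumulates both subranges in place. That only gives correct results if operator() accumulates onto existing state. A toy simulation of such a schedule (hypothetical accumulating body, not the poster's code):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy accumulating body: operator() may be called for several subranges
// on the SAME instance when no split happens -- no join is involved then.
struct Acc {
    const std::vector<int>& data;
    long sum = 0;
    explicit Acc(const std::vector<int>& d) : data(d) {}
    void operator()(std::size_t first, std::size_t last) {
        for (std::size_t i = first; i < last; ++i)
            sum += data[i];  // accumulate, never overwrite
    }
    void join(const Acc& rhs) { sum += rhs.sum; }
};
```

So a schedule like "body runs [0,16) and then [16,32), a split-off body runs [32,64)" prints only one join, yet loses nothing, provided operator() adds onto the state it already holds.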

RafSchietekat

Black Belt


05-24-2012
11:22 PM


azmodai

Beginner


05-25-2012
12:44 AM


Oh, OK, indeed. I am going to correct this... Thanks.

By the way, that explains the splitting constructor.


azmodai

Beginner


05-25-2012
05:09 AM

