
I'm currently working on a parallel marching cubes implementation. Everything works nicely except sometimes when I use a certain grainsize to compute the mesh (I am using a 46-core machine). I've put some std::cout statements in my join method to understand why I get badly built meshes with certain grainsizes.

The size of the data to split is 64. With a grainsize of 32 or 16 I have no problem (using respectively 2 and 4 cores). When I use a grainsize of 8, strange problems occur: the join from 0 to 8 is missed!

Here is what I get for a grainsize of 8:

Join : 48 / 64 nrml/vertexNbr : 1685 / 1757(3442) trgNumber : 3332 / 3050(6382)

Join : 32 / 48 nrml/vertexNbr : 1003 / 1424(2427) trgNumber : 2052 / 2842(4894)

Join : 16 / 32 nrml/vertexNbr : 1670 / 1163(2833) trgNumber : 3300 / 2260(5560)

Join : 32 / 64 nrml/vertexNbr : 2427 / 3442(5869) trgNumber : 4894 / 6382(11276)

Join : 8 / 32 nrml/vertexNbr : 946 / 2833(3779) trgNumber : 2016 / 5560(7576)

Join : 8 / 64 nrml/vertexNbr : 3779 / 5869(9648) trgNumber : 7576 / 11276(18852)

As you can see, there is no join from 0 to 8!

What I get with a grainsize of 16:

Join : 0 / 32 nrml/vertexNbr : 946 / 2833(3779) trgNumber : 2016 / 5560(7576)

Join : 32 / 64 nrml/vertexNbr : 2427 / 3442(5869) trgNumber : 4894 / 6382(11276)

Join : 0 / 64 nrml/vertexNbr : 3779 / 5869(9648) trgNumber : 7576 / 11276(18852)

Have you ever experienced something similar? If so, what did you do to fix such behavior?

Thanks a lot !


7 Replies


As effective granularity gets coarser, the opportunity for this optimisation diminishes as the likelihood for adjacent chunks to be executed by different threads increases.

Correct the Body's operator() to aggregate onto the current state.


*"As effective granularity gets coarser, the opportunity for this optimisation diminishes as the likelihood for adjacent chunks to be executed by different threads increases."*

That sounds worse than I meant: the optimisation only matters for reducing the opportunity cost (right?) of parallelism (in a way specific to this algorithm), so it is less likely to kick in when that cost has already been lowered another way (though hopefully not by removing too much opportunity). And "likelihood" is relative to the number of chunk boundaries, because presumably it should stay about constant in absolute number until granularity starts getting too coarse.

*"Correct the Body's operator() to aggregate onto the current state."*

Was this helpful?


Here is the code of my join method:

void join(const pMarchingCubes &smc)
{
    std::cout << "####################" << std::endl;
    std::cout << "####### JOIN #######" << std::endl;
    std::cout << "kfirst : " << kfirst << " smc.kfirst : " << smc.kfirst << std::endl;
    std::cout << "klast : " << klast << " smc.klast : " << smc.klast << std::endl;
    std::cout << "offset : " << offset << " smc.offset : " << smc.offset << std::endl;
    std::cout << "triangleNumber : " << triangleNumber << " smc.triangleNumber : " << smc.triangleNumber << std::endl;
    std::cout << "vertexNumber : " << vertexNumber << " smc.vertexNumber : " << smc.vertexNumber << std::endl;
    std::cout << "####################" << std::endl;

    for(int i = 0 ; i < smc.vertexNumber ; i++)
    {
        // interesting lines 1 & 2
        vertices[i+vertexNumber+offset] = vertices[i+smc.offset];
        normals[i+vertexNumber+offset]  = normals[i+smc.offset];
    }

    //! get the right indices of triangles crossing the border
    for(int i = trianglesToCheck+offset ; i < triangleNumber+offset ; i++)
    {
        if(triangles[i].v[0] <= 0) triangles[i].v[0] = -triangles[i].v[0] + vertexNumber;
        if(triangles[i].v[1] <= 0) triangles[i].v[1] = -triangles[i].v[1] + vertexNumber;
        if(triangles[i].v[2] <= 0) triangles[i].v[2] = -triangles[i].v[2] + vertexNumber;
    }

    for(int i = 0 ; i < smc.triangleNumber ; i++)
    {
        triangles[i+smc.offset].v[0] += vertexNumber;
        triangles[i+smc.offset].v[1] += vertexNumber;
        triangles[i+smc.offset].v[2] += vertexNumber;
        // interesting line 3
        triangles[i+triangleNumber+offset] = triangles[i+smc.offset];
    }

    triangleNumber += smc.triangleNumber;
    vertexNumber   += smc.vertexNumber;
}

"kfirst" equals r.begin() and "klast" equals r.end().

"offset" indicates where to read the triangles (or the vertices, or the normals) from.

Indeed, I'm using one big array to store all the computed triangles, one big array for all the computed vertices, and one big array for all the computed normals, so that no memory is allocated in the join() method, which would slow everything down considerably.

The three interesting lines are the ones commented as "interesting line X".

I use an "offset" to make sure I do not overwrite a triangle (or a vertex, or a normal) computed in one thread with triangles computed by other threads. That is why I take care of moving the vertices, the normals, and the triangles so that there are no more "holes" between values.

I've got these constraints:

--------------------------------

- Because each triangle contains the indices of its vertices, I need the vertices to be stored in the right order.

- Furthermore, I have dependencies between triangles calculated in one chunk and the neighboring chunk (some triangles are "crossing the border" between chunks).

Results so far:

-------------------

(I am running this program with 2 cores)

I have no problem when running the program with 2 threads, but with more than 2 threads, data seem to go missing...

The output I get with 2 threads:

####################

####### JOIN #######

kfirst : 0 smc.kfirst : 32

klast : 32 smc.klast : 64

offset : 0 smc.offset : 73728

triangleNumber : 7576 smc.triangleNumber : 11276

vertexNumber : 3779 smc.vertexNumber : 5869

####################

triangles = 18852

vertices / normals = 9648

average duration = 0.00638916

Bad trgs : 0

With 4 threads, I get:

kfirst : 16 smc.kfirst : 32

klast : 32 smc.klast : 64

A chunk is missing... the chunk from 0 to 16.

I'm sure this is very easy to solve, but I haven't found it yet because I had only been coding with 2 threads until now (maybe I need to do something special in the splitting constructor?).



Oh ok, indeed. I am going to correct this... Thanks.

Btw, that explains the split constructor.

