Hello everyone!
When I use parallel_for to parallelize the program, Inspector detects a data race (a read-write race). But I cannot find a detailed method to eliminate the data race. Can anyone give me a suggestion? Thank you very much!
Hi Frank!
It's hard to help you with so little info. It would be better if you post the code you have and maybe a screenshot of Inspector when the race is detected.
Read: http://en.wikipedia.org/wiki/Race_condition
to give you an overview of race conditions.
Jim Dempsey
Adri C. S. wrote:
Hi Frank!
It's hard to help you with that little info. It's better if you post the code you have and maybe a screenshot of the Inspector when the race is detected.
It is a read-write data race. As the picture shows, the variable jy has a read-write race. What can I do? Would an atomic work? Or should I use a mutex or lock?
The question we need you to answer is:
a) Do all worker threads use the same [iter->number] range?
or
b) Does each worker have its own (different) [iter->number] range?
In the event of a), consider using a reducer (each worker accumulates += values independently, and the partial results are reduced once later).
In the event of b), you only have an issue between one worker thread issuing its first [iter->number] and an adjacent worker thread issuing its last [iter->number]. In this case you only require a mutex (or reducer) at the first and last iter->number's.
Jim Dempsey
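For case a), the reducer idea can be sketched with plain std::thread (tbb::parallel_reduce packages this same accumulate-then-combine pattern; the array-sum workload and the worker count here are illustrative, not from Frank's code):

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Reducer sketch: each worker accumulates into its own partial sum,
// and the partials are combined once at the end -- no shared "+=" races.
// tbb::parallel_reduce automates this; std::thread is used here only to
// keep the sketch self-contained.
double reduce_sum(const std::vector<double>& a, unsigned nworkers)
{
    std::vector<double> partial(nworkers, 0.0);   // one accumulator per worker
    std::vector<std::thread> pool;
    std::size_t chunk = a.size() / nworkers;
    for (unsigned w = 0; w < nworkers; ++w) {
        std::size_t lo = w * chunk;
        std::size_t hi = (w + 1 == nworkers) ? a.size() : lo + chunk;
        pool.emplace_back([&, w, lo, hi] {
            double s = 0.0;                       // thread-private accumulator
            for (std::size_t i = lo; i < hi; ++i)
                s += a[i];
            partial[w] = s;                       // distinct slot per worker: no race
        });
    }
    for (auto& t : pool) t.join();
    double total = 0.0;                           // sequential reduction step
    for (double p : partial) total += p;
    return total;
}
```

Because no two workers ever write the same location, Inspector will not flag a race here, and there is no lock on the hot path.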
jimdempseyatthecove wrote:
The question we need you to answer is:
a) Do all worker threads use the same [iter->number] range?
or
b) Does each worker have its own (different) [iter->number] range?
In the event of a) then consider using a reducer (each worker accumulates += values independently, which are then reduced once later).
In the event of b) then you only have an issue between one worker thread issuing first [iter->number] against adjacent worker thread issuing last [iter->number]. In this case you only require a mutex (or reducer) at the first and last iter->number's
Jim Dempsey
Can you give me some information about case a), using a reducer? I am a novice.
You can look at the TBB parallel_reduce examples, or you might want to consult a recursion example such as: C:\Program Files (x86)\Intel\Composer XE 2011 SP1\tbb\examples\task\tree_sum
(or wherever you installed the samples on Linux).
or (untested code)
double sum(double A[], size_t n)
{
    if( n < MIN_N_FOR_SUM ) {
        double ret = 0.0;
        for( size_t i = 0; i < n; ++i )
            ret += A[i];
        return ret;
    } else {
        size_t n1 = (n / 2) - ((n / 2) % (64 / sizeof(double)));
        size_t n2 = n - n1;
        double sum1, sum2;
        parallel_invoke(
            [&]() { sum1 = sum(A, n1); },
            [&]() { sum2 = sum(A + n1, n2); });
        return sum1 + sum2;
    }
}
Jim Dempsey
jimdempseyatthecove wrote:
You can look at the TBB parallel_reduce examples or you might want to consult a recursion example such as: C:\Program Files (x86)\Intel\Composer XE 2011 SP1\tbb\examples\task\tree_sum
(or wherever you installed the samples on Linux).
or (untested code)
double sum(double A[], size_t n)
{
    if( n < MIN_N_FOR_SUM ) {
        double ret = 0.0;
        for( size_t i = 0; i < n; ++i )
            ret += A[i];
        return ret;
    } else {
        size_t n1 = (n / 2) - ((n / 2) % (64 / sizeof(double)));
        size_t n2 = n - n1;
        double sum1, sum2;
        parallel_invoke(
            [&]() { sum1 = sum(A, n1); },
            [&]() { sum2 = sum(A + n1, n2); });
        return sum1 + sum2;
    }
}
Jim Dempsey
Thank you very much, my dear friend! My code is like this; I want to run it with parallel_for, and that is when the problem arose.
for( iter=cell_n+3; iter!=iter_end-3; iter++ )
{
iter->fp = (iter-1)->fp_old1 - pidt1 * (iter-1)->jy;
iter->gm = (iter-1)->gm_old1 - pidt1 * (iter-1)->jz;
iter->fm = (iter+1)->fm - pidt1 * iter->jy;
iter->gp = (iter+1)->gp - pidt1 * iter->jz;
iter->ey = iter->fp + iter->fm; iter->bz = iter->fp - iter->fm;
iter->ez = iter->gp + iter->gm; iter->by = iter->gp - iter->gm;
iter->ex -= 2.0 * pidt1 * iter->jx;
}
In the parallel_for code each thread accesses "fm" and "gp" (the 5th and 6th lines of the code), and there is a data race there. How do I remove this contention? Thank you!
(Note that Jim's code sample emphasises clarity, not performance.)
At first sight fm and gp are the target of a "scan" operation over jy and jz from right to left, so you would have to move them into a separate loop running in that direction. Use tbb::parallel_scan() for large intervals (one forward for fp and gm, one backward for fm and gp). You could parallel_invoke() those scans and then use tbb::parallel_for to compute e{x,y,z} and b{y,z} (clearest), or run one of the scans first and then everything else in the other scan (tbb::parallel_for() stuff integrated in the final pass of the second scan, which might make better use of the cache).
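A sequential sketch of that decomposition (a cut-down Cell struct with only the fields from Frank's loop; pidt1 and the range bounds are parameters, and whether the backward direction matches the intended physics is for Frank to confirm). The backward pass is the part that would map onto tbb::parallel_scan; the other two passes are plain tbb::parallel_for material:

```cpp
#include <cstddef>
#include <vector>

// Cut-down version of Frank's cell: only the fields the loop touches.
struct Cell {
    double fp, gm, fm, gp;          // computed fields
    double fp_old1, gm_old1;        // previous-step values (read-only here)
    double jy, jz, jx;              // currents (read-only here)
    double ey, ez, by, bz, ex;      // derived fields
};

// The single racy loop, split into three dependence-free passes.
// Requires lo >= 1 and hi < c.size() (neighbours i-1 and i+1 are read).
void update(std::vector<Cell>& c, std::size_t lo, std::size_t hi, double pidt1)
{
    // Forward pass: fp/gm read only *_old1 values, never fp/gm themselves.
    for (std::size_t i = lo; i < hi; ++i) {
        c[i].fp = c[i - 1].fp_old1 - pidt1 * c[i - 1].jy;
        c[i].gm = c[i - 1].gm_old1 - pidt1 * c[i - 1].jz;
    }
    // Backward pass: fm/gp form a right-to-left scan over jy/jz --
    // this is the loop tbb::parallel_scan would parallelize.
    for (std::size_t i = hi; i-- > lo; ) {
        c[i].fm = c[i + 1].fm - pidt1 * c[i].jy;
        c[i].gp = c[i + 1].gp - pidt1 * c[i].jz;
    }
    // Final pass: purely element-wise, a natural tbb::parallel_for.
    for (std::size_t i = lo; i < hi; ++i) {
        c[i].ey = c[i].fp + c[i].fm;  c[i].bz = c[i].fp - c[i].fm;
        c[i].ez = c[i].gp + c[i].gm;  c[i].by = c[i].gp - c[i].gm;
        c[i].ex -= 2.0 * pidt1 * c[i].jx;
    }
}
```

Once the loop is split this way, no pass both reads and writes a neighbour's freshly written value, which is what Inspector was flagging.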
Raf Schietekat wrote:
(Note that Jim's code sample emphasises clarity, not performance.)
At first sight fm and gp are the target of a "scan" operation over jy and jz from right to left, so you would have to move them into a separate loop running in that direction. Use tbb::parallel_scan() for large intervals (one forward for fp and gm, one backward for fm and gp). You could parallel_invoke() those scans and then use tbb::parallel_for to compute e{x,y,z} and b{y,z} (clearest), or run one of the scans first and then everything else in the other scan (tbb::parallel_for() stuff integrated in the final pass of the second scan, which might make better use of the cache).
Thank you! Could you recommend some material for learning TBB: books, websites? And I also have another question about this code:
for( iter = part_n, iter_end = part_n + (vector_part.end() - vector_part.begin()); iter != iter_end; iter++ )
{
    has_to_change_cell( cell_n, iter ); // put particles on stack
}
inline void propagate::has_to_change_cell( struct cell *cell, struct particle *part )
{
    if ( part->x < cell[part->number].x )
    {
        cell[part->number].np[part->species]--;
        cell[part->number-1].np[part->species]++;
        cell[part->number].npart--;
        cell[part->number-1].npart++;
        part->number--;
    }
    else if ( part->x >= cell[part->number].x + dx )
    {
        cell[part->number].np[part->species]--;
        cell[part->number+1].np[part->species]++;
        cell[part->number].npart--;
        cell[part->number+1].npart++;
        part->number++;
    }
}
Then between the threads, "cell[part->number].np[part->species]" will have a data race. What can I do? Thank you very much!
At TBB's own website you will find the online "User Guide and Design Patterns", and a picture of the book "Intel Threading Building Blocks" with a link to a large online shop for books and other stuff. The book might not be up to date anymore, but it's still a good read.
Unless the problem has some relevant structure that I missed, you could make np and npart atomic.
Raf Schietekat wrote:
At TBB's own website you will find the online "User Guide and Design Patterns", and a picture of the book "Intel Threading Building Blocks" with a link to a large online shop for books and other stuff. The book might not be up to date anymore, but it's still a good read.
Unless the problem has some relevant structure that I missed, you could make np and npart atomic.
Thank you! I tried to use atomic operations before; maybe I wrote the wrong statement. Could you tell me which statement I can use?
++ and -- are implemented by tbb::atomic (at least for integers).
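For example (a self-contained sketch using std::atomic, which supports the same ++/-- for integral types as tbb::atomic; the two-thread counter workload is made up for illustration):

```cpp
#include <atomic>
#include <thread>

// Two threads hammer the same counter; std::atomic<int>'s ++ is an
// indivisible read-modify-write, so no increments are lost.
std::atomic<int> counter{0};

void bump(int times)
{
    for (int i = 0; i < times; ++i)
        ++counter;          // atomic increment, no lock needed
}

int count_twice(int times)
{
    counter = 0;
    std::thread a(bump, times), b(bump, times);
    a.join();
    b.join();
    return counter.load();  // exactly 2 * times, every run
}
```

With a plain int the result would vary from run to run (and Inspector would report the race); with the atomic it is always 2 * times.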
While you can solve this by using atomics, the problem is performance. I would suggest something along the following lines:
inline void propagate::has_to_change_cell( struct cell *cell, struct particle *part )
{
    if ( part->x < cell[part->number].x )
    {
        cell[part->number].np[part->species]--;
        cell[part->number].np_prior[part->species]++;
        cell[part->number].npart--;
        part->number--;
    }
    else if ( part->x >= cell[part->number].x + dx )
    {
        cell[part->number].np[part->species]--;
        cell[part->number].np_following[part->species]++;
        cell[part->number].npart--;
        part->number++;
    }
}
inline int propagate::get_np( struct cell *cell, struct particle *part )
{
    return cell[part->number].np[part->species]
         + cell[part->number-1].np_following[part->species]
         + cell[part->number+1].np_prior[part->species];
}
Jim Dempsey
It goes without saying that you can (occasionally) run a parallel_for to reconcile (collapse) np_following and np_prior into np.
Jim Dempsey
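That reconciliation step could look like this (a sketch with flat int vectors standing in for the cell struct; the names np_prior/np_following follow Jim's post, everything else is illustrative):

```cpp
#include <cstddef>
#include <vector>

// Collapse the per-neighbour tallies back into np, matching get_np():
// effective np[i] = np[i] + np_following[i-1] + np_prior[i+1].
// Two passes: read all effective counts first, then write and zero the
// tallies, so no tally is read after it has already been cleared.
void reconcile(std::vector<int>& np,
               std::vector<int>& np_prior,
               std::vector<int>& np_following)
{
    std::size_t n = np.size();
    std::vector<int> eff(n);
    for (std::size_t i = 0; i < n; ++i) {
        int from_left  = (i > 0)     ? np_following[i - 1] : 0;
        int from_right = (i + 1 < n) ? np_prior[i + 1]     : 0;
        eff[i] = np[i] + from_left + from_right;
    }
    for (std::size_t i = 0; i < n; ++i) {
        np[i] = eff[i];                 // collapsed count
        np_prior[i] = 0;                // tallies start fresh
        np_following[i] = 0;
    }
}
```

Each of the two loops is element-wise once eff is computed, so both could themselves be parallel_for bodies.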
I like the idea, but I'm afraid it won't work here, because the -1 and +1 are on "part->number", not on "part".
A simple situation that would fail is if all "number" and "species" values are the same for each *part.
In both the original code and my suggested code there is a requirement that all part->number's are unique. Though, all part->species need not be unique.
My code suggestion removes the +1 and -1 on the cell subscript, and adds two arrays the same size as the np array. The tallies for the adjacent cells then contain the proper counts for the changes.
These tallies are what would have been added to/subtracted from the +1 and -1 entries (and they can be summed in when fetching the new counts).
Jim Dempsey
jimdempseyatthecove wrote:
In both the original code and my suggested code there is a requirement that all part->number's are unique. Though, all part->species need not be unique.
My code suggestion removes the +1 and -1 on the cell subscript, and adds two arrays of size of the np array. The tally of the adjacent cells will contain the proper tallies for the changes.
These tallies are what would have been added/subtracted from the +1 and -1 entries (which can be summed when fetching the new counts).
Jim Dempsey
Thank you very much! The situation is as Raf Schietekat says: the "number" and "species" values can be the same for each *part. Within one cell there are several particles, and those particles' number is the same as the cell's number.
My assumption is:
You have a parallel_for that iterates on part numbers [0:nParts) and passes the part object pointer, such as pParts.
If this is the case, then no two threads will have the same part numbers and thus will not have the same indexes into cell.
*** However, you have an issue with the +1 and -1 at the boundaries of the thread subsections.
Note, you can insert into your code a test to see if the index of the current cell is the first or last iteration. In those cases you can perform an atomic operation (__sync_fetch_and_add); on interior locations simply ++ or -- as you do now.
Jim Dempsey
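That boundary trick can be sketched like this (GCC's __sync builtins on plain ints, as Jim suggests; the "every particle moves one cell to the right" workload is made up purely to exercise the chunk boundary):

```cpp
#include <thread>
#include <vector>

// Each worker owns counts[lo..hi). Only the first cell of a chunk and the
// cell just past its last index can be touched by a neighbouring worker,
// so only those updates need an atomic RMW; interior updates stay plain.
void shift_right(int* counts, int lo, int hi)
{
    for (int i = lo; i < hi; ++i) {
        if (i == lo)                       // left neighbour may also update this cell
            __sync_fetch_and_sub(&counts[i], 1);
        else
            counts[i]--;                   // interior: this thread only
        if (i == hi - 1)                   // counts[hi] is the right neighbour's first cell
            __sync_fetch_and_add(&counts[i + 1], 1);
        else
            counts[i + 1]++;               // interior: this thread only
    }
}

std::vector<int> run(int n)
{
    std::vector<int> counts(n + 1, 0);
    for (int i = 0; i < n; ++i) counts[i] = 1;   // one particle per cell
    std::thread a(shift_right, counts.data(), 0, n / 2);
    std::thread b(shift_right, counts.data(), n / 2, n);
    a.join();
    b.join();
    return counts;   // cell 0 emptied, cell n gained one, interior unchanged
}
```

The only contended cell is counts[n/2]: thread a's last increment and thread b's first decrement both hit it, and both go through the atomic builtin, so the final counts are deterministic.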
And my interpretation is that "number" means "cell number", with possibly multiple parts belonging to the same cell. The iteration starts at part_n, which seems to be a random-access iterator, not an index.