Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

tbb::atomic<> operations on elements of a vector (double*)

Handsome_T
Beginner
1,072 Views

Hi everyone,

I am trying to parallelize a subroutine which populates the contents of a double vector using parallel_for.

Due to the nature of the population, race conditions will certainly occur, so I need to somehow guard against that.

The vector is being defined/allocated as a plain double* v.

What's the correct way of passing this vector to the operator() so I can use tbb::atomic?

I was thinking of doing something like the following and wanted to know if this is enough to make sure that [cpp]my_v[idx] += some_new_value;[/cpp] is atomic.

[cpp]
double *v = new double[N];
tbb::atomic<double *> va = v;
tbb::parallel_for(tbb::blocked_range<size_t>(0, N), Foo(va));
[/cpp]

where Foo is defined as:

[cpp]
class Foo {
    double *my_v;
public:
    void operator()(const tbb::blocked_range<size_t>& r) const {
        for (size_t i = r.begin(); i < r.end(); ++i) {
            size_t idx = _compute_index(i);
            my_v[idx] += some_new_value;
        }
    }
    Foo(double *in_v) : my_v(in_v) { }
};
[/cpp]

(The _compute_index() call above can produce identical values of 'idx' on different threads, so I have to make sure the '+=' update is atomic.)

thanks in advance for any help/input.

9 Replies
RafSchietekat
Valued Contributor III
You should define an array of atomics, not an atomic pointer to a non-atomic array.
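Something along these lines, just as a sketch (assuming the same N as in your snippet, and untested):

[cpp]
// Make the elements themselves atomic, rather than the pointer to them.
tbb::atomic<double>* va = new tbb::atomic<double>[N];
for (size_t i = 0; i < N; ++i)
    va[i] = 0.0;            // tbb::atomic has no initializing constructor
// Foo would then hold a tbb::atomic<double>* instead of a double*.
[/cpp]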
Handsome_T
Beginner

Raf Schietekat wrote:

You should define an array of atomics, not an atomic pointer to a non-atomic array.

Thanks for your reply, Raf. Since the subroutine is supposed to return that array to the calling function as a plain double result array, how do I convert the array of atomics back to a regular array of doubles, given that I have memory constraints and cannot allocate a new array of N doubles? Thanks again.

RafSchietekat
Valued Contributor III
You can always try casting, since atomic types occupy the same space as the underlying type; either the whole array or individual elements should do. But wait, isn't += unimplemented for atomic double? And if you really have concurrent access, using an array may cost a lot of performance because of false sharing.
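For instance, a sketch of the casting idea (untested; it assumes the usual case where an atomic<double> occupies exactly the storage of a double, which is worth asserting):

[cpp]
#include <cassert>
#include "tbb/atomic.h"

// Reinterpret the array of atomics as plain doubles when handing it back;
// no copy and no extra allocation, but check the size assumption explicitly.
double* as_plain_doubles(tbb::atomic<double>* va)
{
    assert(sizeof(tbb::atomic<double>) == sizeof(double));
    return reinterpret_cast<double*>(va);
}
[/cpp]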
Handsome_T
Beginner

Raf Schietekat wrote:

But wait, isn't += unimplemented for atomic double?

Oh yes, you are right. This parallelization won't be as easy as I thought it would be... maybe scoped locking?

RafSchietekat
Valued Contributor III
Assuming it's the right thing to do (we only know this aspect), you could roll your own += based on CAS (pun intended).
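A minimal sketch of such a loop, untested, shown here with std::atomic<double> and compare_exchange_weak (the same pattern works with any compare-and-swap primitive, including the TBB one):

[cpp]
#include <atomic>

// Atomically add 'value' to 'target' by retrying a compare-and-swap until no
// other thread has modified the element between the read and the update.
inline void atomic_add(std::atomic<double>& target, double value)
{
    double expected = target.load();
    // On failure, compare_exchange_weak reloads 'expected' with the current
    // contents, so the loop simply retries with the fresh value.
    while (!target.compare_exchange_weak(expected, expected + value))
        ;
}
[/cpp]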
Handsome_T
Beginner
Assuming you are referring to 'compare & swap', rolling my own += could be a very good exercise for me. However, since the array in question comes from a finite element mesh, I can either:
  • partition the mesh and then build the array for each partition by iterating over elements. In that case, the race condition will only happen for the elements lying on the interface of partitions; I am hoping I can then handle those elements separately (either with a single thread or with a rather expensive locking mechanism);
  • or, instead of iterating over elements, iterate over nodes. The drawback of this approach is that the code will need some extra bookkeeping and will require more memory.

By the way, I tried the spin_lock mutex approach (along the lines of the sketch below) and performance took a big hit, as expected. Would you mind telling me which files of TBB I should be looking at if I want to roll my own +=? Thanks again for your help.
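The lock-based version was roughly this (simplified sketch, with a single global tbb::spin_mutex guarding every update, which is exactly why it serializes so badly):

[cpp]
#include "tbb/spin_mutex.h"

tbb::spin_mutex update_mutex;   // one mutex for the whole array (coarse-grained)

void locked_add(double* v, size_t idx, double value)
{
    // scoped_lock acquires the spin mutex here and releases it on destruction
    tbb::spin_mutex::scoped_lock lock(update_mutex);
    v[idx] += value;
}
[/cpp]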
jimdempseyatthecove
Honored Contributor III
You have a 3rd method, a little more code, no additional data.

Partition the elements (typically called tiling).
Identify shared boundaries.
Parallelize interior of tile (plus unshared boundaries/perimeter tiles).
(join)
Of shared boundaries, identify cells that intersect (cross points of tile boundaries).
Parallelize non-intersecting portion of boundaries.
(join)
Serialize or parallelize intersection of boundaries.
(Note: intersection of boundaries is non-adjacent unless the tile is 3x3 or smaller.)

Jim Dempsey
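A rough skeleton of that phasing (untested; Tile, Boundary, and the process* functions are placeholders that depend entirely on your mesh data structures, not anything from TBB):

[cpp]
#include "tbb/parallel_for.h"
#include <vector>

// Placeholder types: how interiors, boundary strips, and their intersection
// (corner) cells are represented is up to your mesh code.
struct Tile     { /* elements whose nodes belong to one tile only */ };
struct Boundary { /* elements on one shared boundary, corner nodes excluded */ };

void process(Tile&);
void process(Boundary&);
void process_intersections();   // the few cells where boundaries cross

void update_mesh(std::vector<Tile>& tiles, std::vector<Boundary>& boundaries)
{
    // Phase 1: tile interiors are disjoint, so they all run in parallel.
    tbb::parallel_for(size_t(0), tiles.size(),
                      [&](size_t i) { process(tiles[i]); });
    // (join: parallel_for returns only when the whole phase has finished)

    // Phase 2: boundary strips with the intersection cells removed share no
    // nodes with one another, so they too run in parallel.
    tbb::parallel_for(size_t(0), boundaries.size(),
                      [&](size_t i) { process(boundaries[i]); });
    // (join)

    // Phase 3: the remaining intersection cells; serialize, or use locks/CAS
    // if there are enough of them to be worth parallelizing.
    process_intersections();
}
[/cpp]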
Handsome_T
Beginner
JimDempseyAtTheCove wrote:

(join)
Of shared boundaries, identify cells that intersect (cross points of tile boundaries).
Parallelize non-intersecting portion of boundaries.
(join)

Jim, thanks for your help. I am not completely following the logic here. Aren't the 'non-intersecting portions of boundaries' already taken care of in the first step, when we parallelize the interiors of the partitions?

JimDempseyAtTheCove wrote:

(Note: intersection of boundaries is non-adjacent unless the tile is 3x3 or smaller.)

Would you mind elaborating on this one as well? Thanks again.
jimdempseyatthecove
Honored Contributor III
>>Aren't the 'non-intersecting portions of boundaries' already taken care of in the first step, when we parallelize the interiors of the partitions?

When the elements are wholly contained, there are no intersections. However, when elements interact with adjacent elements (e.g. beam segment ends in a Finite Element simulation), then you have a boundary issue with intersections, where perimeter cells of one tile interact with (adjacent) perimeter cells of a neighboring tile. When calculations are such that multiple threads could potentially update the same cell (variable), you need to take proactive action such as a critical section, a compare-and-swap loop, or thread-scheduling maneuvers to avoid issues (or serialization).

In a 3x3 tile where cell computations interact, no interacting computation can be made without touching a perimeter cell. The same holds for 3xN or Nx3 or smaller.

For example, in a Finite Element simulation you may choose to parallel_for-slice up the beams, determine the strain, and thus the tension force, and then accumulate the force at the beam's endpoint objects (connection points). Beams in adjacent slices may (will) share endpoints along the perimeter of the slice (tile). This presents the potential for two threads performing

EndPoint(N) += Force;
EndPoint(N+1) -= Force;

When an endpoint connects beams from adjacent tiles (slices), the potential is for two threads performing Read, Add, Store at the same time. This will introduce an error in the calculation.

Jim Dempsey