Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

An optimization problem on atomic<T>

azru0512

The following is a producer/consumer example.

atomic<size_t> in, out;
in = 0;
out = 0;

P1:

    while ((in + 1) % capacity == out) ;   // wait while the buffer is full
    /* do something */
    in = (in + 1) % capacity;

P2:

    while (in == out) ;                    // wait while the buffer is empty
    /* do something */
    out = (out + 1) % capacity;

As you can see, in is modified only by P1, and out only by P2.

I am wondering whether there is any possible benefit in replacing in (out) with an ordinary variable, so that the above code becomes:

P1:

    const size_t local_in = (in + 1) % capacity;
    while (local_in == out) ;
    /* do something */
    in = local_in;

P2:

    const size_t local_out = out;
    while (in == local_out) ;
    /* do something */
    out = (local_out + 1) % capacity;

Any comments would be appreciated.
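For readers following along, the two spin conditions can be written as small predicates. This is only a sketch: the helper names is_full, is_empty and advance, and the value of capacity, are assumptions made for illustration and are not part of TBB or of the original post.

```cpp
#include <cassert>
#include <cstddef>

// Assumed buffer size; the thread never states a concrete value.
constexpr std::size_t capacity = 4;

// P1 spins while this holds: advancing `in` would make it collide with `out`.
constexpr bool is_full(std::size_t in, std::size_t out) {
    return (in + 1) % capacity == out;
}

// P2 spins while this holds: there is nothing left to consume.
constexpr bool is_empty(std::size_t in, std::size_t out) {
    return in == out;
}

// Moves an index one slot forward, wrapping around at the end of the buffer.
constexpr std::size_t advance(std::size_t index) {
    return (index + 1) % capacity;
}
```

With this scheme one slot always stays unused, so a buffer of size capacity holds at most capacity - 1 items; that is what lets in == out mean "empty" unambiguously.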

Dmitry_Vyukov
Quoting azru0512

I am wondering whether there is any possible benefit in replacing in (out) with an ordinary variable.

Well, if you want to perform an act of sabotage, then there is the benefit of producing a subtly broken program that people may spend months fixing. Anything else? I do not see any.

If you care about performance, your atomics library must be flexible enough not to penalize you for no reason, i.e. it must produce optimal machine code, so that any more "optimal" code would be broken.

Dmitry_Vyukov
> If you care about performance, your atomics library must be flexible enough not to penalize you for no reason, i.e. it must produce optimal machine code, so that any more "optimal" code would be broken.

What you are asking is basically: I need to write some data to a file, but it is slow; is there any benefit in not writing it to the file? Sorry, if you need to write some data to a file, you need to write it to the file. There is no way around it.

The same goes for atomics. They are not there to penalize you; they are there to provide functionality your program *requires*.

azru0512
So you mean I should just stick to atomic and not bother with what I am doing in the example above?

Dmitry_Vyukov
Quoting azru0512
So you mean just stick to atomic

If you are doing multi-threaded programming, you need to stick to synchronization primitives that provide the synchronization your program requires. Atomics are one of them. Plain C/C++ variables are NOT one of them.

azru0512
So the first one is OK, but the second one could be dangerous? I mean the example above.

Dmitry_Vyukov
Quoting azru0512
So the first one is OK, the second one could be dangerous? I mean the above example.

Yes. The most innocuous (from the point of view of run-time behavior and time spent localizing the problem) outcome for the second program is that it will instantly hang.

azru0512

Could you explain how come it will hang? I cannot figure out how this could happen. Thanks.

Dmitry_Vyukov
Quoting azru0512

Could you explain how come it will hang? I cannot figure out how this could happen. Thanks.

It will hang if the threads do not reload 'in' and 'out' while spinning.

azru0512
while ( local_in == out ) ;

In the statement above, spinning only occurs when the thread does not reload "out". But how could this happen? Doesn't declaring the variable "out" as atomic help?

Dmitry_Vyukov
> In the statement above, spinning only occurs when the thread does not reload "out".

No, in the statement above spinning occurs while local_in != out.

For it to be correct, the thread must reload 'out' from memory on every iteration. Otherwise (if 'out' is cached in a register) it can hang.

> But how this could happen?

This happens by default. You should ask: what must I do to prevent it?

Declaring 'out' as atomic prevents this. All sane atomics implementations explicitly or implicitly guarantee eventual visibility of changes.
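As an illustration of the point about reloading, here is a minimal sketch of a spin loop on an atomic index. It uses std::atomic rather than tbb::atomic, which is an assumption made to keep the example self-contained, and the function names spin_until_changed and run_spin_demo are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Spins until `out` differs from `local_in`. Because `out` is atomic, every
// iteration performs a real load, so a store made by another thread is
// eventually observed and the loop can terminate.
std::size_t spin_until_changed(const std::atomic<std::size_t>& out,
                               std::size_t local_in) {
    while (local_in == out.load()) {
        std::this_thread::yield();  // be polite while waiting
    }
    return out.load();
}

// Hypothetical driver: one thread spins, the other changes `out` once.
std::size_t run_spin_demo() {
    std::atomic<std::size_t> out{1};
    const std::size_t local_in = 1;  // equal to `out`, so the spin must wait
    std::thread other([&out] { out.store(2); });
    const std::size_t seen = spin_until_changed(out, local_in);
    other.join();
    return seen;
}
```

If out were a plain size_t instead, the concurrent access would be a data race, and the compiler would be free to read it once before the loop and never again, which is the hang described above.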


azru0512

I am reposting the example code here.

atomic<size_t> in, out;
in = 0;
out = 0;

P1:

    const size_t local_in = (in + 1) % capacity;
    while (local_in == out) ;
    /* do something */
    in = local_in;

P2:

    const size_t local_out = out;
    while (in == local_out) ;
    /* do something */
    out = (local_out + 1) % capacity;

The statement "while ( local_in == out ) ;" should spin while "local_in == out", if I do not misunderstand what the "while" statement means. Besides, maybe you forgot that I already declared out as an atomic variable.

So is the above program still problematic? Thanks.

Dmitry_Vyukov
Quoting azru0512

So is the above program still problematic?

As far as I see, it is Ok.

azru0512
Thanks.

I would still like to ask whether this kind of usage (i.e., const size_t local_in = in) has any impact on performance.

Or, just as you said, is sticking to atomic just fine? In other words, forget about "const size_t local_in = in" and just use "in" all the way.

Dmitry_Vyukov
Quoting azru0512
Thanks.

I would still like to ask whether this kind of usage (i.e., const size_t local_in = in) has any impact on performance.

Or, just as you said, is sticking to atomic just fine? In other words, forget about "const size_t local_in = in" and just use "in" all the way.

Ah, I see. I guess we had a misunderstanding about the meaning of your original statement, "I am wondering whether there is any possible benefit in replacing in (out) with an ordinary variable".

I interpreted it as wanting to replace:

atomic<size_t> in, out;

with

size_t in, out;

It is NOT Ok.

And it seems that you want to replace:

while ( in == out );

with:

const size_t local_out = out;

while ( in == local_out );

(right?)

It is indeed Ok.

Regarding the performance benefit of such a replacement, I think there can be some. A read of an atomic variable will most likely result in a load from the memory subsystem, while a local variable will most likely be cached in a register.
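A sketch of what this local-copy variant might look like in full is given below. It assumes std::atomic in place of tbb::atomic and an int payload, and it uses non-blocking try_produce and try_consume functions (hypothetical names) instead of the spin loops from the thread, so that the behaviour can be checked from a single thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr std::size_t capacity = 8;        // assumed buffer size
std::atomic<std::size_t> in{0}, out{0};    // shared indices stay atomic
int buffer[capacity];

// Producer side: `in` is written only by this thread, so one load into a
// local copy is enough; `out` is still read from the atomic.
bool try_produce(int value) {
    const std::size_t local_in = in;
    const std::size_t next = (local_in + 1) % capacity;
    if (next == out) return false;         // buffer full
    buffer[local_in] = value;
    in = next;                             // publish the new item
    return true;
}

// Consumer side: `out` is written only by this thread, so it is copied once.
bool try_consume(int& value) {
    const std::size_t local_out = out;
    if (in == local_out) return false;     // buffer empty
    value = buffer[local_out];
    out = (local_out + 1) % capacity;      // release the slot
    return true;
}
```

Each function loads its own index exactly once and reuses the local copy for indexing and for the final store, while the other thread's index is always read through the atomic.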

azru0512
Yes. Finally we understand each other. : )

And I still have a question about atomic. Consider the following producer/consumer example:

atomic<size_t> in, out;
in = 0;
out = 0;

Producer:

    void produce(char *data) {
        while ((in + 1) % capacity == out) ;
        buffer[in] = data;
        in = (in + 1) % capacity;
    }

Consumer:

    void consume(char *data) {
        while (in == out) ;
        data = buffer[out];
        out = (out + 1) % capacity;
    }

As far as I know, most code patterns use a read-acquire/store-release pair like a lock/unlock pair. For example,

read-acquire

memory operations

store-release

The above pattern works like a critical section: it prevents the memory operations between the read-acquire and the store-release from moving outside the pair.
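The lock-like pattern described here can be sketched as a small spin lock. This assumes std::atomic_flag from standard C++ rather than any TBB type, and the names SpinLock and run_counter_demo are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Minimal spin lock: acquire on entry, release on exit.
class SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // Acquire: operations after this point cannot move above it.
        while (flag.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock() {
        // Release: operations before this point cannot move below it.
        flag.clear(std::memory_order_release);
    }
};

// Two threads increment a plain counter; the lock makes that safe.
long run_counter_demo(int per_thread) {
    SpinLock lock;
    long counter = 0;  // ordinary variable, protected only by the lock
    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) {
            lock.lock();
            ++counter;  // the "memory operations" in the middle
            lock.unlock();
        }
    };
    std::thread a(work), b(work);
    a.join();
    b.join();
    return counter;
}
```

The increment sits between the acquire and the release, so it cannot leak out of the critical section in either direction.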

But in the producer/consumer example above, there are some read-acquires in between (e.g., the read of in in buffer[in] = data;).

So my question is:

Is the above example OK? And what constraints or rules should we apply to the memory operations between a read-acquire/store-release pair?

Dmitry_Vyukov
How is 'data' declared?
Do you mean TBB atomics here? Frankly, I do not remember their exact semantics, and I have always had problems understanding code that uses them. Terseness and implicitness are the last things I want to see in an atomics library. Lately I prefer to stick to C1x atomics, since they are more flexible and explicit.
Is evaluating the value of a tbb::atomic a load-acquire? Is a store to a tbb::atomic a store-release?

azru0512
Yes, I mean TBB atomic here. TBB atomic associates reads and writes (loads and stores) with acquire and release fences, respectively. I have checked the TBB online reference.

What are the main differences between TBB atomics and C1x atomics? As far as I can tell, C1x atomics use a full fence by default, which guarantees sequential consistency.

And does how "data" is declared have any influence on correctness?

Dmitry_Vyukov
> What are the main differences between TBB atomics and C1x atomics?

C1x atomics are more flexible and explicit.

> As far as I can tell, C1x atomics use a full fence by default, which guarantees sequential consistency.

I prefer to use the _explicit form, which requires an explicit indication of memory ordering.

> And does how "data" is declared have any influence on correctness?

But of course. It is 'data' that synchronizes data transfer between threads.
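For reference, the explicit-ordering style mentioned in this reply might look as follows when applied to the producer/consumer example. This is a sketch under several assumptions: it uses C++ std::atomic with explicit std::memory_order arguments rather than tbb::atomic, the payload is int, consume returns the value instead of assigning to a parameter, and the names SpscRing and transfer are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t capacity = 16;  // assumed buffer size

// Single-producer/single-consumer ring with explicit memory ordering.
struct SpscRing {
    int buffer[capacity];
    std::atomic<std::size_t> in{0}, out{0};

    void produce(int value) {
        // Own index: only this thread writes it, so a relaxed load suffices.
        const std::size_t local_in = in.load(std::memory_order_relaxed);
        const std::size_t next = (local_in + 1) % capacity;
        // Acquire pairs with the consumer's release store to `out`, so the
        // slot is known to have been read before it is overwritten.
        while (next == out.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        buffer[local_in] = value;
        // Release publishes the buffer write together with the new index.
        in.store(next, std::memory_order_release);
    }

    int consume() {
        const std::size_t local_out = out.load(std::memory_order_relaxed);
        // Acquire pairs with the producer's release store to `in`.
        while (in.load(std::memory_order_acquire) == local_out) {
            std::this_thread::yield();
        }
        const int value = buffer[local_out];
        // Release hands the slot back only after it has been read.
        out.store((local_out + 1) % capacity, std::memory_order_release);
        return value;
    }
};

// Sends 0..count-1 through the ring from a second thread.
std::vector<int> transfer(int count) {
    SpscRing ring;
    std::vector<int> received;
    std::thread producer([&] {
        for (int i = 0; i < count; ++i) ring.produce(i);
    });
    for (int i = 0; i < count; ++i) received.push_back(ring.consume());
    producer.join();
    return received;
}
```

Here the buffer accesses sit between an acquire load of the other thread's index and a release store of the thread's own index, which is the arrangement the question about read-acquire/store-release pairs was asking about.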


RafSchietekat

"But of course. It is 'data' that synchronizes data transfer between threads."
Would you like to "rephrase" that? :-)

azru0512
> But of course. It is 'data' that synchronizes data transfer between threads.

Producer and consumer should operate on different buffer slots, right? Producer inserts data into buffer[1], and consumer retrieves data from buffer[2], for example.

Then what exactly do you mean by "how 'data' is declared"?