Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Is this operation atomic?

Tudor
New Contributor I
1,415 Views
Hi there,

Haven't visited the forums in a while. :)
I'm running this C# code in a 64-bit environment and I'm curious as to whether the following operation is atomic:

int CURRENT_TASK = -1;
int size1 = 1024;

// spawn several threads with this code:
{
    int i = 0;
    while ((i = Interlocked.Increment(ref CURRENT_TASK)) < size1)
    {
        // do whatever
    }
}

So I figured: the while condition compares two int values (i and size1), and the left-hand side contains two operations, the atomic increment and the assignment to i (which is itself atomic). My question is: is the whole while condition atomic? Or can another thread interrupt it and increment CURRENT_TASK before the assignment to i is complete?
Note that i is local to each thread and size1 is only read, never modified.
Thanks in advance.
9 Replies
Dmitry_Vyukov
Valued Contributor I
> Or can another thread interrupt it and increment CURRENT_TASK before the assignment to i is complete?

Of course it can. But for what you need such atomicity?

Tudor
New Contributor I
I need it in order to simulate a task queue where each thread gets a unique iteration from a for loop that goes from 0 to size1.
How about:

Interlocked.Exchange(ref i, Interlocked.Increment(ref CURRENT_TASK1));
while(i < size1)
{
//code
}

And I still don't understand why my previous implementation is not atomic...
The only global variable modified is CURRENT_TASK, via Interlocked.Increment. The result of this call is local to the calling thread. Even if another thread intervenes before "i = result of call" is executed, the result of the increment is local to the thread that called the method, so it can't possibly find a different value there when it gets back on the CPU.
Dmitry_Vyukov
Valued Contributor I
You will get what you want with your initial code. You do not need the increment of CURRENT_TASK and the assignment to i to be atomic together for that. The increment of CURRENT_TASK is atomic, and that's enough.
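To make this concrete, here is a minimal, self-contained sketch (my own, not from the thread; the class name and thread count are made up) showing that Interlocked.Increment alone hands each thread a unique index, even though the store into the local i is a separate, non-atomic step:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

class UniqueIndexDemo
{
    static int CURRENT_TASK = -1;   // same names as in the original post
    const int size1 = 1024;

    public static void Main()
    {
        var claimed = new ConcurrentBag<int>();
        var threads = new Thread[4];
        for (int t = 0; t < threads.Length; t++)
        {
            threads[t] = new Thread(() =>
            {
                int i;
                // Only the increment is atomic; the assignment to the
                // local i is not part of it, and does not need to be.
                while ((i = Interlocked.Increment(ref CURRENT_TASK)) < size1)
                    claimed.Add(i);
            });
            threads[t].Start();
        }
        foreach (var th in threads) th.Join();

        // Every index 0..size1-1 was handed out exactly once.
        var seen = new bool[size1];
        foreach (int i in claimed)
        {
            if (seen[i]) throw new Exception("duplicate index");
            seen[i] = true;
        }
        if (claimed.Count != size1) throw new Exception("missed an index");
        Console.WriteLine("all indices unique");
    }
}
```

Each Interlocked.Increment returns a distinct value, so even if a thread is preempted between the increment and the store into i, the value it eventually stores is still the one it uniquely claimed.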
jimdempseyatthecove
Honored Contributor III
Your initial post was correct.

Your second post is meaningless (unless your intent is obfuscation):

Interlocked.Exchange(ref i, Interlocked.Increment(ref CURRENT_TASK1));
while(i < size1)


What the above does is perform an atomic increment of CURRENT_TASK1, atomically returning the incremented value in the return register (eax or rax).

Then you are performing an interlocked exchange of the returned value with a thread-local copy of i (i on the stack). The value held in CURRENT_TASK1 may have been altered between the time of the atomic increment and the useless atomic exchange, but that is okay, because you are only interested in the pecking-order number that you obtained at the interlocked increment.

for (i = Interlocked.Increment(ref CURRENT_TASK1);
     i < size1;
     i = Interlocked.Increment(ref CURRENT_TASK1))
{
    // do something
}

or

while ((i = Interlocked.Increment(ref CURRENT_TASK1)) < size1)
{
    // do something
}


Jim
Tudor
New Contributor I
Thank you for your replies. It's all clear now.
I guess my initial question should have been "Will this operation create a data race?" instead of "Is this operation atomic?".
Grant_H_Intel
Employee
Technically, CURRENT_TASK should be declared "volatile int" because you are spinning on a value which may be updated by a different thread. But the answer to your question is that the original code should not have a data race. The value of i should be set according to the result of the atomic increment.
Tudor
New Contributor I
But won't the atomic increment create a memory barrier anyway?
jimdempseyatthecove
Honored Contributor III
>>I guess my initial question should have been "Will this operation create a data race?" instead of "Is this operation atomic?".

To simply answer that question with "No" would imply that your code was well written, which it was not.
The Interlocked.xxx form of an instruction is very expensive in terms of time to complete execution: 100x to 300x longer than the non-interlocked form of the same instruction. While use of an interlocked instruction where it is not required will not create a data race condition, it will introduce an unnecessary performance penalty.
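The cost difference is easy to observe even without contention. A rough single-threaded timing sketch (my own, not from the thread; exact ratios vary by CPU and runtime, and under contention the gap grows much larger):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class IncrementTiming
{
    static int x;
    const int N = 10_000_000;

    public static void Main()
    {
        var sw = Stopwatch.StartNew();
        for (int k = 0; k < N; k++) x++;                           // plain increment
        long plainMs = sw.ElapsedMilliseconds;

        sw.Restart();
        for (int k = 0; k < N; k++) Interlocked.Increment(ref x);  // lock-prefixed increment
        long lockedMs = sw.ElapsedMilliseconds;

        Console.WriteLine($"plain: {plainMs} ms, interlocked: {lockedMs} ms");
        if (x != 2 * N) throw new Exception("unexpected count");
    }
}
```

On typical hardware the interlocked loop runs several times slower than the plain one even with a single thread; the point is to reserve Interlocked for the shared variable that actually needs it.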

Jim Dempsey
jimdempseyatthecove
Honored Contributor III
>>But won't the atomic increment create a memory barrier anyway?

Yes and no.

Yes, for the variable undergoing the atomic increment.
No (or at least not necessarily yes), for other variables.

There are several types of memory barriers:

One of these barriers is a compiler memory barrier. This barrier has nothing to do with the physical state of memory; rather, it affects the assumptions that the compiler is permitted to make about memory state. When using optimization, unless told otherwise, a compiler is free to registerize variables. This means that when the compiler generates code to read a memory location, it is free to place the contents of that location into a register, and it can then generate code that continues to re-use the value in the register for subsequent uses of the variable. What the compiler cannot know is whether the same memory location has been modified outside of the code stream it has generated. This can happen in multi-threaded programs, or in serial programs referencing memory shared with an I/O device. For these situations you need to instruct the compiler that the memory location(s) is (are) stale and that a direct re-read of the memory location is required.
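A sketch of that hazard in C# (my own example, not from the thread): the reader thread below spins on a flag, and with a plain non-volatile read the JIT would be allowed to hoist the flag into a register and spin forever, so Volatile.Read is used to force a fresh load on every iteration:

```csharp
using System;
using System.Threading;

class StaleReadDemo
{
    static bool flag;   // deliberately not declared volatile

    public static void Main()
    {
        var reader = new Thread(() =>
        {
            // Volatile.Read acts as the compiler barrier described above;
            // a plain `while (!flag)` may legally be compiled into an
            // infinite loop over a registerized copy of flag.
            while (!Volatile.Read(ref flag))
                Thread.SpinWait(100);
            Console.WriteLine("reader saw the update");
        });
        reader.Start();
        Thread.Sleep(50);
        Volatile.Write(ref flag, true);   // paired release write
        reader.Join();
    }
}
```

Declaring the field `volatile` would have the same effect here; Volatile.Read/Volatile.Write just make the barrier explicit at each access.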

There are similar software barriers for writes of registerized variables, where you need to ensure the physical memory location is written by the time you reach a specific point in the application. Otherwise a different thread observing the same memory location (or an I/O device) will not observe the change, or will not observe it in the proper sequence of changes.

The software barriers ensure that the compiler does not optimize out the read, write, or read-modify-write instructions in the code sequence.

The other types of memory barriers are processor memory barriers, where you require that a read, write, or read-modify-write has completed at a given point in the code, or has occurred in a given sequence. Some algorithms require a specific memory-order sequence in order to correctly produce the desired results. In these situations you cannot have the processor itself (the memory-management part thereof) re-order memory reads and/or writes. For these situations there are one or more memory-fencing instructions, or code sequences to follow, that have the side effect of producing the desired type of memory fence.
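A sketch of the hardware side (my own Dekker-style example, not from the thread): each thread stores its own flag, then loads the other's. Without a full fence the processor may let the load pass the store, so both threads could read 0; Thread.MemoryBarrier supplies the store-load fence that rules this out:

```csharp
using System;
using System.Threading;

class StoreLoadFence
{
    static int x, y;

    public static void Main()
    {
        int r1 = 0, r2 = 0;
        var t1 = new Thread(() =>
        {
            x = 1;
            Thread.MemoryBarrier();   // full fence: the store to x becomes
            r1 = y;                   // globally visible before y is loaded
        });
        var t2 = new Thread(() =>
        {
            y = 1;
            Thread.MemoryBarrier();
            r2 = x;
        });
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();

        // With the fences in place, both threads reading 0 is impossible;
        // remove them and the hardware may reorder the store past the load.
        if (r1 == 0 && r2 == 0) throw new Exception("store-load reorder observed");
        Console.WriteLine($"r1={r1}, r2={r2}");
    }
}
```

This is also why the interlocked increment in the original code is enough: on x86 a lock-prefixed instruction carries a full fence as a side effect.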

Think of the processor as a card game where you have a dealer, a deck of cards, and several players. The dealer deals out the cards in the order of the deck. (The card deck may have missing or duplicate cards.) The sequence in which the players receive the cards is not necessarily the sequence in which they play the cards. It is up to you (the programmer) to specify the rules and special circumstances that alter a player's choice of which card to play next.

Jim Dempsey