Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Atomic wrapper around a pointer to integer

jaredkeithwhite
New Contributor I
1,260 Views
Here's what I've got:

I have some object (call it "instrument") which calculates an integer value. There is a possibility that multiple threads could pass this object "sensor data", and then the instrument needs to recalculate its output value. This output value is used in a separate thread.

Since it's possible that multiple threads could invoke the "instrument.ondata(somevalue)" method, which then should recalculate the result, it's obvious that this value needs to be atomically stored. Additionally, we will need to always read consistent results when we're reading the instrument values. Having said that, there is another catch. Let's say that I want to tell the instrument class where to put the value (using a pointer).

My question is: Is it possible to create an atomic reference that wraps this pointer to an integer?



0 Kudos
1 Solution
Dmitry_Vyukov
Valued Contributor I
1,260 Views
Stores and loads to word-sized aligned memory locations are atomic on all widespread architectures. So what you described will work with plain 'int* volatile'.

0 Kudos
11 Replies
jaredkeithwhite
New Contributor I
1,260 Views
Dmitriy,

Wonderful -- I'm a bit new to C++ and wasn't quite sure if I could trust "volatile" or not. And, with your rather impressive Status Points accumulation, I will trust your reply!

0 Kudos
RafSchietekat
Valued Contributor III
1,260 Views

I don't understand what's going on here, but maybe that's just because I've got fewer Status Points.

0 Kudos
jaredkeithwhite
New Contributor I
1,260 Views

Raf,

I assure you that your status points are equally as impressive!

Essentially, I need atomic writes/reads to a variable stored at a pointer. The reason why it's a variable stored at a pointer and not my own variable is because the memory is allocated elsewhere (using a separate library, which uses page-locked memory). Nevertheless, it would appear that "widely used architectures" support atomic read/writes to word-sized pointers (so long as they are aligned), so I'll be sure to align my pointers and hopefully experience no issues.

My confusion as a newb in C++ was reading about the volatile keyword. Just about every thing says that C++ volatile isn't guaranteed to be atomic. Having said that, given the innumerable permutations of C++ flavor/architecture/compiler/operating system, I can imagine why there would be no implicit guarantees.

I'm using the Intel Compiler for Linux version 11, deploying on brand new Xeons, and running Suse 11. Hopefully I'll be safe.


0 Kudos
Dmitry_Vyukov
Valued Contributor I
1,260 Views
Something "special" is usually needed when you need either (1) to execute a complex RMW (read-modify-write) action in atomic fashion (compare-exchange, exchange, fetch-add), or (2) a special ordering of memory operations.

If you need only to store/load an int or int* (both assumed to be machine-word sized) in atomic fashion (so that nobody will see some intermediate value), then you only have to ensure that the store/load appears in machine code as a single machine instruction (MOV on x86). That's what C++'s volatile is intended to do.
However, since the C++03 standard does not even mention multi-threaded environments, this is not 101% portable. But all compiler vendors do understand what a multi-threaded environment is and that people actually use C++ in one. So this works in practice.
In C++0x we will have legal means to express atomic actions (std::atomic<>).

Re: Just about every thing says that C++ volatile isn't guaranteed to be atomic.
"Every thing" is theoretically correct, but usually does not understand what atomicity is. It usually thinks that atomicity is limited to complex RMW operations, or that atomicity implies memory ordering.

Btw, the MSVC compiler officially "promotes" C++'s volatile to the status of a synchronization primitive. MSVC's volatile is guaranteed to be atomic and also provides some memory ordering wrt other processors/cores (volatile stores are release, and volatile loads are acquire).
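
To make the distinction concrete, here is a minimal sketch of the three situations described in this post, written with std::atomic<> as it was eventually standardized in C++11. All names (latest_value, publish, snapshot, count_update, produce, try_consume) are made up for illustration and do not come from the thread or from TBB.

```cpp
#include <atomic>
#include <cassert>

// Plain atomic store/load: a reader never observes a half-written value.
// Relaxed ordering is the closest standard counterpart of the "single MOV" case.
std::atomic<int> latest_value{0};

void publish(int v) { latest_value.store(v, std::memory_order_relaxed); }
int snapshot() { return latest_value.load(std::memory_order_relaxed); }

// Case (1): a read-modify-write that must be indivisible as a whole.
// A separate load followed by a store could lose concurrent updates.
std::atomic<int> update_count{0};

int count_update() {
    // fetch_add returns the previous value, so add 1 to get the new count.
    return update_count.fetch_add(1, std::memory_order_relaxed) + 1;
}

// Case (2): ordering. The release store makes the earlier write to payload
// visible to any thread whose acquire load observes ready == true.
int payload = 0;
std::atomic<bool> ready{false};

void produce(int v) {
    assert(!ready.load(std::memory_order_relaxed)); // one-shot in this sketch
    payload = v;
    ready.store(true, std::memory_order_release);
}

bool try_consume(int& out) {
    if (!ready.load(std::memory_order_acquire)) return false;
    out = payload; // safe: ordered after the producer's write
    return true;
}
```

For the instrument described in the question, only the first pair (publish/snapshot) would be needed if a single integer is all that is shared.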

0 Kudos
RafSchietekat
Valued Contributor III
1,260 Views
"I assure you that your status points are equally as impressive!"
Sorry, forgot the smiley. :-)

"Essentially, I need atomic writes/reads to a variable stored at a pointer. The reason why it's a variable stored at a pointer and not my own variable is because the memory is allocated elsewhere (using a separate library, which uses page-locked memory). Nevertheless, it would appear that "widely used architectures" support atomic read/writes to word-sized pointers (so long as they are aligned), so I'll be sure to align my pointers and hopefully experience no issues."
My confusion was over whether you need atomic access to a pointer or to a referent (the thing being pointed to). My impression is that it's about the referent, but then "int* volatile" doesn't seem to do what you want, while "volatile int*" does. Right? Also, this basically only provides you relaxed atomic read/write, whereas you seem to be interested at least also in atomic increments and the like, and probably also something less bewildering than relaxed/raw memory semantics.

"My confusion as a newb in C++ was reading about the volatile keyword. Just about every thing says that C++ volatile isn't guaranteed to be atomic. Having said that, given the innumerable permutations of C++ flavor/architecture/compiler/operating system, I can imagine why there would be no implicit guarantees."
It is dependent on platform (architecture and compiler) and alignment whether volatile read/write will be atomic (in the original sense, i.e., indivisible), although in most cases natural alignment will provide that on modern platforms. Still, if you're new to C++, volatile is not where you should be looking for shared access: it is a far hairier creature than a Java volatile, sharing little more than the name. I would advise you to look to tbb::atomic instead, because I don't think "volatile" will be right here.

"I'm using the Intel Compiler for Linux version 11, deploying on brand new Xeons, and running Suse 11. Hopefully I'll be safe."
While I wouldn't doubt that naturally aligned volatile accesses are indivisible here, I would still use tbb::atomic instead, for the reasons given above.
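
The pointer-versus-referent distinction can be checked by the compiler. The following is only a sketch; storage, volatile_pointer, pointer_to_volatile, write_referent and read_referent are illustrative names, and nothing here makes the accesses atomic or ordered as far as the C++ standard is concerned.

```cpp
#include <cassert>
#include <type_traits>

int storage = 0; // stands in for the integer that some other library allocated

// "int* volatile": the pointer object itself is volatile, the int is not.
int* volatile volatile_pointer = &storage;

// "volatile int*": the pointer is ordinary, every access through it is volatile.
volatile int* pointer_to_volatile = &storage;

static_assert(std::is_volatile<decltype(volatile_pointer)>::value,
              "the pointer is the volatile object");
static_assert(!std::is_volatile<
                  std::remove_pointer<decltype(volatile_pointer)>::type>::value,
              "its referent is a plain int");
static_assert(!std::is_volatile<decltype(pointer_to_volatile)>::value,
              "the pointer is an ordinary object");
static_assert(std::is_volatile<
                  std::remove_pointer<decltype(pointer_to_volatile)>::type>::value,
              "its referent is volatile");

// Accesses to the referent go through the volatile-qualified pointer.
void write_referent(volatile int* p, int v) {
    assert(p != nullptr);
    *p = v;
}

int read_referent(const volatile int* p) {
    assert(p != nullptr);
    return *p;
}
```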
0 Kudos
RafSchietekat
Valued Contributor III
1,260 Views
Apparently Dmitriy got his response in before me...

Note that I don't contest that if you really only need read or write with relaxed semantics, and with out-of-the-box tbb::atomic not providing "relaxed" (I've made a patch that does, although I haven't kept it up to date with current TBB versions), "volatile" instead of tbb::atomic may still be an appropriate hack if performance is a real issue (subject to testing), but you would have to be quite careful, and you have declared yourself to be a C++ newbie, so... I am religiously opposed to relying on compiler specifics, so I would never associate non-relaxed memory semantics with "volatile". And don't let any of the C++0x guys catch you doing this, because they'll declare your entire program to have undefined behaviour.
0 Kudos
jaredkeithwhite
New Contributor I
1,260 Views
Raf,

tbb::atomic seems like where I should be looking. However, can tbb::atomic use a variable that someone else created? In my case, I have a pointer to an integer, and it doesn't do me much good for tbb::atomic to create the variable itself.

The reason why I have the pointer to an integer is because that integer is allocated along with a large block of memory that another SDK uses (CUDA, to be exact). I use CUDA to allocate a large block of page-locked memory, and then I am allowed to essentially use what is known as ZeroCopy. ZeroCopy allows me to have the GPUs process all of the data I want processed without directly copying it over into GPU memory first. They process the data from normal Host memory, and asynchronously transfer as needed.

Because of this, I already have the pointer to (quite a large number of) integers. And I have multiple threads that could update those integers. At the same time, there is a magical moment in time when the integers do essentially get transferred over to the GPU, but this happens asynchronously and is more or less "hidden" from view, by design.

Essentially, ZeroCopy's process is the following:
1) allocate a large block of page-locked memory
2) update this memory as I need (need this to be done atomically)
3) process the data on the device
- device transfers the data over as execution occurs, asynchronously, as it's needed

While this seems rather latency-intensive, it is probably better than the following:

1) allocate my own large block of memory (or allocate a large number of tbb::atomic variables)
- I'm running 64-bit Linux using ICC, so I'm assuming each one is 4 bytes.
2) update my own values using any necessary number of threads
3) individually copy all of this data (would need to be done one at a time, unless I can give tbb::atomic a specific address to fence, and allocate the entire block in a contiguous fashion) into device memory
4) then process the code on the device

So that's why I'm looking to be able to tell tbb::atomic, "Here is the memory address that you fence."

Any thoughts?
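
For reference, "here is the memory address that you fence" can be sketched as a thin wrapper around an existing int. This sketch assumes a GCC- or Clang-compatible compiler that provides the `__atomic_*` builtins; AtomicIntRef is a hypothetical helper name, not a TBB or CUDA type. Standard C++ later gained the same capability as std::atomic_ref in C++20. Whether the GPU side observes such updates consistently is a separate CUDA question that this sketch does not address.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: atomic access to an int that someone else allocated.
// It stores only the address; it never owns or copies the integer.
class AtomicIntRef {
public:
    explicit AtomicIntRef(int* target) : target_(target) {
        assert(target != nullptr);
        // Natural alignment is what makes word-sized accesses indivisible.
        assert(reinterpret_cast<std::uintptr_t>(target) % alignof(int) == 0);
    }

    // Release store: earlier writes by this thread become visible to a
    // thread that later observes this value with an acquire load.
    void store(int v) const { __atomic_store_n(target_, v, __ATOMIC_RELEASE); }

    // Acquire load: never returns a torn (half-written) value.
    int load() const { return __atomic_load_n(target_, __ATOMIC_ACQUIRE); }

    // Indivisible read-modify-write; returns the value before the addition.
    int fetch_add(int delta) const {
        return __atomic_fetch_add(target_, delta, __ATOMIC_ACQ_REL);
    }

private:
    int* target_;
};
```

Because the wrapper holds nothing but a pointer, the integers can stay in one contiguous block, and a wrapper can be created on the fly for whichever slot a thread needs to update.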



0 Kudos
RafSchietekat
Valued Contributor III
1,260 Views
"tbb::atomic seems like where I should be looking. However, can tbb::atomic use a variable that someone else created? In my case, I have a pointer to an integer, and it doesn't do me much good for tbb::atomic to create the variable itself."
If you can't use tbb::atomic throughout, you're probably better off trying your luck with volatile anyway.

"So that's why I'm looking to be able to tell tbb::atomic, "Here is the memory address that you fence.""
I did not know this was about communication with a GPU, otherwise I would not have proposed tbb::atomic. Furthermore, you have a block of memory, not just (an) individual variable(s). Doesn't the documentation for CUDA advise about how to do this?

I think that you don't even need volatile, or atomic transfers, because you'll be calling a function that internally flushes the writes (from the CPU to memory) before contacting the GPU. Then the question becomes how this works with multiple threads entering data first... I could hazard a guess, but maybe not before allowing you to report about what the documentation says, and deferring to whoever has already done this.
0 Kudos
jaredkeithwhite
New Contributor I
1,260 Views
Raf,

Alright, so after spending a few hours sleeping rather than thinking about this, it became abundantly obvious to me that the fundamental answer to this problem is an architectural shift. Rather than pushing the data into a location that is assumed to be allocated by the client of the instrument (the page-locked memory), the clients will essentially pull from the instrument when needed into some other memory location (either page-locked or not, it doesn't matter).

This greatly simplifies a lot of other elements within the framework and improves the scalability of the framework dramatically.

Having said that, with respect to your other questions:
- CUDA documentation makes no assumptions about where the data you're giving CUDA comes from, so little help from there
- the reason why my issue was a bit more complicated than normal is because I have multiple threads producing the data that will be pushed over to the GPUs

Bottom line: I think none of this is necessary, because I'm using a "pull" model rather than a "push" model, and instruments own their own data.

Thanks for all of the help.
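
Roughly, the pull model looks like the sketch below. It uses C++11 std::atomic for brevity; the names Instrument, on_data, value and pull_all are illustrative, and recalculate is a placeholder for the real computation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

class Instrument {
public:
    // May be called from several producer threads; the last store wins.
    // If the new result depended on the previous one, a read-modify-write
    // (for example compare_exchange) would be needed instead of a store.
    void on_data(int sample) {
        value_.store(recalculate(sample), std::memory_order_release);
    }

    // Consumers pull a consistent snapshot whenever they want one.
    int value() const { return value_.load(std::memory_order_acquire); }

private:
    // Placeholder computation, purely for illustration.
    static int recalculate(int sample) { return sample * 2; }

    std::atomic<int> value_{0}; // the instrument owns its own data
};

// The client decides where the values end up. The destination buffer can be
// ordinary or page-locked memory; the instruments neither know nor care.
void pull_all(const std::vector<Instrument>& instruments,
              int* destination, std::size_t capacity) {
    assert(destination != nullptr);
    assert(capacity >= instruments.size());
    for (std::size_t i = 0; i < instruments.size(); ++i) {
        destination[i] = instruments[i].value();
    }
}
```

The client calls pull_all right before handing the buffer to the device, so the only shared state is the atomic inside each instrument.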



0 Kudos
RafSchietekat
Valued Contributor III
1,260 Views

"abundantly obvious"
Dangerous words... and also now I can't let on anymore that I'm confused. Good luck, anyway!

0 Kudos