I made an atomic float. It's probably not blazingly fast (that's OK), but it's faster than wrapping a lock around a float, and it works. I'm just not sure whether that's down to good luck or whether it's actually thread safe. I think it is... but you never know, so I came to ask the experts :)
struct AtomicFloat : public tbb::atomic<size_t>
{
    float compare_and_swap(float value, float compare)
    {
        size_t value_ = tbb::atomic<size_t>::compare_and_swap(reinterpret_cast<size_t&>(value), reinterpret_cast<size_t&>(compare));
        return reinterpret_cast<float&>(value_);
    }
    float operator=(float value)
    {
        size_t value_ = (*this).tbb::atomic<size_t>::operator=(reinterpret_cast<size_t&>(value));
        return reinterpret_cast<float&>(value_);
    }
    operator float()
    {
        size_t value_ = *this;
        return reinterpret_cast<float&>(value_);
    }
    float operator+=(float value)
    {
        volatile float old_value_, new_value_;
        do
        {
            old_value_ = reinterpret_cast<float&>(*this);
            new_value_ = old_value_ + value;
        } while(compare_and_swap(new_value_, old_value_) != old_value_);
        return (new_value_);
    }
};
Also, as a caveat, I'm placing a static assert for sizeof(float) == sizeof(size_t).
Thanks!
I dare not look too closely, but would you pick my lottery numbers next week?
How is this used? Has anybody else missed atomic floats? What operations should be supported?
I dare not look too closely, but would you pick my lottery numbers next week?
How is this used? Has anybody else missed atomic floats? What operations should be supported?
Sure I'll pick your lottery numbers :)
There you go 3-7-12-23-24.
But on a more serious note, I want/plan to support all generic tbb::atomic operations with the same interface, if this is a viable (thread-safe) solution.
And their intended use is as data members on objects.
class ThreadSafeActor
{
    atomic<float> health_;
    atomic<float> position_x_;
    atomic<float> position_y_;
};
So you see, I'd rather have atomic floats than locks on the object or on each data member. And so far it amazingly works while running 10 threads concurrently on the same object. Nothing has broken so far...
----------
Edit: Performance-wise, using a locked float for 5,000,000 += operations with 100 threads on my machine takes 3.6s, while my atomic float, even with its silly do-while, takes 0.2s to do the same work. So the ~18x performance boost means it's worth it, (and this is the catch) if it's correct.
struct AtomicFloat : public tbb::atomic<uint32_t>
{
    template<memory_semantics M>
    float fetch_and_add( float addend )
    {
        const uint32_t value_ = tbb::atomic<uint32_t>::fetch_and_add<M>(reinterpret_cast<uint32_t&>(addend));
        return reinterpret_cast<const float&>(value_);
    }
    float fetch_and_add( float addend )
    {
        const uint32_t value_ = tbb::atomic<uint32_t>::fetch_and_add(reinterpret_cast<uint32_t&>(addend));
        return reinterpret_cast<const float&>(value_);
    }
    template<memory_semantics M>
    float fetch_and_increment()
    {
        const uint32_t value_ = tbb::atomic<uint32_t>::fetch_and_increment<M>();
        return reinterpret_cast<const float&>(value_);
    }
    float fetch_and_increment()
    {
        const uint32_t value_ = tbb::atomic<uint32_t>::fetch_and_increment();
        return reinterpret_cast<const float&>(value_);
    }
    template<memory_semantics M>
    float fetch_and_decrement()
    {
        const uint32_t value_ = tbb::atomic<uint32_t>::fetch_and_decrement<M>();
        return reinterpret_cast<const float&>(value_);
    }
    float fetch_and_decrement()
    {
        const uint32_t value_ = tbb::atomic<uint32_t>::fetch_and_decrement();
        return reinterpret_cast<const float&>(value_);
    }
    template<memory_semantics M>
    float fetch_and_store( float value )
    {
        const uint32_t value_ = tbb::atomic<uint32_t>::fetch_and_store<M>(reinterpret_cast<uint32_t&>(value));
        return reinterpret_cast<const float&>(value_);
    }
    float fetch_and_store( float value )
    {
        const uint32_t value_ = tbb::atomic<uint32_t>::fetch_and_store(reinterpret_cast<uint32_t&>(value));
        return reinterpret_cast<const float&>(value_);
    }
    template<memory_semantics M>
    float compare_and_swap( float value, float comparand )
    {
        const uint32_t value_ = tbb::atomic<uint32_t>::compare_and_swap<M>(reinterpret_cast<uint32_t&>(value), reinterpret_cast<uint32_t&>(comparand));
        return reinterpret_cast<const float&>(value_);
    }
    float compare_and_swap(float value, float compare)
    {
        const uint32_t value_ = tbb::atomic<uint32_t>::compare_and_swap(reinterpret_cast<uint32_t&>(value), reinterpret_cast<uint32_t&>(compare));
        return reinterpret_cast<const float&>(value_);
    }
    operator float() const volatile // volatile qualifier here for backwards compatibility
    {
        const uint32_t value_ = (*this);
        return reinterpret_cast<const float&>(value_);
    }
    float& _internal_reference() const
    {
        return reinterpret_cast<float&>(tbb::atomic<uint32_t>::_internal_reference());
    }
    float operator=(float value)
    {
        const uint32_t value_ = (*this).tbb::atomic<uint32_t>::operator=(reinterpret_cast<uint32_t&>(value));
        return reinterpret_cast<const float&>(value_);
    }
    float operator+=(float value)
    {
        volatile float old_value_, new_value_;
        do
        {
            old_value_ = reinterpret_cast<float&>(*this);
            new_value_ = old_value_ + value;
        } while(compare_and_swap(new_value_, old_value_) != old_value_);
        return (new_value_);
    }
    float operator-=(float value)
    {
        volatile float old_value_, new_value_;
        do
        {
            old_value_ = reinterpret_cast<float&>(*this);
            new_value_ = old_value_ - value;
        } while(compare_and_swap(new_value_, old_value_) != old_value_);
        return (new_value_);
    }
    float operator++()
    {
        volatile float old_value_, new_value_;
        do
        {
            old_value_ = reinterpret_cast<float&>(*this);
            new_value_ = old_value_++;
        } while(compare_and_swap(new_value_, old_value_) != old_value_);
        return (new_value_);
    }
    float operator--()
    {
        volatile float old_value_, new_value_;
        do
        {
            old_value_ = reinterpret_cast<float&>(*this);
            new_value_ = old_value_--;
        } while(compare_and_swap(new_value_, old_value_) != old_value_);
        return (new_value_);
    }
};
I miss being able to edit later...
atomic_word is not in tbb, of course, which makes an assert look more attractive than the alternative of emulating or blindly trusting atomic_word.
Fixed some stuff and removed some stuff... especially the additions, oops! :)
struct AtomicFloat
{
    tbb::atomic<uint32_t> atomic_value_;

    template<memory_semantics M>
    float fetch_and_store( float value )
    {
        const uint32_t value_ = atomic_value_.tbb::atomic<uint32_t>::fetch_and_store<M>(reinterpret_cast<uint32_t&>(value));
        return reinterpret_cast<const float&>(value_);
    }
    float fetch_and_store( float value )
    {
        const uint32_t value_ = atomic_value_.tbb::atomic<uint32_t>::fetch_and_store(reinterpret_cast<uint32_t&>(value));
        return reinterpret_cast<const float&>(value_);
    }
    template<memory_semantics M>
    float compare_and_swap( float value, float comparand )
    {
        const uint32_t value_ = atomic_value_.tbb::atomic<uint32_t>::compare_and_swap<M>(reinterpret_cast<uint32_t&>(value), reinterpret_cast<uint32_t&>(comparand));
        return reinterpret_cast<const float&>(value_);
    }
    float compare_and_swap(float value, float compare)
    {
        const uint32_t value_ = atomic_value_.tbb::atomic<uint32_t>::compare_and_swap(reinterpret_cast<uint32_t&>(value), reinterpret_cast<uint32_t&>(compare));
        return reinterpret_cast<const float&>(value_);
    }
    operator float() const volatile // volatile qualifier here for backwards compatibility
    {
        const uint32_t value_ = atomic_value_;
        return reinterpret_cast<const float&>(value_);
    }
    float operator=(float value)
    {
        const uint32_t value_ = atomic_value_.tbb::atomic<uint32_t>::operator=(reinterpret_cast<uint32_t&>(value));
        return reinterpret_cast<const float&>(value_);
    }
    float operator+=(float value)
    {
        volatile float old_value_, new_value_;
        do
        {
            old_value_ = reinterpret_cast<float&>(atomic_value_);
            new_value_ = old_value_ + value;
        } while(compare_and_swap(new_value_, old_value_) != old_value_);
        return (new_value_);
    }
    float operator*=(float value)
    {
        volatile float old_value_, new_value_;
        do
        {
            old_value_ = reinterpret_cast<float&>(atomic_value_);
            new_value_ = old_value_ * value;
        } while(compare_and_swap(new_value_, old_value_) != old_value_);
        return (new_value_);
    }
    float operator/=(float value)
    {
        volatile float old_value_, new_value_;
        do
        {
            old_value_ = reinterpret_cast<float&>(atomic_value_);
            new_value_ = old_value_ / value;
        } while(compare_and_swap(new_value_, old_value_) != old_value_);
        return (new_value_);
    }
    float operator-=(float value)
    {
        return this->operator+=(-value);
    }
    float operator++()
    {
        return this->operator+=(1);
    }
    float operator--()
    {
        return this->operator+=(-1);
    }
    float fetch_and_add( float addend )
    {
        return this->operator+=(addend);
    }
    float fetch_and_increment()
    {
        return this->operator+=(1);
    }
    float fetch_and_decrement()
    {
        return this->operator+=(-1);
    }
};
The underlying compare and swap operates on a 32-bit value when used for float, and presumably later on a 64-bit value when used for double. The values used by the compare and swap pass through general-purpose registers, not through the FPU or SSE/MMX registers. Therefore, it may be advisable in your compare_and_swap not to obtain the old value (for use as comparand) by way of C++ statements that treat the copy operation as a copy of floats, which may use FPU, SSE/MMX or other instructions on different processors. Instead, I would suggest copying the old value, for use as comparand, using a statement that casts the variable as int32 or int64 as the case may be. Using the cast as float may exhibit problems with old values containing denormalized numbers, NaNs, +/- infinities, -0, etc.
Jim Dempsey
Sorry, I should have checked first: increment and decrement are defined on all arithmetic and pointer types, which includes floating point. Even a bool can be incremented (which results in the value true)... but not decremented; I'm happy that I didn't know this (although now I do).
Robert
old_value_ = reinterpret_cast<float&>(atomic_value_);
new_value_ = old_value_ + value;
// floating point binary representation is not an issue because
// we are using our self's compare and swap, thus comparing floats and floats
If, in obtaining old_value_, the compiler generates FPU instructions to copy the data AND atomic_value_ contains a denormalized number, then old_value_ will never == atomic_value_, and therefore the compare and swap will always fail. In the case of the denormalized number, new_value_ will compute to the value you desire but could never get stored, due to the binary difference between old_value_ and atomic_value_. SSE and MMX instructions may work (garbage in, garbage out). However, consider first-time use of atomic_value_ containing uninitialized data, which could potentially be one of several patterns that are not floating point numbers. The safest way is to copy to old_value_ using a cast on both sides as uint32 (of your flavor). Then the bit pattern that was in atomic_value_ is now in old_value_, and if the number were "not a normal number", the attempted conversion would occur at the "+" and not at the first "="; and whatever the result, it would be storable back into atomic_value_.
The main problem is not with operator += since, presumably with good coding practice, it will occur only after at least one operator = that initializes atomic_value_ to a valid floating point number (though NaNs and other nonsense can get stored too). If operator = uses the compare_and_swap, you must match the bit pattern and not use numeric equivalence (which may have a different bit pattern).
Jim Dempsey
Robert
old_value_ = reinterpret_cast<float&>(atomic_value_);
new_value_ = old_value_ + value;
// floating point binary representation is not an issue because
// we are using our self's compare and swap, thus comparing floats and floats
If, in obtaining old_value_, the compiler generates FPU instructions to copy the data AND atomic_value_ contains a denormalized number, then old_value_ will never == atomic_value_, and therefore the compare and swap will always fail. In the case of the denormalized number, new_value_ will compute to the value you desire but could never get stored, due to the binary difference between old_value_ and atomic_value_. SSE and MMX instructions may work (garbage in, garbage out). However, consider first-time use of atomic_value_ containing uninitialized data, which could potentially be one of several patterns that are not floating point numbers. The safest way is to copy to old_value_ using a cast on both sides as uint32 (of your flavor). Then the bit pattern that was in atomic_value_ is now in old_value_, and if the number were "not a normal number", the attempted conversion would occur at the "+" and not at the first "="; and whatever the result, it would be storable back into atomic_value_.
The main problem is not with operator += since, presumably with good coding practice, it will occur only after at least one operator = that initializes atomic_value_ to a valid floating point number (though NaNs and other nonsense can get stored too). If operator = uses the compare_and_swap, you must match the bit pattern and not use numeric equivalence (which may have a different bit pattern).
Jim Dempsey
Thanks for the in-depth explanation; I had misunderstood your point in your previous post.
But now I can see the possible danger there. Going back to fixing :)
I'm still wondering whether this is an ad hoc exercise or an indication that I should add floating-point atomics to the "Additions to atomic<T>" list.
During your fixing, remember that just using bit patterns would be oblivious to the equality of plus zero and minus zero for compare_and_swap; but maybe compare_and_swap should be deprecated in favour of a compare_and_store (see the patch: in/out comparand as first parameter, new value as second parameter, third parameter indicating whether spurious failure is disallowed, boolean success status returned). Although it could also be another parameter: by default minus zero equals plus zero, unless the user decides to differentiate them.
I'm still wondering whether this is an ad hoc exercise or an indication that I should add floating-point atomics to the "Additions to atomic<T>" list.
During your fixing, remember that just using bit patterns would be oblivious to the equality of plus zero and minus zero for compare_and_swap; but maybe compare_and_swap should be deprecated in favour of a compare_and_store (see the patch: in/out comparand as first parameter, new value as second parameter, third parameter indicating whether spurious failure is disallowed, boolean success status returned). Although it could also be another parameter: by default minus zero equals plus zero, unless the user decides to differentiate them.
Well, I think atomic floats are a necessary feature, but then I'm the one who began the thread.
My line of thought here is that in theory, and in most textbook examples of threading, floats can be easily dismissed.
Also, when working with graphics, no one will want to use atomic floats.
But in my case, working on a multithreaded event-driven simulation (in this case a game server), being able to use atomic floats as object data members can seriously reduce the number of locks (and a lock is fairly big compared with a float, and even with most objects). This improves memory usage (you can keep more objects hot) and reduces thread convoying, overall making things snappier.
It also helps keep the code cleaner and removes lots of potential deadlocks that could happen if someone isn't careful. Now, if my simulation were non-event-driven, say just calculating some complex end state, I'd be better off using vectorization, just as with graphics. But alas, it's not.
The only downside here is that without the proper instruction set, atomic floats are kind of a hack (A-B-A problem), but a useful tool if used appropriately.
Nevertheless, perhaps the real solution in the long run isn't atomics but transactional memory; but since I'm working on a commercial product with deadlines, Intel's STM won't do me any good for now. And adoption of transactional memory, if it catches on, will probably happen several years down the road. So (pseudo-)atomic floats in the meantime would probably be welcomed by many.
"a lock is fairly big when compared with a float" I think you should test your assumptions again with tbb::spin_mutex for a lock, both about size (currently always one byte) and about performance (may be substantially better than what you've seen, especially if you keep the float and the lock in the same cache line and are careful about what else to allow there).
"removes lots of potential deadlocks" That's where scoped locks come in (assuming the thread remains active).
"A-B-A problem" Depending on your application...
"Intel's STM won't do me any good for now" What product would that be, and when is it expected?
P.S.: Actually I need either 5 out of 50 plus 2 out of 9 or 6 out of 42 for that lottery thing.
I just skimmed the thread, so maybe I missed this: what is a potential use case for an atomic float? I've thought about them myself, but I'd like to hear some ideas on how they can be useful. I did see your Actor class.
But in my case, working on a multithreaded event-driven simulation (in this case a game server), being able to use atomic floats as object data members can seriously reduce the number of locks (and a lock is fairly big compared with a float, and even with most objects). This improves memory usage (you can keep more objects hot) and reduces thread convoying, overall making things snappier.
A mutex can be as small as 1 bit, i.e. you can pack up to 32 mutexes into a single 32-bit word.
You can see some interesting bit mutex implementation here:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/bbf4e0b8d1db5b4f
The interesting thing about it is that it's possible to lock/unlock an arbitrary set of mutexes with a single atomic RMW.
Or you can pack a mutex and a {pointer, number, enum, etc.} into a single word, i.e. no waste of space at all.
Nevertheless, perhaps the real solution in the long run isn't atomics but transactional memory; but since I'm working on a commercial product with deadlines, Intel's STM won't do me any good for now. And adoption of transactional memory, if it catches on, will probably happen several years down the road. So (pseudo-)atomic floats in the meantime would probably be welcomed by many.
Locks will be much, much faster and more scalable. And there is no possibility of deadlock, since the transaction involves only one object.
Robert,
The atomic float operation is not an A-B-A problem. Instead, it is a read/modify/write problem, whereby a competing thread can intercede between the read and the write. This adverse interaction, with respect to operators on float, can be worked around using a CAS (single-word compare and swap).
The A-B-A problem typically involves a linked list operation whereby the head (or tail) pointer could point at node A on the read portion of an attempted change; then the thread stalls (interrupt or context switch); then, while the thread is stalled, the list changes state (potentially many times) and ends up with the head pointer pointing at A again. However, and this is the important part, your CAS operation may be dependent on the head pointer still holding a pointer to A, but your code may also be relying on node A not having been altered between the time you examined A and performed your CAS. If node A had been altered, your CAS would succeed (in this example), but the copied value you obtained from A just before the CAS would not be consistent with the current value. The end result is that you just trashed the head-of-list pointer (or tail pointer). The atomic float can be protected with the CAS; the A-B-A cannot.
To protect against A-B-A you typically use DCAS (double-word compare and swap) on a pointer plus a sequence number. This assumes your processor supports a DCAS instruction.
By the way, you are aware you could have used:
#pragma omp atomic
sum += bump;
Jim Dempsey
