In a similar implementation based on OpenMP I used #pragma omp atomic to avoid race conditions. Though atomic updates on floats are very slow, this paid off in much better scalability and eventually in better performance.
As far as I understand from the TBB documentation and this forum, there is no support for atomic floats in TBB because there is no need for it. Is this really the only reason? Or do hardware limitations hamper the implementation of atomic floats in TBB?
Thank you for your prompt reply.
I recompiled TBB using your patch and it works just fine. More importantly, in my application, atomic operations in TBB perform on par with atomic operations in OpenMP. There is only one minor problem: I have to stick to the atomic<double> type even where atomicity is not required.
Anyway, when will this patch become part of a TBB release? Is there still no need for atomic floats?
Again, thank you for your help.
Thanks, glad to hear that. Be sure to really test it, though.
"I have to stick to the atomic
"Anyway, when will this path become part of any TBB release? Is there still no need for atomic floats?"
Intel's compiler has some C++0x features already (I'd have to check about atomics), so maybe that's why an integration of this into mainline TBB is perceived as less urgent; I have no idea when C++0x features can be expected to be widely available. It's up to potential users like yourself to express any need for atomic floats.
As a numerical-computations-guy, I can see several applications for this feature.
And I hope the patch is compatible with TBB 2.2.
I compared the output (on an 8-core CPU) to that of a sequential implementation that uses no threading library at all. The output is identical. Without preventing race conditions, the output always differed.
In my application I have a method with the signature void convolve(double*). Within this method I use parallel_for to perform a convolution on the argument field. No race conditions occur in this method.
Beforehand, the same field is updated by multiple threads concurrently. That is why I changed the field's type to atomic<double>.
Maybe this is an advantage of OpenMP: atomicity is only specified when necessary.
Please, excuse my ignorance, but how are atomics related to C++0x?
Btw. I use the Intel C/C++ compiler version 10.1.
Care to sponsor an update? :-)
#5 "Please, excuse my ignorance, but how are atomics related to C++0x?"
There's a new "atomic operations library".
#5 "Btw. I use the Intel C/C++ compiler version 10.1."
I defer to Intel for details.
"However one curious bit of information, that came up at the time, is the protecting your floats with a user-space spin-lock results in practically the same performance. Using a spin-lock will probably make everything safer (if you are running on some edge-case platforms), but having atomic
"But in either case atomic
"This means depending on your situation they might be a good option, or you might need to rethink your atomicity/locking scheme."
If you can amortise the cost of locking somehow, atomic floats will surely lose. Or did you mean something else?
P.S.: Glad to hear you like it.
(Silly stuff removed.)
pkegel, Raf, Robert,
The problem with atomic<double> is that atomicity is then applied to every access of the variable, whether it is needed or not.
Take pkegel's requirement to update a large dataset using parallel_for. In many cases, only the boundary cells require atomicity while everything in between does not.
In other cases, such as particle interaction, it is much more efficient to partition the data and work on independent partition pairs in a non-interfering manner. This may require rewriting a 3-line loop as 300 lines (once), but those 300 lines will run > 100x faster than using atomics everywhere.
The question is: do you invest some time (once) in programming effort, or do you waste time waiting for results ever after?
Which would be most appropriate: differentiating between double for the interior cells and atomic<double> for the boundary cells?