question about tbb::atomic assignment

gast128 · 12-08-2010

Hello all,

InterlockedExchange guarantees an atomic exchange. In the generated instructions one can see that the xchg instruction (which implicitly locks the memory bus) is used. The xchg instruction of course adds exchange functionality, which must be atomic for both values being exchanged.

For the assignment operator of tbb::atomic there seems to be no use of any lock or xchg instruction (see __TBB_machine_load_store::store_with_release). Still, this must be thread safe. Is it because all 4-byte memory assignment operations on x86 are atomic, e.g. mov , 2 is already atomic? Even if it is atomic, its changed value must also be seen by other cores / processors.

Another question is that the memory fence (__TBB_release_consistency_helper()) seems to be defined to nothing.

RafSchietekat · 12-09-2010

Yes. (Hey, I can save keystrokes too!)

Dmitry_Vyukov · 12-09-2010

> Is it because all 4-byte memory assignment operations on x86 are atomic, e.g. mov , 2 is already atomic?

Yes.

> Even if it is atomic, its changed value must also be seen by other cores / processors.

It's a very moot point. But in general, yes, it should eventually be visible to other threads.

> Another question is that the memory fence (__TBB_release_consistency_helper()) seems to be defined to nothing.

So, what is the question?

gast128 · 12-09-2010

The atomic variable is mostly used in a simple scenario to run a thread, e.g.

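class Handler
{
public:
    Handler()
    {
        m_bRun = true;
        // Start the worker thread; it runs Handler::Run on this object.
        boost::thread(&Handler::Run, this);
    }

    // Called from another thread to ask the loop to finish.
    void Stop()
    {
        m_bRun = false;
    }

    void Run()
    {
        while (m_bRun)
        {
            ...
        }
    }

    boost::atomic<bool> m_bRun;
};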

so the atomic update is fine, but of course its change must be seen by the handler thread as soon as possible.
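
For reference, a minimal sketch of the same stop-flag pattern, assuming C++11 std::atomic and std::thread in place of the Boost/TBB types used above. The names Worker, Iterations and stop_flag_demo are hypothetical and only serve the illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical worker modelled on the Handler class above.
class Worker {
public:
    Worker() : run_(true), iterations_(0), thread_(&Worker::Run, this) {}
    ~Worker() { Stop(); }

    // Ask the loop to finish, then wait for the thread to exit.
    void Stop() {
        run_.store(false, std::memory_order_release);
        if (thread_.joinable()) thread_.join();
    }

    unsigned long Iterations() const { return iterations_.load(); }

private:
    void Run() {
        // An atomic load is performed on every pass, unlike a plain bool,
        // which the compiler could keep cached in a register.
        while (run_.load(std::memory_order_acquire)) {
            iterations_.fetch_add(1, std::memory_order_relaxed);
            std::this_thread::yield();
        }
    }

    std::atomic<bool> run_;
    std::atomic<unsigned long> iterations_;
    std::thread thread_;  // declared last so both flags exist before the thread starts
};

// Starts a worker, waits until the loop has run at least once, stops it,
// and reports whether the loop really stopped.
bool stop_flag_demo() {
    Worker w;
    while (w.Iterations() == 0) std::this_thread::yield();
    w.Stop();
    const unsigned long n = w.Iterations();
    assert(n > 0);
    return n == w.Iterations();  // no further iterations after the join
}
```

Calling stop_flag_demo() from main is enough to exercise it. Joining in Stop() is a design choice of this sketch, not something the original Handler does.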

Dmitry_Vyukov · 12-09-2010

> so the atomic update is fine, but of course its change must be seen by the handler thread as soon as possible.

Yes, it's usually implied by atomic variables. Any sane implementation will do its best to ensure prompt visibility.

gast128 · 12-10-2010

I agree, but do you know whether this is the case? As far as I could tell from the source code, I do not see any explicit memory flushing operation (be it a lock operation or a memory fence). This would be answered if, on the x86 platform, all memory operations are seen by other cores / processors as soon as they happen (probably related to the cache coherency protocol?).

Dmitry_Vyukov · 12-10-2010

Explicit memory flushing is required only on non-cache-coherent architectures. All modern commodity hardware (IA-32, Intel 64, IA-64, SPARC, POWER, etc.) is cache-coherent.
Memory fences and mutexes have nothing to do with memory visibility; they are about mutual ordering.
On a cache-coherent architecture, a completed memory write is automatically propagated between cores/processors in a best-effort manner, as soon as possible.
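
To illustrate the ordering point, here is a small sketch assuming C++11 std::atomic (not the TBB internals discussed above). The release store and acquire load do not "flush" anything. They only constrain how the payload write and read may be ordered relative to the flag. The names payload, ready, producer, consumer and publish_demo are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready(false);  // flag that publishes the payload

void producer() {
    payload = 42;                                  // (1) write the data
    ready.store(true, std::memory_order_release);  // (2) release: (1) cannot be reordered after this store
}

int consumer() {
    // (3) acquire: the read of payload below cannot be moved before this load
    while (!ready.load(std::memory_order_acquire))
        std::this_thread::yield();
    return payload;  // (4) sees the value from (1), because (3) observed the store in (2)
}

// Runs one producer and one consumer and returns what the consumer read.
int publish_demo() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    assert(seen == 42);
    return seen;
}
```

On x86, compilers typically emit plain mov instructions for both (2) and (3). The ordering constraint mainly restricts the compiler there, which matches the observation that the TBB release helper appears to expand to nothing.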

gast128 · 12-14-2010

Thanks for your answer.

Although I didn't talk about mutexes, they do make promises about memory visibility. See for example section 3.4 of Butenhof's 'Programming with POSIX Threads': Whatever memory values a thread can see when it unlocks a mutex,..., can also be seen by any thread that later locks the same mutex.

Reading the MSDN documentation about _mm_lfence / _mm_mfence would also make me think that not every memory write is globally visible.

ARCH_R_Intel · 12-14-2010

There are two issues here: "does a write become visible?" and "when does the write become visible?"

The definitive answer to the second question on IA-32 and Intel 64 is in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Section 8.2.3, "Examples Illustrating the Memory-Ordering Principles".

The answer to the first question is not explicitly guaranteed in the architecture as far as I know, even by fences. As a practical matter any implementation of a cache-coherent architecture does ensure that a write eventually becomes visible, otherwise many programs would hang.

For example, consider the Linux implementation of a spin lock on IA-32. A locked instruction is used to acquire the lock, but a plain store is used to release the lock. The rules in Section 8.2.3 cited above guarantee that a plain store suffices for releasing the lock. However, if the write did not also eventually become globally visible, the lock could never be acquired by another thread. Thus any practical implementation of the architecture must make the write globally visible eventually.
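
The shape of such a lock can be sketched as follows, assuming C++11 std::atomic. This is not the Linux kernel code, only an illustration of "atomic read-modify-write to acquire, plain release store to unlock". SpinLock and spinlock_demo are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class SpinLock {
public:
    SpinLock() : locked_(false) {}

    void lock() {
        // Atomic read-modify-write; on x86 this is typically an xchg,
        // which is implicitly locked.
        while (locked_.exchange(true, std::memory_order_acquire))
            std::this_thread::yield();
    }

    void unlock() {
        // Release store; on x86 compilers typically emit a plain mov here.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_;
};

// Several threads increment a shared counter under the lock;
// returns the final counter value.
long spinlock_demo(int threads, int increments) {
    SpinLock lock;
    long counter = 0;  // protected by lock
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &counter, increments] {
            for (int i = 0; i < increments; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

If the unlocking store never became visible to the other threads, the waiters in lock() would spin forever, which is the practical argument made above.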

Most other processors (for example, Itanium) do need a special store or fences to enforce ordering. But IA-32 and Intel 64 need them only if you are using non-temporal store instructions (MOVNT).