#pragma omp atomic vs InterlockedDecrement

AndrewC · 05-11-2011

[cpp]#ifdef _USEWIN32LOCKAPI
	InterlockedDecrement(&refs_);
#else
#pragma omp atomic
	refs_--;
#endif
[/cpp]

My benchmarks have shown that InterlockedDecrement is much faster than using #pragma omp atomic.
Why? I would have expected the compiler to generate inline code here.
Composer XE 2011, Windows 64-bit.

Om_S_Intel · 05-11-2011

Could you please provide sample code that we can compile to review the issue?

Om

AndrewC · 05-12-2011

This is my "sample" code. I am using the Composer XE 2011 Update 3 64-bit compiler.

This is pretty simple.

If you generate an assembly listing, the OpenMP version calls

__kmpc_global_thread_num and
__kmpc_atomic_fixed4_add

while the "Windows" version appears to be generated as inline assembly.

[cpp]#include <windows.h>
LONG refs_=0;

void WINatomicAdd()
{

	InterlockedIncrement(&refs_);
}

void OMPatomicAdd()
{
#pragma omp atomic
  ++refs_;
}


[/cpp]
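For anyone who wants to reproduce the comparison, here is a rough timing sketch built around the two functions above. It is not the benchmark I actually ran: the `time_calls` helper, the `std::chrono` timing, and the `LONG` stand-in for non-Windows builds are all assumptions added for illustration. It only measures single-threaded, uncontended call overhead, and the OpenMP path needs the compiler's OpenMP switch enabled for the pragma to take effect.

```cpp
#include <chrono>

#ifdef _WIN32
#include <windows.h>
#else
typedef long LONG;  // assumption: stand-in so the sketch compiles off Windows
#endif

LONG refs_ = 0;

// OpenMP path: per the assembly listing, this goes through runtime calls.
void OMPatomicAdd()
{
#pragma omp atomic
    ++refs_;
}

#ifdef _WIN32
// Win32 path: appears to compile to an inline locked instruction.
void WINatomicAdd()
{
    InterlockedIncrement(&refs_);
}
#endif

// Hypothetical helper: calls f() 'iterations' times on one thread and
// returns the elapsed wall-clock time in seconds.
template <typename F>
double time_calls(F f, long iterations)
{
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i)
        f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_calls(OMPatomicAdd, n)` and `time_calls(WINatomicAdd, n)` with the same large `n` gives two numbers to compare. Absolute values will depend on the compiler version and options, so treat them as relative only.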

Om_S_Intel · 06-01-2011

It looks like the OpenMP atomic is slower. You may use InterlockedIncrement instead.