Intel® C++ Compiler

Extra overhead in atomic operations generated by Intel ICPC 2021.2

fatvlad1744
Beginner

Greetings,

I was experimenting with thread synchronization and tried different approaches. I expected a volatile int to perform similarly to a std::atomic<int> accessed with relaxed memory ordering; however, std::atomic<int> appears to be slower. The reason is a redundant mov/lea instruction that the optimizer does not elide.

For example, here's the code snippet:

#include <atomic>

void load_atomic(std::atomic<int>& v, int& dest) {
    dest = v.load(std::memory_order_relaxed);
}

void load_intrin(int& v, int& dest) {
    dest = __atomic_load_n(&v, __ATOMIC_RELAXED);
}

void load_volatile(volatile int& v, int& dest) {
    dest = v;
}
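
For anyone who wants to reproduce this, here is a minimal sketch of a sanity check that the three variants read back the same value before comparing their assembly. It assumes a compiler that provides the GCC-style __atomic_load_n builtin. The helpers loads_agree and check_loads are hypothetical names added for illustration and are not part of the original test case.

```cpp
#include <atomic>
#include <cassert>

// The three load variants from the snippet above, repeated so that this
// file compiles on its own.
void load_atomic(std::atomic<int>& v, int& dest) {
    dest = v.load(std::memory_order_relaxed);
}

void load_intrin(int& v, int& dest) {
    // GCC-style builtin; assumed to be available in the compiler under test.
    dest = __atomic_load_n(&v, __ATOMIC_RELAXED);
}

void load_volatile(volatile int& v, int& dest) {
    dest = v;
}

// Hypothetical helper: returns true when all three variants read back
// `value`. This is a single-threaded check only.
bool loads_agree(int value) {
    std::atomic<int> a{value};
    int plain = value;
    volatile int vol = value;

    int d1 = 0, d2 = 0, d3 = 0;
    load_atomic(a, d1);
    load_intrin(plain, d2);
    load_volatile(vol, d3);
    return d1 == value && d2 == value && d3 == value;
}

// Hypothetical entry point for the check; call it from main() if desired.
void check_loads() {
    assert(loads_agree(42));
    assert(loads_agree(-7));
}
```

This only confirms that the functions are interchangeable in terms of the value they return on one thread. It says nothing about the generated instructions or about cross-thread guarantees, which is where volatile and std::atomic differ.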

 

I'm generating the assembly with the following command line:
$CXX -std=c++20 -O3 -S main2.cpp

g++ (GCC) 10.2.0 (stripped)

_Z11load_atomicRSt6atomicIiERi:
	movl	(%rdi), %eax
	movl	%eax, (%rsi)
	ret
_Z11load_intrinRiS_:
	movl	(%rdi), %eax
	movl	%eax, (%rsi)
	ret
_Z13load_volatileRViRi:
	movl	(%rdi), %eax
	movl	%eax, (%rsi)
	ret

ICX 2021.2 (stripped)

_Z11load_atomicRSt6atomicIiERi:  
	movl	(%rdi), %eax
	movl	%eax, (%rsi)
	retq
_Z11load_intrinRiS_:
	movl	(%rdi), %eax
	movl	%eax, (%rsi)
	retq
_Z13load_volatileRViRi:
	movl	(%rdi), %eax
	movl	%eax, (%rsi)
	retq

ICPC 2021.2 (stripped)

_Z11load_atomicRSt6atomicIiERi:
        movq      %rdi, %rax                                    #4.14
        movl      (%rax), %eax                                  #4.14
        movl      %eax, (%rsi)                                  #4.5
        ret                                                     #5.1
_Z11load_intrinRiS_:
        movq      %rdi, %rax                                    #8.12
        movl      (%rax), %eax                                  #8.12
        movl      %eax, (%rsi)                                  #8.5
        ret                                                     #9.1
_Z13load_volatileRViRi:
        movl      (%rdi), %eax                                  #12.12
        movl      %eax, (%rsi)                                  #12.5
        ret

So you can see that only ICPC emits an extra mov (movq %rdi, %rax) in load_atomic and load_intrin. It could be folded into the following load, as g++ and ICX do.

Why does this happen, and can we expect it to be fixed in a future version?
I've attached the full IR files to the post.

Thanks in advance.

VidyalathaB_Intel
Moderator

Hi,

Thanks for reaching out to us.

We are looking into this issue internally and will get back to you soon.

Regards,

Vidya.


Viet_H_Intel
Moderator

I've reported this issue to our compiler developers.

Thanks,


Viet_H_Intel
Moderator

Not sure if you already knew, but the Intel Classic Compiler will enter "Legacy Product Support" mode, signaling the end of regular updates. Please refer to the article below for more details.

https://www.intel.com/content/www/us/en/developer/articles/technical/adoption-of-llvm-complete-icx.h...


For that reason, the developers do not plan to fix this in the Classic compiler. Can you migrate to icx/icpx and let us know if we can close this case?


Thanks,

Viet


Viet_H_Intel
Moderator

Please migrate to icx. We are going to close this as "won't fix" in the C++ Classic compiler. If you need any additional information, please post a new question, as this thread will no longer be monitored by Intel.


Thanks,

