Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Extra overhead in atomic operations generated by Intel ICPC 2021.2

fatvlad1744
Beginner

Greetings,

I was experimenting with thread synchronization and tried several approaches. I expected a volatile int to perform similarly to a std::atomic<int> accessed with relaxed memory ordering; however, std::atomic<int> turns out to be slower. The cause is a redundant mov/lea instruction that the optimizer does not elide.

For example, here's the code snippet:

#include <atomic>

void load_atomic(std::atomic<int>& v, int& dest) {
    dest = v.load(std::memory_order_relaxed);
}

void load_intrin(int& v, int& dest) {
    dest = __atomic_load_n(&v, __ATOMIC_RELAXED);
}

void load_volatile(volatile int& v, int& dest) {
    dest = v;
}
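
A minimal sketch of how the three loads could be timed, in case it helps anyone reproduce the difference. This harness is not part of the original post: `time_ns` and `report_load_timings` are hypothetical helper names, and `__attribute__((noinline))` is assumed to be available (it is a GNU extension accepted by GCC, Clang-based compilers, and the classic Intel compiler on Linux). Absolute numbers will vary by machine, so treat the output as a rough indication only.

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>

// Same three loads as in the post. noinline (an assumption of this sketch)
// keeps each function as a real call so the generated body is what gets timed.
__attribute__((noinline)) void load_atomic(std::atomic<int>& v, int& dest) {
    dest = v.load(std::memory_order_relaxed);
}

__attribute__((noinline)) void load_intrin(int& v, int& dest) {
    dest = __atomic_load_n(&v, __ATOMIC_RELAXED);
}

__attribute__((noinline)) void load_volatile(volatile int& v, int& dest) {
    dest = v;
}

// Hypothetical helper: calls fn() iters times and returns elapsed nanoseconds.
template <class Fn>
long long time_ns(Fn fn, long iters) {
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iters; ++i) {
        fn();
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Hypothetical driver: prints one timing line per load variant.
void report_load_timings(long iters) {
    std::atomic<int> a{42};
    int plain = 42;
    volatile int vol = 42;
    int dest = 0;

    std::printf("atomic:   %lld ns\n", time_ns([&] { load_atomic(a, dest); }, iters));
    std::printf("intrin:   %lld ns\n", time_ns([&] { load_intrin(plain, dest); }, iters));
    std::printf("volatile: %lld ns\n", time_ns([&] { load_volatile(vol, dest); }, iters));
}
```

Calling `report_load_timings(100000000)` from `main` and building with the same flags for each compiler should give comparable figures. Call overhead is included in every measurement, so a single extra register move may well be lost in the noise.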

 

I'm generating the assembly with the following command line:
$CXX -std=c++20 -O3 -S main2.cpp

 

g++ (GCC) 10.2.0 (stripped)

_Z11load_atomicRSt6atomicIiERi:
	movl	(%rdi), %eax
	movl	%eax, (%rsi)
	ret
_Z11load_intrinRiS_:
	movl	(%rdi), %eax
	movl	%eax, (%rsi)
	ret
_Z13load_volatileRViRi:
	movl	(%rdi), %eax
	movl	%eax, (%rsi)
	ret

ICX 2021.2 (stripped)

_Z11load_atomicRSt6atomicIiERi:  
	movl	(%rdi), %eax
	movl	%eax, (%rsi)
	retq
_Z11load_intrinRiS_:
	movl	(%rdi), %eax
	movl	%eax, (%rsi)
	retq
_Z13load_volatileRViRi:
	movl	(%rdi), %eax
	movl	%eax, (%rsi)
	retq

ICPC 2021.2 (stripped)

_Z11load_atomicRSt6atomicIiERi:
        movq      %rdi, %rax                                    #4.14
        movl      (%rax), %eax                                  #4.14
        movl      %eax, (%rsi)                                  #4.5
        ret                                                     #5.1
_Z11load_intrinRiS_:
        movq      %rdi, %rax                                    #8.12
        movl      (%rax), %eax                                  #8.12
        movl      %eax, (%rsi)                                  #8.5
        ret                                                     #9.1
_Z13load_volatileRViRi:
        movl      (%rdi), %eax                                  #12.12
        movl      %eax, (%rsi)                                  #12.5
        ret

As you can see, only ICPC emits an extra mov, which could be folded into the subsequent instruction.
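
For context, here is a small sketch (not from the original post) of why a relaxed load of std::atomic<int> is expected to compile to the same single mov as the volatile load on x86-64. It assumes a mainstream standard library where std::atomic<int> is lock-free and has the same size as int; the second static_assert is not guaranteed by the C++ standard. The helper name `relaxed_load_matches_volatile` is hypothetical. Note that equal code generation does not make volatile a substitute for std::atomic across threads: unsynchronized concurrent access to a volatile int is still a data race.

```cpp
#include <atomic>

// Assumption: the target provides lock-free 4-byte atomics, as x86-64 does.
static_assert(std::atomic<int>::is_always_lock_free,
              "std::atomic<int> is expected to be lock-free on this target");

// Assumption: true for common implementations, not required by the standard.
static_assert(sizeof(std::atomic<int>) == sizeof(int),
              "std::atomic<int> is expected to carry no extra storage");

// Hypothetical helper: in single-threaded code both loads observe the same value.
bool relaxed_load_matches_volatile(int value) {
    std::atomic<int> a{value};
    volatile int v = value;
    return a.load(std::memory_order_relaxed) == v;
}
```

If both assertions hold, nothing in the language or the hardware forces the extra register copy, which is consistent with the g++ and ICX listings above.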

Why does this happen, and can we expect it to be fixed in a new version?
I've attached the full IR files to this post.

Thanks in advance.

VidyalathaB_Intel
Moderator

Hi,

Thanks for reaching out to us.

We are looking into this issue internally. We will get back to you soon.

Regards,

Vidya.


Viet_H_Intel
Moderator

I've reported this issue to our compiler developer.

Thanks,


Viet_H_Intel
Moderator

Not sure if you already knew, but the Intel Classic Compiler will enter "Legacy Product Support" mode, signaling the end of regular updates. Please refer to the article below for more details.

https://www.intel.com/content/www/us/en/developer/articles/technical/adoption-of-llvm-complete-icx.h...


For that reason, the developer isn't planning to fix this in the Classic compiler. Can you migrate to icx/icpx, and let us know if we can close this case?


Thanks,

Viet


Viet_H_Intel
Moderator

Please migrate to icx. We are going to close this as "won't fix" in the C++ Classic compiler. If you need any additional information, please post a new question, as this thread will no longer be monitored by Intel.


Thanks,

