Intel® oneAPI Threading Building Blocks

Compilation of atomic reads into 3 identical loads

Stephan_T_
Beginner

Hello,

this is more out of curiosity than anything else. When looking at the generated assembly code for a tight loop that polls an atomic state member until it has a certain value, I see that the compiler (gcc 5.2.0 x64) translates the read of the atomic variable into 3 identical loads (as shown by the assembly view in VTune). So:

while (m_state == TS_BUSY_WAITING) { ASM_PAUSE; }

turns into

Block 7:
pause
movl  0x3c(%rdi), %eax
movl  0x3c(%rdi), %eax
movl  0x3c(%rdi), %eax
cmp $0x3, %eax
jz 0x1b5af88 <Block 7>

Notice the 3 identical movl operations.
 
What is the cause behind this translation? I see a similar translation also in other places where tbb::atomic is being used.
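
For anyone who wants to inspect the generated code themselves, here is a minimal sketch of the same kind of polling loop, written with std::atomic instead of tbb::atomic so that it builds without TBB. Several things are assumptions rather than details from the post: the Task struct, the TS_DONE value, the wait_while_busy helper and its spin counter, and the definition of ASM_PAUSE. The value 3 for TS_BUSY_WAITING is only inferred from the `cmp $0x3` in the listing. Whether this variant shows the repeated loads will depend on the compiler version and flags, so treat it as a starting point for comparison, not as a reproducer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Assumed definition of ASM_PAUSE: the x86 PAUSE hint where available,
// otherwise a plain yield. The original macro is not shown in the post.
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define ASM_PAUSE _mm_pause()
#else
#define ASM_PAUSE std::this_thread::yield()
#endif

// TS_BUSY_WAITING == 3 is inferred from "cmp $0x3, %eax";
// TS_DONE is a hypothetical second state added for illustration.
enum TaskState : int { TS_BUSY_WAITING = 3, TS_DONE = 4 };

// Hypothetical stand-in for the class that owns m_state.
struct Task {
    std::atomic<int> m_state{TS_BUSY_WAITING};

    // Spin until m_state leaves TS_BUSY_WAITING; returns the number of
    // iterations spent waiting (the counter is not part of the original loop).
    std::size_t wait_while_busy() const {
        std::size_t spins = 0;
        while (m_state.load(std::memory_order_acquire) == TS_BUSY_WAITING) {
            ASM_PAUSE;
            ++spins;
        }
        return spins;
    }
};

// Usage: once the state has been changed, the loop exits without spinning.
inline std::size_t demo_already_released() {
    Task t;
    t.m_state.store(TS_DONE, std::memory_order_release);
    std::size_t spins = t.wait_while_busy();
    assert(spins == 0);
    return spins;
}
```

Compiling this with optimization enabled and viewing the assembly (for example with the compiler's -S option) makes it possible to compare the loop body against the listing above.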
 
 
Alexei_K_Intel
Employee

Hi Stephan,

Thank you for the report. After investigation, it looks like a GCC issue. I have created Bug 84151 in GCC Bugzilla.

Regards,
Alex

Stephan_T_
Beginner

And here I was, thinking this would be some kind of cache-line voodoo to improve performance :-) I'm glad I asked.

Thanks for taking care of this!

Stephan
