Intel® oneAPI Threading Building Blocks

Compilation of atomic reads into 3 identical loads

Stephan_T_
Beginner

Hello,

This is more out of curiosity than anything else. When looking at the generated assembly for a tight loop that polls an atomic state member until it has a certain value, I see that the compiler (gcc 5.2.0, x64) translates the read of the atomic variable into 3 identical loads (as shown by the assembly view in VTune). So:

while (m_state == TS_BUSY_WAITING) { ASM_PAUSE; }

turns into

Block 7:
pause
movl  0x3c(%rdi), %eax
movl  0x3c(%rdi), %eax
movl  0x3c(%rdi), %eax
cmp $0x3, %eax
jz 0x1b5af88 <Block 7>

Notice the 3 identical movl operations.
 
What is the cause of this translation? I also see a similar translation in other places where tbb::atomic is used.
 
 
Alexei_K_Intel
Employee

Hi Stephan,

Thank you for the report. After investigation, it looks like a GCC issue. I have created Bug 84151 in GCC Bugzilla.

Regards,
Alex
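
For comparison, here is a minimal sketch of the same polling pattern written with std::atomic instead of tbb::atomic. This is an assumption for illustration, not the code from the original post: the Task struct, the wait_while_busy function, and every state name other than TS_BUSY_WAITING are hypothetical, and std::this_thread::yield() stands in for the ASM_PAUSE macro, whose definition is not shown in the thread. The value 3 for TS_BUSY_WAITING is inferred from the `cmp $0x3, %eax` in the listing above. On x86-64, an acquire load of a std::atomic<int> normally compiles to a single plain mov, so a loop like this should show one load per iteration rather than three.

```cpp
#include <atomic>
#include <thread>

// Hypothetical state values; only TS_BUSY_WAITING appears in the original
// post, and its value 3 is inferred from the `cmp $0x3, %eax` instruction.
enum TaskState : int {
    TS_IDLE = 0,
    TS_READY = 1,
    TS_DONE = 2,
    TS_BUSY_WAITING = 3
};

// Hypothetical stand-in for the class that owns m_state.
struct Task {
    std::atomic<int> m_state{TS_BUSY_WAITING};

    // Spin until m_state is no longer TS_BUSY_WAITING and return the
    // value that ended the wait. Each iteration performs one acquire load.
    int wait_while_busy() {
        int s;
        while ((s = m_state.load(std::memory_order_acquire)) == TS_BUSY_WAITING) {
            std::this_thread::yield();  // stand-in for ASM_PAUSE
        }
        return s;
    }
};
```

Another thread would end the wait with something like `task.m_state.store(TS_DONE, std::memory_order_release);`. Returning the loaded value avoids a second read of m_state after the loop exits.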

Stephan_T_
Beginner

And here I was, thinking this was some kind of cache-line voodoo to improve performance :-) I'm glad I asked.

Thanks for taking care of this!

Stephan
