Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

redundant load introduced when using a union

Absoler
Beginner

Here's the code:

#include <stdint.h>

union U1 {
  int f0;
  int f1;
  short f4;
};

int g = 1;
int f = 1;

void func_25(union U1 c) {
  int32_t *d = &g;
  c.f0 = f;
  if (c.f1 > (uint64_t)c.f4) // f4 must have a smaller size than f1 and must be cast to 64 bits
    *d = 0;
}

int main(){
  ...
  union U1 c;
  func_25(c);
  ...
}
 

When compiled with icc 2021.6.0 20220226 (-O1), the generated code looks like this:

0000000000401426 <func_25>:
  401426: 8b 05 94 6c 00 00       mov 0x6c94(%rip),%eax         # 4080c0 <g_166>
  40142c: 0f be 15 8d 6c 00 00   movsbl 0x6c8d(%rip),%edx   # 4080c0 <g_166>
  401433: 3b c2                               cmp %edx,%eax
  401435: 7e 0a                               jle 401441 <func_25+0x1b>
  401437: c7 05 7f d5 00 00 00  movl $0x0,0xd57f(%rip)          # 40e9c0 <g>
  40143e: 00 00 00
  401441: c3                                     retq

We can see that after f is assigned to c, the code still loads f whenever c.f1 and c.f4 are needed. Is this a problem? f may be a shared variable, or there may be a performance cost.

HemanthCH_Intel
Moderator

Hi,


Thank you for posting in Intel Communities.


>>"we can see after assigning f to c, it still load f when c.f1 and c.f4 are needed, is this a problem? f may be a shared variable, or there may exist a performance problem."

We couldn't understand your problem statement. So could you please elaborate on your statement?


Thanks & Regards,

Hemanth



Absoler
Beginner

Thanks for your attention!

Sorry for pasting the wrong asm code. It should look like this (the g_166 should be f):

00000000004013f5 <func_25>:
    4013f5: 8b 05 b1 6c 00 00         mov 0x6cb1(%rip),%eax            # 4080ac <f>
    4013fb: 0f bf 15 aa 6c 00 00      movswl 0x6caa(%rip),%edx      # 4080ac <f>
    401402: 3b c2                               cmp %edx,%eax
    401404: 76 0a                               jbe 401410 <func_25+0x1b>
    401406: c7 05 98 6c 00 00 00  movl $0x0,0x6c98(%rip)             # 4080a8 <g>
    40140d: 00 00 00
    401410: c3 retq

 

In the first case, where f is a shared variable: if f is modified between the first and the second instruction, the result of the cmp instruction will be wrong, because the source code copies f into the local c once and then compares only fields of c.

And if f is not shared, perhaps the comparison result can be inferred, because c was assigned just before the if-condition.

Thanks again for your guidance!

HemanthCH_Intel
Moderator

Hi,


A union is a special data type in C/C++ that allows different data types to be stored in the same memory location. So if we update any member of the union variable (c), all members of the union refer to the updated storage. When we compare c.f1 and c.f4, both read the most recently written value. Thus we get the expected results when running the code.
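
To make the overlap concrete, here is a minimal sketch, assuming a little-endian target such as x86-64 with a 32-bit int and a 16-bit short. The helper name condition_holds is hypothetical and is not part of the original code; it only evaluates the if-condition from func_25 for a given value of f. Note that c.f4 reads back only the low 16 bits of what was stored through c.f0, so c.f1 and c.f4 agree only when the value fits in a short.

```c
#include <stdint.h>

/* Same layout as the union in the original post. */
union U1 {
    int f0;
    int f1;
    short f4;
};

/* Hypothetical helper: evaluates the condition from func_25 for one value of f.
 * c.f1 is converted to uint64_t by the usual arithmetic conversions, so the
 * comparison is unsigned. c.f4 is sign-extended by the explicit cast. */
int condition_holds(int value) {
    union U1 c;
    c.f0 = value;                      /* write through one member ...        */
    return c.f1 > (uint64_t)c.f4;      /* ... read back through the other two */
}
```

With f = 1, as in the original post, the condition is false and g is left unchanged.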


Thanks & Regards,

Hemanth


Absoler
Beginner

Thanks for your explanation! I understand the necessity of comparing c.f1 and c.f4. But there may still be a problem when the compiler chooses to replace them with the variable f (I guess this can save stack memory?).

When f is modified between    

    4013f5: 8b 05 b1 6c 00 00         mov 0x6cb1(%rip),%eax            # 4080ac <f>

and

    4013fb: 0f bf 15 aa 6c 00 00      movswl 0x6caa(%rip),%edx      # 4080ac <f>

then %eax and %edx hold different versions of f, and the result of comparing them will be wrong. In the program, however, it is the fields of the local variable c that are compared, and that comparison has no such vulnerability. So could we treat this as a change in semantics?
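
The concern can be modelled without threads. The sketch below is only an illustration under assumptions: it assumes a little-endian target, and the function compare_two_loads and its parameters first_load and second_load are hypothetical names that stand for the value of f seen by the mov and by the movswl. Whether a concurrent write to a plain int is allowed at all is a separate question; as far as I know, C11 treats an unsynchronized concurrent write to a non-atomic object as a data race with undefined behavior.

```c
#include <stdint.h>

/* Same layout as the union in the original post. */
union U1 {
    int f0;
    int f1;
    short f4;
};

/* Hypothetical model of the generated code: the 32-bit operand comes from
 * one load of f and the 16-bit operand from a second, separate load.
 * Passing the same value twice models the case where f does not change. */
int compare_two_loads(int first_load, int second_load) {
    union U1 wide, narrow;
    wide.f0 = first_load;      /* value seen by: mov    f(%rip),%eax */
    narrow.f0 = second_load;   /* value seen by: movswl f(%rip),%edx */
    return wide.f1 > (uint64_t)narrow.f4;
}
```

If f changes from 1 to 0 between the two loads, compare_two_loads(1, 0) is true, although neither f = 1 nor f = 0 on its own satisfies the condition.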

HemanthCH_Intel
Moderator

Hi,


The ICC compiler could re-use the value of "f" in the register %eax, and sign-extend it, instead of re-loading it. The newer clang-based compiler (icx) does make this optimization properly.


What is actually happening in ICC compiler, is a "half-completed" optimization. The compiler can store the union completely in registers, but it is not doing this because of the short size of "c.f4". The field "c.f0" is registerized, but "c.f4" is kept in memory. Later, the compiler sees that a load of "c.f4" can be replaced with the load of "f". 


If the value of "f" were changed, this would not cause any incorrect results. The 2nd load of "f" is done directly after the 1st load of "f".
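
For comparison, the single-load form described above can be written by hand in C. This is a sketch under assumptions, not the code either compiler actually emits: func_25_single_load is a hypothetical name, the globals mirror f and g from the original post, and the cast to short assumes the usual wrap-around behavior (the result is implementation-defined for values that do not fit in a short) on a little-endian target.

```c
#include <stdint.h>

int g = 1;
int f = 1;

/* Hypothetical hand-written equivalent of the single-load code:
 * f is read once and the short operand is derived from that same value. */
void func_25_single_load(void) {
    int value = f;              /* one read of f                                   */
    short low = (short)value;   /* low 16 bits; implementation-defined if the value
                                   does not fit, wraps on common compilers         */
    if (value > (uint64_t)low)  /* same unsigned 64-bit comparison as in func_25   */
        g = 0;
}
```

Even written this way, a plain int gives the compiler no obligation to load f only once. If f were really shared between threads, the source would have to say so, for example by declaring it _Atomic and reading it with a single atomic_load from <stdatomic.h>.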


Thanks & Regards,

Hemanth


HemanthCH_Intel
Moderator

Hi,


We haven't heard back from you. Could you please provide an update on your issue?


Thanks & Regards,

Hemanth


HemanthCH_Intel
Moderator

Hi,


We assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.


Thanks & Regards,

Hemanth

