IFX complex skew-conjugate optimization with complex part designators

DataScientist

Consider the following two code snippets. Both perform the same task, the skew-conjugation of a complex value (negate the real part, keep the imaginary part, i.e. `-conjg(x)`), yet ifx (and ifort) emit slightly different assembly for the two:

```fortran
contains
    function skewconjg(x) result(res)
        complex, intent(in) :: x
        complex :: res
        res = cmplx(-real(x), aimag(x), kind(res))
    end function
end
```

and

```fortran
contains
    function skewconjg(x) result(res)
        complex, intent(in) :: x
        complex :: res
        res%re = -x%re; res%im = x%im
    end function
end
```

The parts of the generated assembly that differ are, respectively:

```asm
movss xmm1, dword ptr [rcx + 4]
movss xmm0, dword ptr [rcx]
movaps xmm2, xmmword ptr [rip + .LCPI1_0]
pxor xmm0, xmm2
movss dword ptr [rax + 4], xmm1
movss dword ptr [rax], xmm0
```

and

```asm
movss xmm0, dword ptr [rcx]
movaps xmm1, xmmword ptr [rip + .LCPI1_0]
pxor xmm0, xmm1
movss dword ptr [rax], xmm0
movss xmm0, dword ptr [rcx]
movss xmm0, dword ptr [rcx + 4]
movss dword ptr [rax + 4], xmm0
```

The second version (the one using complex part designators) appears to do some redundant work. Specifically, the `movss xmm0, dword ptr [rcx]` immediately before the load from `[rcx + 4]` looks dead: its result is overwritten by the very next instruction and is never used. Is this a missed compiler optimization, or is there a reason for the difference between the two versions?
Notably, gfortran generates the same assembly for both versions.
Here are Compiler Explorer links for testing:

https://godbolt.org/z/nesWEfWzj

and

https://godbolt.org/z/1ax3q8bTs
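
For anyone who prefers to reproduce this locally rather than through the links, below is a minimal self-contained sketch that places both variants in one module and checks that they agree with each other and with `-conjg(x)`. The module name `skewconjg_mod`, the function names `skewconjg_cmplx` and `skewconjg_parts`, the program name, and the test value are placeholders chosen for illustration; they are not necessarily what the Compiler Explorer sources use.

```fortran
module skewconjg_mod
    implicit none
contains
    ! Variant 1: build the result with the cmplx intrinsic.
    function skewconjg_cmplx(x) result(res)
        complex, intent(in) :: x
        complex :: res
        res = cmplx(-real(x), aimag(x), kind(res))
    end function

    ! Variant 2: assign through the complex part designators (%re, %im).
    function skewconjg_parts(x) result(res)
        complex, intent(in) :: x
        complex :: res
        res%re = -x%re
        res%im = x%im
    end function
end module

program check_skewconjg
    use skewconjg_mod
    implicit none
    complex :: x

    x = (1.5, -2.25)   ! arbitrary test value, exactly representable

    ! Both variants should return the same value ...
    if (skewconjg_cmplx(x) /= skewconjg_parts(x)) error stop "variants disagree"
    ! ... and that value should equal -conjg(x).
    if (skewconjg_cmplx(x) /= -conjg(x)) error stop "not equal to -conjg(x)"

    print *, "both variants agree: ", skewconjg_parts(x)
end program
```

Compiling this with optimization enabled and asking the compiler for assembly output should make it possible to compare the code generated for the two functions side by side. Since both live in the same translation unit here, the exact instruction sequences and constant labels may differ from the ones quoted above.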

 
