Intel® oneAPI DPC++/C++ Compiler
Talk to fellow users of Intel® oneAPI DPC++/C++ Compiler and companion tools like Intel® oneAPI DPC++ Library, Intel® DPC++ Compatibility Tool, and Intel® Distribution for GDB*

FP16 math functions not getting inlined

rabbits
Beginner

With icx 2024.2.0 and the options -Ofast -xSAPPHIRERAPIDS, this function:

#include <immintrin.h>
#include <mathimf.h>

_Float16 test_fp16(_Float16 x, _Float16 y)
{
    return sqrtf16(x) + sqrtf16(y);
}

generates the following code:

test_fp16(_Float16, _Float16):
        push    rax
        vmovsh  word ptr [rsp + 4], xmm1
        call    sqrtf16
        vmovsh  word ptr [rsp], xmm0
        vmovsh  xmm0, word ptr [rsp + 4]
        call    sqrtf16
        vaddsh  xmm0, xmm0, word ptr [rsp]
        pop     rax
        ret

Is there a compiler flag I can use to make it generate the expected code instead?

test_fp16(_Float16, _Float16):
        vsqrtsh xmm0, xmm0, xmm0
        vsqrtsh xmm1, xmm1, xmm1
        vaddsh  xmm0, xmm1, xmm0
        ret

See https://godbolt.org/z/xWsqW8djE for more details.

Alex_Y_Intel
Moderator

I've escalated your issue to our internal team for further investigation, thanks. 

Alex_Y_Intel
Moderator

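For a function to be inlined, the compiler needs its definition (from a header or source file) when it compiles the caller. You can see what happened to each call in the inlining report, produced with:

icx test.c -Ofast -xSAPPHIRERAPIDS -c -qopt-report=3 -qopt-report-file=stderr

The test.c used here also contains a float version (test_fp32) and an intrinsics-based variant (test_fp16_fix). The report is: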


Global optimization report for : test_fp32
=================================================================

Global optimization report for : test_fp16
=================================================================

Global optimization report for : test_fp16_fix
=================================================================

---- Begin Inlining Report ----
Option Values:
 inline-threshold: 225
 inlinehint-threshold: 325
 inlinecold-threshold: 45
 inlineoptsize-threshold: 15

DEAD STATIC FUNC: _mm_set_sh

DEAD STATIC FUNC: _mm_setzero_ph

DEAD STATIC FUNC: _mm_sqrt_sh

DEAD STATIC FUNC: sqrtf16_fast

COMPILE FUNC: test_fp32
  -> llvm.sqrt.f32 test.c (6,12)
  -> llvm.sqrt.f32 test.c (6,23)

COMPILE FUNC: test_fp16
  -> EXTERN: sqrtf16 test.c (11,12)
  -> EXTERN: sqrtf16 test.c (11,25)

COMPILE FUNC: test_fp16_fix
  -> INLINE: sqrtf16_fast test.c (21,12) (-30<=487)
   -> INLINE: _mm_sqrt_sh test.c (16,12)
     -> INLINE: _mm_setzero_ph test.c
     -> llvm.sqrt.f16 test.c
   -> INLINE: _mm_setzero_ph test.c (16,24)
   -> INLINE: _mm_set_sh test.c (16,42)
  -> INLINE: sqrtf16_fast test.c (21,30) (-30<=487)
   -> INLINE: _mm_sqrt_sh test.c (16,12)
     -> INLINE: _mm_setzero_ph test.c
     -> llvm.sqrt.f16 test.c
   -> INLINE: _mm_setzero_ph test.c (16,24)
   -> INLINE: _mm_set_sh test.c (16,42)

---- End Inlining Report ------


Note that in test_fp16 both calls go to sqrtf16 as an external function:

COMPILE FUNC: test_fp16
  -> EXTERN: sqrtf16 test.c (11,12)
  -> EXTERN: sqrtf16 test.c (11,25)

This sqrtf16 is pulled from libimf.a and resolved by the linker. libimf.a is a compiled library archive, not a header or source file, so the body of sqrtf16 is not available to the compiler for inlining.


In conclusion, icx does not provide the source code for sqrtf16 to be inlined.


rabbits
Beginner

> In conclusion, icx does not provide the source code for sqrtf16 to be inlined.

swell!

Alex_Y_Intel
Moderator

I'm discussing this problem with another team to find out whether there are other solutions.

rabbits
Beginner

Let me turn Gordon in his grave for you.
