Intel® oneAPI DPC++/C++ Compiler

FP16 math functions not getting inlined

rabbits
Beginner
1,050 Views

 

Compiler: icx 2024.2.0

Flags: -Ofast -xSAPPHIRERAPIDS

The following code:


#include <immintrin.h>
#include <mathimf.h>

_Float16 test_fp16(_Float16 x, _Float16 y)
{
    return sqrtf16(x) + sqrtf16(y);
}

 

 

generates the following assembly, with two out-of-line calls to sqrtf16:

 

 

 

test_fp16(_Float16, _Float16):
        push    rax
        vmovsh  word ptr [rsp + 4], xmm1
        call    sqrtf16
        vmovsh  word ptr [rsp], xmm0
        vmovsh  xmm0, word ptr [rsp + 4]
        call    sqrtf16
        vaddsh  xmm0, xmm0, word ptr [rsp]
        pop     rax
        ret

 

 

Is there a compiler flag I can use to make it generate the expected code instead?

 

 

 

test_fp16(_Float16, _Float16):
        vsqrtsh xmm0, xmm0, xmm0
        vsqrtsh xmm1, xmm1, xmm1
        vaddsh  xmm0, xmm1, xmm0
        ret

 

 

See https://godbolt.org/z/xWsqW8djE for more details.

 

 

5 Replies
Alex_Y_Intel
Moderator
980 Views

I've escalated your issue to our internal team for further investigation, thanks. 

Alex_Y_Intel
Moderator
955 Views

For a function to be inlined, the compiler must be able to see its definition (in a header or source file) at compile time. You can check what was inlined by requesting the inlining report:

icx test.c -Ofast -xSAPPHIRERAPIDS -c -qopt-report=3 -qopt-report-file=stderr


Global optimization report for : test_fp32
=================================================================

Global optimization report for : test_fp16
=================================================================

Global optimization report for : test_fp16_fix
=================================================================

---- Begin Inlining Report ----
Option Values:
 inline-threshold: 225
 inlinehint-threshold: 325
 inlinecold-threshold: 45
 inlineoptsize-threshold: 15

DEAD STATIC FUNC: _mm_set_sh
DEAD STATIC FUNC: _mm_setzero_ph
DEAD STATIC FUNC: _mm_sqrt_sh
DEAD STATIC FUNC: sqrtf16_fast

COMPILE FUNC: test_fp32
  -> llvm.sqrt.f32 test.c (6,12)
  -> llvm.sqrt.f32 test.c (6,23)

COMPILE FUNC: test_fp16
  -> EXTERN: sqrtf16 test.c (11,12)
  -> EXTERN: sqrtf16 test.c (11,25)

COMPILE FUNC: test_fp16_fix
  -> INLINE: sqrtf16_fast test.c (21,12) (-30<=487)
   -> INLINE: _mm_sqrt_sh test.c (16,12)
     -> INLINE: _mm_setzero_ph test.c
     -> llvm.sqrt.f16 test.c
   -> INLINE: _mm_setzero_ph test.c (16,24)
   -> INLINE: _mm_set_sh test.c (16,42)
  -> INLINE: sqrtf16_fast test.c (21,30) (-30<=487)
   -> INLINE: _mm_sqrt_sh test.c (16,12)
     -> INLINE: _mm_setzero_ph test.c
     -> llvm.sqrt.f16 test.c
   -> INLINE: _mm_setzero_ph test.c (16,24)
   -> INLINE: _mm_set_sh test.c (16,42)

---- End Inlining Report ------


Note that in test_fp16 both calls to sqrtf16 are reported as EXTERN, meaning the compiler sees only a declaration and not a definition:

COMPILE FUNC: test_fp16
  -> EXTERN: sqrtf16 test.c (11,12)
  -> EXTERN: sqrtf16 test.c (11,25)

sqrtf16 is pulled from libimf.a by the linker. libimf.a is a precompiled library, not a header or source file, so the body of sqrtf16 is not available at compile time.

In conclusion, icx does not provide the source code for sqrtf16, so it cannot be inlined.


rabbits
Beginner
935 Views

> In conclusion, icx does not provide the source code for sqrtf16 to be inlined.

swell!

 

 

Alex_Y_Intel
Moderator
884 Views

I'm discussing this problem with another team and trying to find if we have other solutions. 

rabbits
Beginner
857 Views

Let me turn Gordon in his grave for you.
