With icx 2024.2.0 and the flags -Ofast -xSAPPHIRERAPIDS, this function:
#include <immintrin.h>
#include <mathimf.h>

_Float16 test_fp16(_Float16 x, _Float16 y)
{
    return sqrtf16(x) + sqrtf16(y);
}
generates the following code:
test_fp16(_Float16, _Float16):
        push    rax
        vmovsh  word ptr [rsp + 4], xmm1
        call    sqrtf16
        vmovsh  word ptr [rsp], xmm0
        vmovsh  xmm0, word ptr [rsp + 4]
        call    sqrtf16
        vaddsh  xmm0, xmm0, word ptr [rsp]
        pop     rax
        ret
Is there a compiler flag I can use to make it generate the expected code instead?
test_fp16(_Float16, _Float16):
        vsqrtsh xmm0, xmm0, xmm0
        vsqrtsh xmm1, xmm1, xmm1
        vaddsh  xmm0, xmm1, xmm0
        ret
See https://godbolt.org/z/xWsqW8djE for more details.
I've escalated your issue to our internal team for further investigation, thanks.
For a function to be inlined, the compiler must be able to see its definition (in a header or source file) at compile time. You can check what was inlined by requesting the inlining report:
icx test.c -Ofast -xSAPPHIRERAPIDS -c -qopt-report=3 -qopt-report-file=stderr
Global optimization report for : test_fp32
=================================================================
Global optimization report for : test_fp16
=================================================================
Global optimization report for : test_fp16_fix
=================================================================
---- Begin Inlining Report ----
Option Values:
inline-threshold: 225
inlinehint-threshold: 325
inlinecold-threshold: 45
inlineoptsize-threshold: 15
DEAD STATIC FUNC: _mm_set_sh
DEAD STATIC FUNC: _mm_setzero_ph
DEAD STATIC FUNC: _mm_sqrt_sh
DEAD STATIC FUNC: sqrtf16_fast
COMPILE FUNC: test_fp32
   -> llvm.sqrt.f32 test.c (6,12)
   -> llvm.sqrt.f32 test.c (6,23)
COMPILE FUNC: test_fp16
   -> EXTERN: sqrtf16 test.c (11,12)
   -> EXTERN: sqrtf16 test.c (11,25)
COMPILE FUNC: test_fp16_fix
   -> INLINE: sqrtf16_fast test.c (21,12) (-30<=487)
      -> INLINE: _mm_sqrt_sh test.c (16,12)
         -> INLINE: _mm_setzero_ph test.c
         -> llvm.sqrt.f16 test.c
      -> INLINE: _mm_setzero_ph test.c (16,24)
      -> INLINE: _mm_set_sh test.c (16,42)
   -> INLINE: sqrtf16_fast test.c (21,30) (-30<=487)
      -> INLINE: _mm_sqrt_sh test.c (16,12)
         -> INLINE: _mm_setzero_ph test.c
         -> llvm.sqrt.f16 test.c
      -> INLINE: _mm_setzero_ph test.c (16,24)
      -> INLINE: _mm_set_sh test.c (16,42)
---- End Inlining Report ------
Note that the report marks both calls to sqrtf16 in test_fp16 as calls to an external function:
COMPILE FUNC: test_fp16
   -> EXTERN: sqrtf16 test.c (11,12)
   -> EXTERN: sqrtf16 test.c (11,25)
This sqrtf16 is pulled from libimf.a and resolved by the linker. libimf.a is a precompiled library archive, not a header or source file, so the compiler never sees the function body and has nothing to inline.
In conclusion, icx does not provide the source code for sqrtf16 to be inlined.
> In conclusion, icx does not provide the source code for sqrtf16 to be inlined.
swell!
I'm discussing this problem with another team to find out whether we have other solutions.
Let me turn Gordon in his grave for you.
