I noticed there is no user-level intrinsic ‘_mm512_sincos_ps’ or ‘_mm512_mask_sincos_ps’ defined in zmmintrin.h.
However, I have just found out that the Intel compiler emits ‘__svml_sincosf16’ and ‘__svml_sincosf16_mask’ when it autovectorises code and finds ‘cos’ and ‘sin’ operations on the same value.
I have been doing some tests, and if I declare my own ‘_mm512_sincos_ps’ at user-space level, the Intel compiler recognises it and translates it into the appropriate ‘__svml_sincosf16’ call. However, the result of my code is incorrect, maybe because the parameters of my function are not the ones expected.
Could anyone please tell me why ‘_mm512_sincos_ps’ has not been defined in zmmintrin.h and what the expected parameters are, so that I can declare it appropriately?
Thank you in advance.
Best regards.
(Using icc 14.0.1)
Thank you for your reply!
I agree. If someone could move the post to the ISA forum, that would be great. I would rather not duplicate the post.
I need to generate something similar to what the Intel compiler does, so I should use the '__svml_sincosf16' function. Calling sin and cos separately would have a significant impact on performance.
Let me show you this example:
[cpp]
#include <math.h>
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* atof */
#pragma omp declare simd
float __attribute__((noinline)) sin_cos(float a, float *cos)
{
    float sin;
    sin = sinf(a);   /* sine is returned by value */
    *cos = cosf(a);  /* cosine is returned through the pointer */
    return sin;
}

int main(int argc, char *argv[])
{
    float cos[16];
    float sin[16];
    float input = atof(argv[1]);
    int i;
    #pragma omp simd
    for (i=0; i<16; i++)
    {
        sin[i] = sin_cos(input + i, &cos[i]);
    }
    for (i=0; i<16; i++)
    {
        printf("%d: sin=%f, cos=%f\n", i, sin[i], cos[i]);
    }
    return 0;
}
[/cpp]
I compile it with: icc -fopenmp -O3 sincos.c -S -mmic
(note that I use MIC, not AVX2).
In the generated code, the main function calls the following function:
[cpp]
# -- Begin _ZGVMN16vv_sin_cos.U
# mark_begin;
# Threads 4
.align 16,0x90
.globl _ZGVMN16vv_sin_cos.U
_ZGVMN16vv_sin_cos.U:
# parameter 1: %zmm0
# parameter 2: %zmm1
# parameter 3: %zmm2
..B2.1: # Preds ..B2.0 Latency 17
pushq %rbp #5.1
movq %rsp, %rbp #5.1
andq $-64, %rsp #5.1
subq $320, %rsp #5.1 c1
vmovaps %zmm23, 64(%rsp) #5.1 c5
vmovaps %zmm1, %zmm23 #5.1 c9
vmovaps %zmm20, 128(%rsp) #5.1 c9
vmovaps %zmm2, %zmm20 #5.1 c13
call __svml_sincosf16 #8.11 c17
..B2.10: # Preds ..B2.1 Latency 29
vmovaps %zmm1, %zmm3 #8.11 c1
movl $255, %eax #9.6 c1
kmov %eax, %k1 #9.6 c5
movl $43690, %eax #9.6 c5
kmov %eax, %k2 #9.6 c9
movl $21845, %eax #9.6 c9
kmov %k5, %ecx ...
[/cpp]
I just wanted to know how I could call this '__svml_sincosf16' function from user-space.
As I said, if you declare a function "_mm512_sincos_ps", the Intel compiler translates it to this "__svml_sincosf16". But the generated code is not correct:
[cpp]
#include <immintrin.h>
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* atof */
extern __m512 __ICL_INTRINCC _mm512_sincos_ps(__m512*, __m512);

__m512 __attribute__((noinline)) sin_cos(__m512 a, __m512 *cos)
{
    __m512 sin;
    sin = _mm512_sincos_ps(cos, a);
    return sin;
}

int main(int argc, char *argv[])
{
    float __attribute__((aligned(64))) cos[16];
    float __attribute__((aligned(64))) sin[16];
    __m512 * vsin = (__m512 *) &sin;
    __m512 * vcos = (__m512 *) &cos;
    __m512 input = _mm512_add_ps(_mm512_set1_ps(atof(argv[1])), _mm512_set_ps(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0));
    _mm512_store_ps(vsin, sin_cos(input, vcos));
    int i;
    for (i=0; i<16; i++)
    {
        printf("%d: sin=%f, cos=%f\n", i, sin[i], cos[i]);
    }
    return 0;
}
[/cpp]
[cpp]
sin_cos:
# parameter 1: %zmm0
# parameter 2: %rdi
..B2.1: # Preds ..B2.0 Latency 1
..___tag_value_sin_cos.19: #6.1
jmp __svml_sincosf16 #9.11 c1
.align 16,0x90
..___tag_value_sin_cos.21: #
# LOE
# mark_end;
[/cpp]
Any idea?
It should crash but the cosine is not computed.
The user could perhaps call the functions __svml_sincosf16 and __svml_sincosf16_mask directly.
But that will be difficult to do from a program written in C, since these functions have a special interface.
They return their results in two registers, which is very difficult (or impossible) to express in a C program.
The synopsis of these functions is something like the following:
(sin_res, cos_res) = __svml_sincosf16(source)
(sin_res, cos_res) = __svml_sincosf16_mask(sin_dest, cos_dest, mask, source)
Maybe you should try using assembly code in your C program, the way the compiler does, in order to make the generated code correct.
Best, Qiao
