Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

ICPX auto-vectorization of math functions

AFog
Beginner

Auto-vectorization of loops with math functions works well with ICPX under Linux, but not under Windows.

 

The Windows version uses only double precision vector math functions, and none wider than 256 bits. The Linux version has the full set.

 

// example
const int size = 256;
float a[size];
float b[size];
...
for (int i=0; i<size; i++) {
    b[i] = exp(a[i]);
}

ICPX under Linux is using a 512-bit single precision function in the SVML library (__svml_expf16).

ICPX under Windows is converting to double and using a 256-bit double precision function in the SVML library (__svml_exp4).

__svml_expf16 is actually present in svml_dispmt.lib and can be called as _mm512_exp_ps.
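
For illustration, here is a minimal sketch of calling the 512-bit single-precision function directly through the SVML intrinsic. This assumes the Intel compiler exposes _mm512_exp_ps via immintrin.h with AVX-512 enabled; it is only meant to show that the function is reachable, not a suggested fix:

#include <immintrin.h>

const int size = 256;
float a[size];
float b[size];

void exp_direct() {
    // Process 16 floats per iteration with the 512-bit SVML exponential.
    for (int i = 0; i < size; i += 16) {
        __m512 x = _mm512_loadu_ps(a + i);  // unaligned load of 16 floats
        __m512 y = _mm512_exp_ps(x);        // should resolve to __svml_expf16
        _mm512_storeu_ps(b + i, y);
    }
}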

 

HemanthCH_Intel
Moderator

Hi,

 

Thanks for posting in Intel Communities.

 

Could you please provide us with the following information to investigate more on your issue?

  1. The complete reproducer code and the steps you followed on the Linux and Windows machines, so that we can reproduce the issue at our end.
  2. Please confirm whether you are using the command prompt or Visual Studio to run your code on Windows.
  3. How are you identifying that "ICPX under Linux is using a 512-bit single precision function in the SVML library (__svml_expf16)" and "ICPX under Windows is converting to double and using a 256-bit double precision function in the SVML library (__svml_exp4)"?
  4. Are you using any intrinsics in your code?
  5. Could you also let us know the icpx version?

 

Thanks & Regards,

Hemanth

 

AFog
Beginner

Thank you for your reply.

Steps to reproduce: Compile the code below in Visual Studio 2022.1, Release mode, with /arch:AVX512 and /fp:fast.
(Intel® oneAPI DPC++ Compiler Package ID: w_oneAPI_2022.1.0.256, Intel® oneAPI DPC++ Compiler – toolkit version: 2022.2.0, extension version 22.0.0.17, Package ID: w_oneAPI_2022.1.0.256).

 

#include <immintrin.h>
#include <inttypes.h>
#include <stdio.h>
#include <math.h>

const int size = 256;
float aaaa[size] = {0};
float bbbb[size] = {0};
volatile int k = 1;

int main () {
    // prevent optimizing whole loop away:
    aaaa[k] = 5.64;
    for (int i=0; i<size; i++) {
        bbbb[i] = exp(aaaa[i]);    
    }
    // prevent optimizing whole loop away:
    aaaa[k] = bbbb[k+1];
    for (int i=0; i<size; i++) {
        bbbb[i] = cos(aaaa[i]);     
    }
    printf("\ncos(0) = %f\n", bbbb[0]);
    return int(bbbb[k]);
}

 

 The debugger shows this disassembly:

 

        bbbb[i] = exp(aaaa[i]);    
00007FF714E5BE2E  vcvtps2pd   zmm16,ymmword ptr [aaaa+20h (07FF714E94720h)]  
00007FF714E5BE38  vcvtps2pd   zmm0,ymmword ptr [aaaa (07FF714E94700h)]  
00007FF714E5BE42  call        __svml_exp8_z0 (07FF714E5C7F0h)  
00007FF714E5BE47  vmovaps     zmm17,zmm0  
00007FF714E5BE4D  vmovaps     zmm0,zmm16  
00007FF714E5BE53  call        __svml_exp8_z0 (07FF714E5C7F0h)  
00007FF714E5BE58  vcvtpd2ps   ymm1,zmm17  
00007FF714E5BE5E  vcvtpd2ps   ymm0,zmm0  
00007FF714E5BE64  vinsertf64x4 zmm0,zmm1,ymm0,1  
00007FF714E5BE6B  vmovaps     zmmword ptr [bbbb (07FF714E94B00h)],zmm0  

After further testing, I found that the problem is solved when I change exp to expf and cos to cosf. So the diagnosis is not poor auto-vectorization, but that the Windows version fails to optimize exp to expf and cos to cosf, even under /fp:fast.
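
For reference, this is the form of the loops that vectorizes as expected, a minimal sketch of the workaround compiled with the same options as above:

// Using the single-precision math functions explicitly lets the compiler
// vectorize without the float -> double -> float round trip.
for (int i = 0; i < size; i++) {
    bbbb[i] = expf(aaaa[i]);
}
for (int i = 0; i < size; i++) {
    bbbb[i] = cosf(aaaa[i]);
}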

 

 

 

HemanthCH_Intel
Moderator

Hi,


We are working on this internally and will get back to you soon.


Thanks & Regards,

Hemanth


Viet_H_Intel
Moderator

>> ICPX under Linux is using a 512-bit single precision function in the SVML library (__svml_expf16)

I tried your test case on Linux with icpx (Version 2022.1.0 Build 20220316) but couldn't see __svml_expf16 being called.

$ icpx -march=common-avx512 -fp-model=fast c05540193.cpp

$ objdump -d a.out |grep svml_expf16

$

>> ICPX under Windows is converting to double and using a 256-bit double precision function in the SVML library (__svml_exp4)

I tried the test case on Windows, but didn't see __svml_exp4 being called at all.

> icx /arch:AVX512 /fp:fast c05540193.cpp /O2

Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2022.1.0 Build 20220316

> dumpbin /DISASM c05540193.exe >dump.txt

> notepad dump.txt (no matches found for svml). Attached is the dump.txt.


Can you provide steps to reproduce what you have observed?




Viet_H_Intel
Moderator

Please provide us with instructions and a test case to reproduce the issue.

Thanks,


AFog
Beginner

Viet H.

Your own dump shows the same as mine. It converts single to double precision with vcvtps2pd, then calls the double-precision version of the vector function at call 00000001400019E0, then converts the result back to single precision with vcvtpd2ps. This shows, as I wrote, that the compiler fails to optimize exp to expf and cos to cosf, but it does vectorize. Your dump just doesn't show the function names.
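
In source terms, the generated code is effectively doing something like this (an illustration only, not actual compiler output):

// Effective behaviour of the emitted code: widen each float to double,
// call the double-precision vector exp, then narrow the result back to float.
for (int i = 0; i < size; i++) {
    bbbb[i] = (float)exp((double)aaaa[i]);
}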

Viet_H_Intel
Moderator

Looks like auto-vectorization is not the issue anymore.

Does your assembly dump show the function names? Can you attach the assembly file and point out where it fails to optimize exp to expf and cos to cosf?


Thanks,


AFog
Beginner

It is seen in your own dump and in the disassembly that I have posted above. It converts single precision to double precision (vcvtps2pd) before calling the double-precision version of the vector function (__svml_exp8_z0), then converts the result back to single precision (vcvtpd2ps).

Viet_H_Intel
Moderator

Thanks, I'll look into this.


Viet_H_Intel
Moderator

Does this workaround work for you: change exp() and cos() to expf() and cosf(), which are the correct math library functions for float?

Thanks,




Viet_H_Intel
Moderator

Would you let us know whether the workaround works for you, so that we can close this issue?

Thanks,


Viet_H_Intel
Moderator

We haven't heard back from you, and since there is a workaround, we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community support only.

