Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

How do I know whether the MKL FFT I call uses AVX-512?

Wei_Z_Intel
Employee

Hi,

        I am trying to measure MKL FFT performance by calling the five library functions below. I configured the optimization settings and enabled AVX-512 in the project properties (VS2013 integrated with Intel Parallel Studio).

        status = DftiCreateDescriptor(&DFT_desc, DFTI_SINGLE, DFTI_COMPLEX, 1, IDFTSize);

        status = DftiSetValue(DFT_desc, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
        status = DftiCommitDescriptor(DFT_desc);

        status = DftiComputeForward(DFT_desc, IDFT_in_singlePrecision, IDFT_out_singlePrecision);

        status = DftiFreeDescriptor(&DFT_desc);

         I used the function below, built on <intrin.h> and <time.h>, to collect timing information. I call DftiComputeForward repeatedly, about a million times, and average the result to get an accurate time/clock-cycle figure.

unsigned __int64 GetTickAndTime(unsigned long long *getTick, double *getTime) 
{
    unsigned __int64 now;
    unsigned __int64 m_frequency;
    // Clock ticks
    //__rdtsc() is an intrinsic provided by Microsoft Visual Studio* in intrin.h header file
    *getTick = __rdtsc();

    // Time
    // QueryPerformanceFrequency works with QueryPerformanceCounter to return a human-readable time, provided in Windows.h
    
    QueryPerformanceFrequency((LARGE_INTEGER *)&m_frequency);
    
    QueryPerformanceCounter((LARGE_INTEGER *)&now);
    // Divide the raw counter by m_frequency for time in seconds
    *getTime = ((double)now) / ((double)m_frequency);
    return now;

}
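The averaging step described above can be sketched as follows. This is only a minimal sketch: the helper name average_us_per_call and its parameters are hypothetical, and it assumes the counter (for example QueryPerformanceCounter) is read once before and once after the whole loop of DftiComputeForward calls, so that the cost of reading the counter is not included in every iteration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: average time per call in microseconds.
 *   start_count, end_count : raw counter readings taken before and after the loop
 *   frequency_hz           : counter ticks per second (e.g. from QueryPerformanceFrequency)
 *   repetitions            : number of calls executed between the two readings */
double average_us_per_call(uint64_t start_count, uint64_t end_count,
                           uint64_t frequency_hz, uint64_t repetitions)
{
    assert(frequency_hz > 0 && repetitions > 0 && end_count >= start_count);

    /* Elapsed wall-clock time for the whole loop, in seconds. */
    double elapsed_s = (double)(end_count - start_count) / (double)frequency_hz;

    /* Convert to microseconds and divide by the number of calls. */
    return elapsed_s * 1e6 / (double)repetitions;
}
```

Subtracting two counter readings, rather than dividing a single absolute reading by the frequency, keeps the result independent of when the counter started.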

      I got the profiling numbers below using the Local Windows Debugger with an x64 Release build. My PC has a Core i5-3320M running at 2.533 GHz. The MKL version I used is w_mkl_11.2.1.148.exe, which is described as having FFTs optimized for AVX-512.

FFT size                               Single precision   Double precision
FFT: 128 points (16-bit, complex)      0.174 us           0.329 us
FFT: 256 points (16-bit, complex)      0.376 us           0.629 us
FFT: 512 points (16-bit, complex)      0.828 us           1.396 us
FFT: 1024 points (16-bit, complex)     1.631 us           3.338 us
FFT: 2048 points (16-bit, complex)     8.66 us            13.441 us
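To compare timings like these across FFT sizes, one option is to convert each time into a nominal throughput. The sketch below assumes the common benchmarking convention of 5 * N * log2(N) floating-point operations for a complex FFT of power-of-two size N. That count is a convention, not the number of operations MKL actually executes, and the function names are hypothetical.

```c
#include <assert.h>

/* Integer log2 for a power-of-two n (hypothetical helper). */
unsigned ilog2_pow2(unsigned long n)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    unsigned k = 0;
    while (n > 1) {
        n >>= 1;
        k++;
    }
    return k;
}

/* Nominal GFLOP/s for one complex FFT of size n that took time_us microseconds,
 * using the conventional 5 * n * log2(n) operation count. */
double fft_nominal_gflops(unsigned long n, double time_us)
{
    assert(time_us > 0.0);
    double flops = 5.0 * (double)n * (double)ilog2_pow2(n);
    /* flops / (time_us * 1e-6 s) / 1e9 simplifies to flops / (time_us * 1e3). */
    return flops / (time_us * 1e3);
}
```

For example, fft_nominal_gflops(1024, 1.631) gives roughly 31.4 for the single-precision 1024-point entry.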

             So I would like to check with you whether the FFT timings above were obtained with AVX-512, or whether I need to enable something for that. How can I tell whether the MKL FFT I called used AVX-512?

              By the way, I compiled some other simple, general C code and checked the asm files generated by the compiler. They do use AVX-512 instructions.

 

Thank you

John

             

9 Replies
TimP
Honored Contributor III
If you have pre-release hardware and software, it would be under non-disclosure and could not be discussed here. MKL chooses instructions according to your platform. You could test AVX-512 with the SDE emulator, but it would not perform as well as the native code for your platform.
Wei_Z_Intel
Employee

Hi Tim,

          Thank you for the fast reply.

           I don't have pre-release hardware. I used the MKL release version w_mkl_11.2.1.148, which says it supports AVX-512.

           I also ran it under SDE by typing "sde -- application" at the command prompt, and the measured time did indeed increase by about 20 times. When you say that the SDE emulator would not perform as well as the native code for my platform, do you mean that SDE is meant for development and debugging, and that its timing is not accurate and cannot be used for profiling?

 

Thank you

John

Evgueni_P_Intel
Employee

Hi WEI Z.,

Please check your private messages.

Please use MKL_Get_Version to find out which CPU is detected by MKL.

Thank you.

Evgueni.

TimP
Honored Contributor III

The public SDE, which is all I had access to even when I worked at Intel, can't be used to evaluate performance beyond checking the extent to which the newer architecture may permit a reduction in instruction count.  As you said, it is primarily for development prior to availability of the hardware.

Wei_Z_Intel
Employee
Hi Tim/Evgueni,

Thanks for the feedback. With MKL_Get_Version, I got the information below:

Major version: 11
Minor version: 2
Update version: 1
Product status: Product
Build: 20141023
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Advanced Vector Extensions (Intel(R) AVX) enabled processors
================================================================

My PC has a Core i5-3320M. From the information above, I understand that AVX is enabled, but I'm not sure whether AVX-512 is enabled, since AVX and AVX2 are earlier versions. Should I determine this by checking whether my Core i5-3320M supports AVX-512?

Thank you

John
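The "Processor optimization" line above is the most direct hint about which code path MKL selected. A minimal sketch of checking it in code follows. It assumes the string comes from the Processor field of the MKLVersion structure filled in by mkl_get_version, and it assumes that an AVX-512 code path would be reported with "AVX-512" or "AVX512" somewhere in that string. The helper name is hypothetical, and the MKL call itself is left out so that the snippet compiles without MKL.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the processor-optimization string
 * mentions AVX-512, 0 otherwise. A plain "AVX" or "AVX2" string does not match. */
int processor_string_mentions_avx512(const char *processor)
{
    assert(processor != NULL);
    return strstr(processor, "AVX-512") != NULL ||
           strstr(processor, "AVX512") != NULL;
}
```

Applied to the string reported in this post, "Intel(R) Advanced Vector Extensions (Intel(R) AVX) enabled processors", the helper returns 0, which matches the reading that MKL selected the AVX code path here.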
Evgueni_P_Intel
Employee

You may enable the AVX-512 code path in Intel MKL using the MKL_Enable_Instructions function.

You may need to specify the CPU type that you want to emulate with SDE -- please refer to the SDE documentation.

Wei_Z_Intel
Employee

Hi Evgueni,

            Thank you for the help.

            I want to check the timing of the MKL FFT with AVX-512 enabled. It looks to me as though SDE cannot be used to collect timing information: SDE is built on a JIT and emulates instructions that are not present on the host system with rather long sequences of older instructions. Between the JIT overhead and the emulation overhead, wall-clock time measured under SDE is meaningless as a performance measurement.

             So I ran the MKL FFT in the Local Windows Debugger (x64, Release mode, in simulation mode) and tried the function you suggested, as below:

                mkl_enable_indicator = mkl_enable_instructions(MKL_ENABLE_AVX512_MIC);

            The return value mkl_enable_indicator is 1. According to the description below, this means that Intel MKL uses the AVX-512 instruction set if the hardware supports it. Does that mean AVX-512 was enabled successfully? My host PC has a Core i5-3320M, which should not support AVX-512.

                 //Value reflecting usage status of the specified instruction set :
                //1 - Intel MKL uses the specified instruction set if the hardware supports it.
                //0 - The request is rejected.

 

Thank you

John

Evgueni_P_Intel
Employee

The purpose of mkl_enable_instructions is to switch on the optimizations specific to a selected ISA.

The return value simply indicates whether your version of Intel MKL contains optimizations specific to the ISA.

Whether these optimizations will be used or not, depends on the CPU features detected by MKL at runtime using the CPUID instruction.
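As a rough illustration of the kind of CPUID check mentioned above, the sketch below tests the AVX-512 Foundation feature bit (CPUID leaf 7, sub-leaf 0, EBX bit 16). It is not MKL's actual detection logic, and the function names are hypothetical. A complete check would also confirm through XGETBV that the operating system saves the ZMM register state, which is omitted here.

```c
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>
#elif defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID leaf 7, sub-leaf 0: EBX bit 16 reports AVX-512 Foundation (AVX-512F). */
#define AVX512F_EBX_BIT 16u

/* Extracts the AVX-512F bit from an EBX value returned by CPUID leaf 7. */
int ebx_reports_avx512f(uint32_t ebx)
{
    return (int)((ebx >> AVX512F_EBX_BIT) & 1u);
}

/* Returns 1 if the CPU advertises AVX-512F, 0 otherwise
 * (including builds for non-x86 targets). OS support is not checked. */
int cpu_reports_avx512f(void)
{
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4];
    __cpuid(regs, 0);               /* regs[0] = highest supported leaf */
    if (regs[0] < 7) return 0;
    __cpuidex(regs, 7, 0);          /* regs[1] = EBX */
    return ebx_reports_avx512f((uint32_t)regs[1]);
#elif defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid_count returns 0 if leaf 7 is not supported. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) return 0;
    return ebx_reports_avx512f(ebx);
#else
    return 0;
#endif
}
```

On a CPU without AVX-512, such as the Core i5-3320M discussed in this thread, cpu_reports_avx512f() is expected to return 0, so a return value of 1 from mkl_enable_instructions would not by itself put the AVX-512 code path into use.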

 

Wei_Z_Intel
Employee

Thanks a lot for the clarification, Evgueni/Tim. I understand now.

 

John
