I am trying to measure MKL FFT performance by calling the five library functions below. I enabled optimization and AVX-512 in the project properties (VS2013 integrated with Intel Parallel Studio).
status = DftiCreateDescriptor(&DFT_desc, DFTI_SINGLE, DFTI_COMPLEX, 1, IDFTSize);
status = DftiSetValue(DFT_desc, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
status = DftiCommitDescriptor(DFT_desc);
status = DftiComputeForward(DFT_desc, IDFT_in_singlePrecision, IDFT_out_singlePrecision);
status = DftiFreeDescriptor(&DFT_desc);
To collect timing information I use the function below, built on <intrin.h> and <time.h>. I call DftiComputeForward a million times and average the result to get an accurate time and clock-tick figure.
unsigned __int64 GetTickAndTime(unsigned long long *getTick, double *getTime)
{
    unsigned __int64 now;
    unsigned __int64 m_frequency;
    // Clock ticks
    // __rdtsc() is an intrinsic provided by Microsoft Visual Studio* in the intrin.h header file
    *getTick = __rdtsc();
    // QueryPerformanceFrequency works with QueryPerformanceCounter to return a human-readable time; both are provided in Windows.h
    QueryPerformanceFrequency((LARGE_INTEGER *)&m_frequency);
    QueryPerformanceCounter((LARGE_INTEGER *)&now);
    // Divide the raw counter by m_frequency for time in seconds
    *getTime = ((double)now) / ((double)m_frequency);
    // Return the raw counter value (assumption: the posted snippet omits the return)
    return now;
}
I collected the profiling results below with the local Windows debugger in x64 Release mode. My PC has a Core i5-3320M running at 2.533 GHz. The MKL package I used is w_mkl_188.8.131.52.exe, which is described as having FFTs optimized for AVX-512.
|FFT size|Single precision|Double precision|
|---|---|---|
|FFT: 128 points (16-bit, complex)|0.174 us|0.329 us|
|FFT: 256 points (16-bit, complex)|0.376 us|0.629 us|
|FFT: 512 points (16-bit, complex)|0.828 us|1.396 us|
|FFT: 1024 points (16-bit, complex)|1.631 us|3.338 us|
|FFT: 2048 points (16-bit, complex)|8.66 us|13.441 us|
So I would like to know whether the FFT timings above were obtained with AVX-512, or whether I need to enable something for that. How can I tell whether the MKL FFT that was called actually used AVX-512?
By the way, I compiled some other simple, general C code and checked the assembly files generated by the compiler, and I can see that AVX-512 instructions are used there.
Thank you for the fast replies.
I don't have pre-release hardware. I used the MKL release version w_mkl_184.108.40.206, which is described as supporting AVX-512.
I also ran it under SDE from the command prompt (sde -- application), and the run time did indeed increase by roughly 20 times. When you say that the SDE emulator will not perform as well as native code on my platform, do you mean that SDE is meant for development and debugging, and that its timing is not accurate enough to be used for profiling?
The public SDE, which is all I had access to even when I worked at Intel, can't be used to evaluate performance beyond checking the extent to which the newer architecture may permit a reduction in instruction count. As you said, it is primarily for development prior to availability of the hardware.
You may enable the AVX-512 code path in Intel MKL using the mkl_enable_instructions function.
You may need to specify the CPU type that you want to emulate with SDE -- please refer to the SDE documentation.
Thank you for the good help.
I want to check the timing of the MKL FFT with AVX-512 enabled. It looks to me as though SDE cannot be used for time profiling: SDE is built on a JIT and emulates instructions that are not present on the host system with rather long sequences of older instructions. Between the JIT overhead and the emulation overhead, wall-clock time measured under SDE is meaningless as a performance measurement.
So I ran the MKL FFT under my local Windows debugger in x64 Release mode, in simulation mode, and tried the function you suggested, as below:
mkl_enable_indicator = mkl_enable_instructions(MKL_ENABLE_AVX512_MIC);
The return value mkl_enable_indicator is 1. According to the description below, this means that Intel MKL uses the AVX-512 instruction set if the hardware supports it. Does that mean AVX-512 was enabled successfully? My host PC has a Core i5-3320M, which should not support AVX-512.
//Value reflecting usage status of the specified instruction set :
//1 - Intel MKL uses the specified instruction set if the hardware supports it.
//0 - The request is rejected.
The purpose of mkl_enable_instructions is to switch on the optimizations specific to a selected ISA.
The return value simply indicates whether your version of Intel MKL contains optimizations specific to the ISA.
Whether these optimizations are actually used depends on the CPU features that MKL detects at runtime using the CPUID instruction.