I am trying to evaluate AVX-512 performance. I wrote the simple function below for the evaluation, enabled optimization and AVX-512 in the project properties (VS2013 integrated with Intel Parallel Studio), and confirmed from the asm files generated by the compiler that AVX-512 instructions are used.
// outputPtr = inputPtr1 * conj(inputPtr2), with interleaved (re, im) float pairs
void complexVectorConjMpy(float *inputPtr1, float *inputPtr2, float *outputPtr, int numData)
{
    int idxData;
    float data1Re, data1Im, data2Re, data2Im;
    for (idxData = 0; idxData < numData; idxData++)
    {
        data1Re = inputPtr1[2 * idxData];
        data1Im = inputPtr1[2 * idxData + 1];
        data2Re = inputPtr2[2 * idxData];
        data2Im = inputPtr2[2 * idxData + 1];
        outputPtr[2 * idxData] = data1Re * data2Re + data1Im * data2Im;
        outputPtr[2 * idxData + 1] = data1Im * data2Re - data1Re * data2Im;
    }
}
I used the function below, built on <intrin.h> and <time.h>, to get timing information. I call the function above a million times and average the results to get an accurate time/clock profile.
unsigned __int64 GetTickAndTime(unsigned long long *getTick, double *getTime)
{
    unsigned __int64 now;
    unsigned __int64 m_frequency;
    // Clock ticks
    // __rdtsc() is an intrinsic provided by Microsoft Visual Studio* in the intrin.h header file
    *getTick = __rdtsc();
    // QueryPerformanceFrequency works with QueryPerformanceCounter to return a human-readable time; both are provided in Windows.h
    QueryPerformanceFrequency((LARGE_INTEGER *)&m_frequency);
    QueryPerformanceCounter((LARGE_INTEGER *)&now);
    // Divide the raw counter by m_frequency for time in seconds
    *getTime = ((double)now) / ((double)m_frequency);
    return now; // return value assumed here: the raw performance counter
}
Intel SDE has emulation support for the additional Intel Advanced Vector Extensions 512 (AVX-512) instructions, so I collected two sets of profiling information: one with the Local Windows Debugger (x64, Release mode) and one under SDE (x64, Release mode).
But I see a big difference between the two: in the local Windows run the printed elapsed time is around 0.0014 ms, while under SDE it is around 0.032 ms. (I run it under SDE from cmd by typing: sde -- application)
So my question is: is the profiling information from SDE accurate, or is SDE meant only for debugging?
My CPU is an i5-3320M; its frequency is 2.533 GHz.
Intel SDE is a functional emulator. It does not provide timing information. Timing something as short as you are attempting is also not a great idea for lots of reasons.