Hi,
I am trying to evaluate AVX-512 performance. I wrote the simple function below for the evaluation, configured optimization and enabled AVX-512 in the project property settings (VS2013 integrated with Intel Parallel Studio), and I can see in the compiler-generated .asm files that AVX-512 instructions are used.
void complexVectorConjMpy(float *inputPtr1, float *inputPtr2, float *outputPtr, int numData)
{
    int idxData;
    float data1Re, data1Im, data2Re, data2Im;

    // Intel compiler hints: all three buffers start on a 64-byte boundary
    __assume_aligned(inputPtr1, 64);
    __assume_aligned(inputPtr2, 64);
    __assume_aligned(outputPtr, 64);

    // #pragma ivdep applies to the loop that immediately follows it
    #pragma ivdep
    for (idxData = 0; idxData < numData; idxData++)
    {
        // Interleaved complex data: [Re0, Im0, Re1, Im1, ...]
        data1Re = inputPtr1[2 * idxData];
        data1Im = inputPtr1[2 * idxData + 1];
        data2Re = inputPtr2[2 * idxData];
        data2Im = inputPtr2[2 * idxData + 1];

        // output = input1 * conj(input2)
        outputPtr[2 * idxData]     = data1Re * data2Re + data1Im * data2Im;
        outputPtr[2 * idxData + 1] = data1Im * data2Re - data1Re * data2Im;
    }
}
I used the function below, built on __rdtsc() from <intrin.h> and QueryPerformanceCounter()/QueryPerformanceFrequency() from <Windows.h>, to collect timing information. I called the function above a million times and averaged the results to get an accurate per-call time and clock count.
#include <Windows.h>  // QueryPerformanceCounter, QueryPerformanceFrequency
#include <intrin.h>   // __rdtsc

unsigned __int64 GetTickAndTime(unsigned long long *getTick, double *getTime)
{
    LARGE_INTEGER now;
    LARGE_INTEGER frequency;

    // Clock ticks: __rdtsc() is an intrinsic provided by Microsoft Visual Studio* in intrin.h
    *getTick = __rdtsc();

    // Time: QueryPerformanceFrequency returns counts per second and
    // QueryPerformanceCounter returns the current count (both declared in Windows.h)
    QueryPerformanceFrequency(&frequency);
    QueryPerformanceCounter(&now);

    // Divide the raw counter by the frequency for time in seconds
    *getTime = (double)now.QuadPart / (double)frequency.QuadPart;
    return (unsigned __int64)now.QuadPart;
}
Intel® SDE (Software Development Emulator) supports emulation of the additional Intel® Advanced Vector Extensions 512 instructions, so I collected two sets of profiling information: one from the local Windows debugger (x64, Release mode) and one from SDE (x64, Release mode).
However, the results differ greatly: in local Windows mode the printed elapsed time is around 0.0014 ms, while under SDE it is around 0.032 ms. (I ran it under SDE from the command prompt by typing: sde -- application)
So my question is: is timing information collected under SDE accurate, or is SDE intended only for debugging?
My CPU is an i5-3320M running at 2.533 GHz.
Thank you
John
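Intel SDE is a functional emulator. It does not provide timing information, so elapsed times measured under it do not reflect hardware performance. Timing something as short as you are attempting is also unreliable for several reasons, such as the overhead and resolution of the timer calls themselves.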
