
Is profiling information from a run under SDE accurate and trustworthy?


I am trying to evaluate AVX-512 performance. I wrote the simple function below for the evaluation, configured optimization and enabled AVX-512 in the project properties (VS2013 integrated with Intel Parallel Studio), and I can see AVX-512 instructions in the asm files generated by the compiler.

void complexVectorConjMpy(float *inputPtr1, float *inputPtr2, float *outputPtr, int numData)
{
    int idxData;
    float data1Re, data1Im, data2Re, data2Im;

    // Tell the Intel compiler that all three buffers are 64-byte aligned
    __assume_aligned(inputPtr1, 64);
    __assume_aligned(inputPtr2, 64);
    __assume_aligned(outputPtr, 64);

    // ivdep applies to the loop that immediately follows it
    #pragma ivdep
    for (idxData = 0; idxData < numData; idxData++)
    {
        // Interleaved layout: even index = real part, odd index = imaginary part
        data1Re = inputPtr1[2 * idxData];
        data1Im = inputPtr1[2 * idxData + 1];
        data2Re = inputPtr2[2 * idxData];
        data2Im = inputPtr2[2 * idxData + 1];
        outputPtr[2 * idxData]     = data1Re * data2Re + data1Im * data2Im;
        outputPtr[2 * idxData + 1] = data1Im * data2Re - data1Re * data2Im;
    }
}
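
For reference, each output element above is input1[k] multiplied by the complex conjugate of input2[k]. A minimal portable sketch of the same arithmetic, without the Intel-specific alignment and vectorization hints, can serve as a scalar baseline when checking the vectorized build. The name `conj_mpy_ref` is hypothetical and not part of the original post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference: out[k] = a[k] * conj(b[k]),
   with samples stored as interleaved (re, im) float pairs. */
void conj_mpy_ref(const float *a, const float *b, float *out, size_t n)
{
    size_t k;
    assert(a != NULL && b != NULL && out != NULL);
    for (k = 0; k < n; k++) {
        float aRe = a[2 * k], aIm = a[2 * k + 1];
        float bRe = b[2 * k], bIm = b[2 * k + 1];
        /* (aRe + i*aIm) * (bRe - i*bIm) */
        out[2 * k]     = aRe * bRe + aIm * bIm;
        out[2 * k + 1] = aIm * bRe - aRe * bIm;
    }
}
```

As a quick check, (1 + 2i) times the conjugate of (3 + 4i) is 11 + 2i, so inputs {1, 2} and {3, 4} should produce {11, 2}.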


To get timing information I used the function below, built on library functions from <intrin.h> and <time.h>. I call the function above a million times and average the result to get an accurate time/clock figure.

#include <intrin.h>
#include <Windows.h>

unsigned __int64 GetTickAndTime(unsigned long long *getTick, double *getTime)
{
    unsigned __int64 now;
    unsigned __int64 m_frequency;

    // Clock ticks
    // __rdtsc() is an intrinsic provided by Microsoft Visual Studio* in the intrin.h header file
    *getTick = __rdtsc();

    // Time
    // QueryPerformanceFrequency works with QueryPerformanceCounter to return a human-readable time, provided in Windows.h
    QueryPerformanceFrequency((LARGE_INTEGER *)&m_frequency);
    QueryPerformanceCounter((LARGE_INTEGER *)&now);

    // Divide the raw counter by m_frequency for time in seconds
    *getTime = ((double)now) / ((double)m_frequency);
    return now;
}
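
The repeat-and-average measurement described above can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard C `clock()` as a portable stand-in for `QueryPerformanceCounter`, and the names `average_ms_per_call` and `time_average_ms` are hypothetical, not taken from the original project.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Turn two raw counter readings into the average milliseconds per call.
   freq is in counts per second, like the value QueryPerformanceFrequency reports. */
double average_ms_per_call(uint64_t start, uint64_t end,
                           uint64_t freq, uint64_t calls)
{
    double total_s;
    assert(freq > 0 && calls > 0 && end >= start);
    total_s = (double)(end - start) / (double)freq;
    return total_s * 1000.0 / (double)calls;
}

/* Call fn(arg) `calls` times and return the average time per call in ms.
   clock() is an assumption standing in for the Windows counter. */
double time_average_ms(void (*fn)(void *), void *arg, uint64_t calls)
{
    uint64_t i;
    uint64_t start = (uint64_t)clock();
    for (i = 0; i < calls; i++)
        fn(arg);
    return average_ms_per_call(start, (uint64_t)clock(),
                               (uint64_t)CLOCKS_PER_SEC, calls);
}
```

Reading the counter once before and once after the whole loop, rather than around every call, keeps the cost of the timer itself out of the per-call average.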


Intel® SDE (Software Development Emulator) has emulation support for Intel® Advanced Vector Extensions 512, so I collected two sets of profiling data. One is from the local Windows debugger (x64, Release mode), the other is from a run under SDE (x64, Release mode).

But I see a big difference between them: in the local Windows run the printed elapsed time is around 0.0014 ms, while under SDE it is around 0.032 ms. (I run it under SDE from cmd by typing: sde -- application)

So my question is: is profiling information from SDE accurate, or is SDE meant only for debugging?

My PC's CPU is an i5-3320M; its frequency is 2.533 GHz.
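
To put the two measurements in perspective, the arithmetic can be sketched as below. It assumes the 2.533 GHz figure stated above is the rate at which cycles elapse; the helper names `slowdown` and `ms_to_cycles` are hypothetical.

```c
#include <assert.h>

/* Ratio of the time measured under the emulator to the native time. */
double slowdown(double emulated_ms, double native_ms)
{
    assert(native_ms > 0.0);
    return emulated_ms / native_ms;
}

/* Clock cycles that elapse in elapsed_ms at clock_ghz:
   ms * 1e-3 s/ms * GHz * 1e9 cycles/s = ms * GHz * 1e6. */
double ms_to_cycles(double elapsed_ms, double clock_ghz)
{
    return elapsed_ms * clock_ghz * 1.0e6;
}
```

With the numbers reported here, `slowdown(0.032, 0.0014)` is about 23, and `ms_to_cycles(0.0014, 2.533)` is roughly 3,500 cycles per call in the native run.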


Thank you


1 Reply

Intel SDE is a functional emulator. It does not provide timing information. Timing something as short as you are attempting is also not a great idea for lots of reasons.