Intel® ISA Extensions

Is profiling information from a run on SDE accurate and trustworthy?

Wei_Z_Intel
Employee

Hi,

I am trying to evaluate AVX-512 performance. I wrote the simple function below for the evaluation, enabled optimization and AVX-512 in the project properties (VS2013 integrated with Intel Parallel Studio), and confirmed from the asm files generated by the compiler that AVX-512 instructions are used.

void complexVectorConjMpy(float *inputPtr1, float *inputPtr2, float *outputPtr, int numData)
{
    int idxData;
    float data1Re, data1Im, data2Re, data2Im;

    // Alignment hints for the Intel compiler: all three buffers are 64-byte aligned.
    __assume_aligned(inputPtr1, 64);
    __assume_aligned(inputPtr2, 64);
    __assume_aligned(outputPtr, 64);

    // ivdep applies to the loop that immediately follows it, so it goes directly before the for statement.
    #pragma ivdep
    for (idxData = 0; idxData < numData; idxData++)
    {
        // Complex values are stored as interleaved (re, im) pairs.
        data1Re = inputPtr1[2 * idxData];
        data1Im = inputPtr1[2 * idxData + 1];
        data2Re = inputPtr2[2 * idxData];
        data2Im = inputPtr2[2 * idxData + 1];

        // output = input1 * conj(input2)
        outputPtr[2 * idxData]     = data1Re * data2Re + data1Im * data2Im;
        outputPtr[2 * idxData + 1] = data1Im * data2Re - data1Re * data2Im;
    }
}
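
As a quick sanity check of the arithmetic, here is a minimal portable sketch of the same conjugate multiply without the Intel-specific pragma and alignment hints. The names complexVectorConjMpyRef and conjMpyOne are hypothetical and not part of the original project; the sketch only restates the formula above and says nothing about vectorization or performance.

```cpp
#include <cassert>
#include <vector>

// Portable restatement of the kernel: out[k] = in1[k] * conj(in2[k]),
// with complex values stored as interleaved (re, im) float pairs.
// No Intel-specific pragmas or alignment hints are used here.
void complexVectorConjMpyRef(const float *in1, const float *in2,
                             float *out, int numData)
{
    assert(numData >= 0);
    for (int i = 0; i < numData; ++i) {
        const float aRe = in1[2 * i], aIm = in1[2 * i + 1];
        const float bRe = in2[2 * i], bIm = in2[2 * i + 1];
        out[2 * i]     = aRe * bRe + aIm * bIm;  // real part
        out[2 * i + 1] = aIm * bRe - aRe * bIm;  // imaginary part
    }
}

// Hypothetical helper: multiply a single pair and return {re, im}.
std::vector<float> conjMpyOne(float aRe, float aIm, float bRe, float bIm)
{
    const float a[2] = {aRe, aIm};
    const float b[2] = {bRe, bIm};
    std::vector<float> out(2);
    complexVectorConjMpyRef(a, b, out.data(), 1);
    return out;
}
```

For example, (1 + 2i) * conj(3 + 4i) = (1 + 2i)(3 - 4i) = 11 + 2i, so conjMpyOne(1, 2, 3, 4) should give {11, 2}.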

I used the library function below, built on <intrin.h> and <time.h>, to collect timing information. I call the function above a million times and average the result to get accurate time and clock-cycle figures.

#include <intrin.h>   // __rdtsc
#include <Windows.h>  // QueryPerformanceFrequency, QueryPerformanceCounter

unsigned __int64 GetTickAndTime(unsigned long long *getTick, double *getTime)
{
    LARGE_INTEGER now;
    LARGE_INTEGER frequency;

    // Clock ticks
    // __rdtsc() is an intrinsic provided by Microsoft Visual Studio* in the intrin.h header file
    *getTick = __rdtsc();

    // Time
    // QueryPerformanceFrequency works with QueryPerformanceCounter to return a human-readable time, provided in Windows.h
    QueryPerformanceFrequency(&frequency);
    QueryPerformanceCounter(&now);

    // Divide the raw counter by the frequency for time in seconds
    *getTime = (double)now.QuadPart / (double)frequency.QuadPart;
    return (unsigned __int64)now.QuadPart;
}
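
The repeat-and-average measurement described above can be sketched roughly as follows. This is a portable illustration using std::chrono::steady_clock rather than __rdtsc or QueryPerformanceCounter, and the helper name averageMillisPerCall is hypothetical; it is not the code used in the original project.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: call `fn` `repeats` times and return the mean
// wall-clock time per call in milliseconds.
template <typename Fn>
double averageMillisPerCall(Fn fn, long repeats)
{
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;

    const auto start = clock::now();
    for (long i = 0; i < repeats; ++i) {
        fn();  // the work being timed
    }
    const auto stop = clock::now();

    // Total elapsed time in milliseconds, divided by the call count.
    const std::chrono::duration<double, std::milli> total = stop - start;
    return total.count() / static_cast<double>(repeats);
}
```

A call might look like averageMillisPerCall([&] { complexVectorConjMpy(in1, in2, out, n); }, 1000000), assuming in1, in2, out and n are set up by the caller. An optimizing compiler can remove calls whose results are never used, so the output should be consumed somewhere after the loop.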

Intel® SDE (Software Development Emulator) has emulation support for Intel® Advanced Vector Extensions 512, so I collected two sets of profiling information: one with the Local Windows Debugger (x64, Release mode), and another under SDE (x64, Release mode).

But I see a big difference between the two: in the local Windows run, the printed elapsed time is around 0.0014 ms; under SDE, it is around 0.032 ms. (I run it under SDE from cmd by typing: sde -- application)

So my question is: is profiling information from SDE accurate, or is SDE meant only for debugging?

My PC's CPU is an i5-3320M; its frequency is 2.533 GHz.

 

Thank you

John

MarkC_Intel
Moderator

Intel SDE is a functional emulator. It does not provide timing information. Timing something as short as you are attempting is also not a great idea for lots of reasons.
