I want to get the FLOPS (amount of computation) of my application using SDE (Intel Software Development Emulator). I read through the guide at https://software.intel.com/en-us/articles/intel-software-development-emulator?page=1#BASIC and then ran sde -mix -- myapplication.exe. When I open the output file, there are too many numbers. How do I get the total FLOPS of my application?
As it says in the doc, if you accept the default choice of which instructions count as floating point, find this section of the output table:
    *elements_fp_single_1    36322323
    *elements_fp_single_4    46364564
    *elements_fp_double_1       31149
Multiply each count by its vector width (the 4-wide single-precision count by 4, the scalar counts by 1) and add them up: 36322323 + 4*46364564 + 31149 gives a total of about 2.2*10^8 floating-point operations. Then time the application when run outside SDE and divide the total by the elapsed time in seconds. If, for example, it took 0.01 seconds, the average rate would be about 22 Gflops. Evidently, with such a short run, it will spend a large fraction of the time on slow startup operations, even though your numbers show an excellent ratio: over 80% of the operations are done by 128-bit vector instructions. You should see a large improvement if you switch to an AVX or AVX2 build. Even with a run as long as is practical under SDE, you would likely want to follow the doc's advice on timing only the relevant portions of your application.
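If you want to automate the arithmetic above, here is a minimal Python sketch. It assumes the counters appear in the mix output as a name followed by whitespace and a count, exactly as in the excerpt above; the function names are hypothetical, not part of SDE. SDE may print per-thread sections as well as a whole-program summary, so the sketch keeps the last occurrence of each counter. Check your own output file to confirm that is the total you want.

```python
import re

# Matches e.g. "*elements_fp_single_4    46364564" (assumed layout: name, whitespace, count).
# Counters with an extra suffix after the width are not matched and are skipped.
FP_PATTERN = re.compile(r"\*elements_fp_(single|double)_(\d+)\s+(\d+)")


def fp_element_ops(mix_text: str) -> int:
    """Sum count * vector width over all *elements_fp_* counters found in the text."""
    counters = {}
    for precision, width, count in FP_PATTERN.findall(mix_text):
        # A later occurrence of the same counter overwrites an earlier one.
        counters[(precision, int(width))] = int(count)
    return sum(width * count for (_, width), count in counters.items())


def gflops(total_ops: int, seconds: float) -> float:
    """Average rate in Gflops, given elapsed time measured outside SDE."""
    return total_ops / seconds / 1e9


if __name__ == "__main__":
    sample = """
    *elements_fp_single_1    36322323
    *elements_fp_single_4    46364564
    *elements_fp_double_1       31149
    """
    ops = fp_element_ops(sample)
    print(ops)                          # 221811728
    print(round(gflops(ops, 0.01), 1))  # 22.2
```

To use it on a real run, read the mix output file (for example with open(path).read()) and pass its contents to fp_element_ops, together with the wall-clock time of a run outside SDE.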
We have created a Python script to automate counting FLOPS with the Intel SDE tool. It also supports markers to select sections of interest in your application. You can find it here:
As of now, it includes documentation and examples for C/C++, Fortran, and Python applications.
We welcome feedback and look forward to extending it (e.g. to support other languages).