I'm working on two machines: an Intel Core i5-6400 (2.7 GHz, Windows 8.1, 8 GB RAM) and an Intel Core i7-5930K (3.5 GHz, Windows 8.1, 32 GB RAM).
I need to build a roofline model for my application, and I use Intel SDE for that. I compute the arithmetic intensity from the mem-read and mem-write counters, and the total number of GFLOP from the elements_fp_<...> counters.
I take the maximum bandwidth from the HPC Performance Characterization test results (for my machine it reports 17 GB/s).
However, when I profile an MKL benchmark (matrix multiplication, the sgemm function), Intel Amplifier and Intel SDE report different FLOP counts: the Intel Amplifier 1027 result is twice as large as the SDE result. (This is not the case for the STREAM benchmark, where both tools give approximately the same values.) Moreover, combining the arithmetic intensity from SDE with the GFLOPS from either SDE or Amplifier gives a point that lies well outside the limits of the roofline model.
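To make "outside the roofline limits" concrete, here is a minimal sketch of the check. It assumes the standard roofline formulation, in which attainable GFLOPS is the smaller of the compute peak and the arithmetic intensity times the bandwidth. The function names are hypothetical, and the peak of 100 GFLOPS and the sample counts are placeholders, not measurements. Only the 17 GB/s figure comes from the test result above.

```python
def arithmetic_intensity(flop, bytes_read, bytes_written):
    """FLOP per byte of memory traffic (reads plus writes)."""
    return flop / (bytes_read + bytes_written)


def roofline_bound(ai, peak_gflops, bandwidth_gbs):
    """Attainable GFLOPS under the roofline model: min(peak, AI * bandwidth)."""
    return min(peak_gflops, ai * bandwidth_gbs)


def within_roofline(measured_gflops, ai, peak_gflops, bandwidth_gbs, tol=0.05):
    """True if the measured point lies on or under the roof, within a tolerance."""
    return measured_gflops <= roofline_bound(ai, peak_gflops, bandwidth_gbs) * (1 + tol)


if __name__ == "__main__":
    # Placeholder counts (assumption): 8e9 FLOP, 2e9 bytes read, 2e9 bytes written.
    ai = arithmetic_intensity(8e9, 2e9, 2e9)
    # Placeholder compute peak (assumption): 100 GFLOPS. Bandwidth: 17 GB/s as measured.
    print(ai, roofline_bound(ai, 100.0, 17.0), within_roofline(50.0, ai, 100.0, 17.0))
```

A measured point for which `within_roofline` returns false means that at least one of the inputs (FLOP count, byte count, bandwidth, or peak) does not describe the same thing the model assumes.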
Are there any particular issues with using SDE on MKL library functions?
Could you please let me know whether I'm using Intel SDE and Amplifier correctly to estimate FLOPs, arithmetic intensity, and maximum bandwidth?