I am trying to estimate the FLOP count and FLOPS of my application using hardware event-based sampling (EBS) from the command line. I have placed __itt_pause() and __itt_resume() calls around my algorithm of interest, and I run this command:
"C:/Program Files (x86)/Intel/VTune Amplifier XE 2015/bin64/amplxe-cl.exe" -collect-with runsa -knob event-config=FP_COMP_OPS_EXE.X87:sa=2000000 -start-paused --result-dir foo application.exe
This article, https://software.intel.com/en-us/articles/estimating-flops-using-event-based-sampling-ebs, suggests calculating the elapsed time by dividing CPU_CLK_UNHALTED.THREAD by the processor frequency and the number of cores. However, when I run the hotspots analysis I get the CPU time (I assume for the entire application), which should be a sufficient approximation of the elapsed time around my algorithm. Is it possible to include a "CPU TIME" measurement by simply adding an extra argument to my command above?
Thanks for any help,
Yes, it is possible. Please try something like:
"C:/Program Files (x86)/Intel/VTune Amplifier XE 2015/bin64/amplxe-cl.exe" -collect-with runsa -knob event-config=CPU_CLK_UNHALTED.REF_TSC:sa=2000003,FP_COMP_OPS_EXE.X87:sa=2000000 -start-paused --result-dir foo application.exe
I think we use CPU_CLK_UNHALTED.REF_TSC for the CPU time calculation on processor architectures starting with Sandy Bridge.
With that event collected, you should see CPU time in the summary report, and the hotspots report (or the corresponding GUI viewpoint) should be available.
Thanks & Regards, Dmitry
Thanks Dmitry, it works on my Xeon E5-2620, which I believe is Ivy Bridge. However, it does not work on my Xeon E5630 ("Westmere/Nehalem-C"?); I assume it is too old.
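For anyone following along, the arithmetic described in the question (elapsed time as unhalted cycles divided by processor frequency and number of cores, then FLOPS as the floating-point event count divided by that time) could be sketched roughly as below. This is only an illustration, not something VTune provides: the function names are hypothetical, the input numbers are made up, and it assumes each counted FP_COMP_OPS_EXE.X87 event corresponds to one floating-point operation, which may not hold for every workload.

```python
def elapsed_seconds(unhalted_cycles, freq_hz, n_cores):
    """Estimate elapsed time as unhalted cycles / (frequency * cores).

    Hypothetical helper: unhalted_cycles is the total count of a clock
    event such as CPU_CLK_UNHALTED.THREAD summed over all cores.
    """
    if freq_hz <= 0 or n_cores <= 0:
        raise ValueError("frequency and core count must be positive")
    return unhalted_cycles / (freq_hz * n_cores)


def estimate_flops(fp_event_count, unhalted_cycles, freq_hz, n_cores):
    """Estimate FLOPS as the floating-point event count per elapsed second.

    Assumption: one counted event equals one floating-point operation.
    """
    seconds = elapsed_seconds(unhalted_cycles, freq_hz, n_cores)
    if seconds == 0:
        raise ValueError("no unhalted cycles were counted")
    return fp_event_count / seconds


if __name__ == "__main__":
    # Made-up example values, not measurements from any real run.
    fp_events = 1_000_000_000   # total floating-point event count
    cycles = 8_000_000_000      # total unhalted clock cycles
    freq = 2_000_000_000        # processor frequency in Hz
    cores = 2                   # number of cores that ran the code

    print(f"elapsed: {elapsed_seconds(cycles, freq, cores):.3f} s")
    print(f"FLOPS:   {estimate_flops(fp_events, cycles, freq, cores):.3e}")
```

The event counts would come from the collected result (for example the summary report); if CPU time is already reported, it can be used in place of the elapsed_seconds() estimate.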