
Metric for execution count per line

Apaar_G_
Beginner

Hi All, 

I'm working on optimizing a Fortran application and observing the effects in VTune Amplifier XE 2013 alongside. I'm experimenting with the precision of the operations in one of the lines in my code.

For instance,
abc(1:BATCH_SIZE) = exp (-r_arr(1:BATCH_SIZE) * (1.d0/(4.d0 * ri * r1_arr(outer_j:jend))))

contains only 64-bit operands. Now, if I downgrade the operands to 32-bit and rewrite my expression as:

abc(1:BATCH_SIZE) = exp (real(-r_arr(1:BATCH_SIZE)) * (1.0/(4.0 * real(ri) * real(r1_arr(outer_j:jend)))))    

I get some reduction in the CPU Time that VTune reports for this line (both cases were profiled for the same elapsed time). Another metric that should change (increase) is the execution count of this line, because the 32-bit code will execute faster than the 64-bit code.

Which VTune metric should I look at for this comparison?

Regards,
Apaar
3 Replies
TimP
Black Belt

Guessing at your goal, I suppose you would set the same sampling rate and duration in your comparison runs and compare the number of samples taken. That could serve as an estimate of the relative execution rate.

Peter_W_Intel
Employee

In my view, first of all you might compare the INST_RETIRED.ANY event counts of the two versions to check that they have the same (or a similar) workload, then use the CPU_CLK_UNHALTED.THREAD event to gauge execution time. If the INST_RETIRED.ANY counts differ, the version with the smaller count has the better algorithm (expression).

Bernard
Black Belt

In addition to Peter's comment: if you are interested in FP performance, you can look at the SIMD FP metrics and compare both versions of the code. If your code was successfully vectorized, you should look at the event counts for packed 64-bit FP and packed 32-bit FP operations:

FP_COMP_OPS_EXE.SSE_SINGLE_PRECISION

FP_COMP_OPS_EXE.SSE_DOUBLE_PRECISION

The next step would be to calculate CPI and compare it between the two versions of the code.
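
For reference, CPI (cycles per instruction) is usually taken as the ratio of the two events Peter mentioned, collected over the same code region. This is a sketch of the standard definition, not a VTune-specific formula:

$$
\mathrm{CPI} = \frac{\texttt{CPU\_CLK\_UNHALTED.THREAD}}{\texttt{INST\_RETIRED.ANY}}
$$

A lower CPI only indicates a faster version when the INST_RETIRED.ANY counts of the two runs are comparable. If the 32-bit version retires fewer instructions, compare the CPU_CLK_UNHALTED.THREAD counts directly as well.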

 
