Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

Metric for execution count per line

Apaar_G_
Beginner

Hi All, 

I'm working on optimizing a Fortran application and observing the effects in VTune Amplifier XE 2013 alongside. I'm experimenting with the precision of the operations in one of the lines of my code.

For instance, the line
abc(1:BATCH_SIZE) = exp (-r_arr(1:BATCH_SIZE) * (1.d0/(4.d0 * ri * r1_arr(outer_j:jend))))

uses only 64-bit operands. If I downgrade the operands to 32-bit and rewrite the expression as:

abc(1:BATCH_SIZE) = exp (real(-r_arr(1:BATCH_SIZE)) * (1.0/(4.0 * real(ri) * real(r1_arr(outer_j:jend)))))

VTune's per-line CPU Time shows some reduction (both cases were profiled for the same elapsed time). Another metric that should change (increase) is the execution count of this line, because the 32-bit code executes faster than the 64-bit code.

Which VTune metric should I look at to compare the two versions?

Regards,
Apaar

TimP
Honored Contributor III

Guessing at your goal, I suppose you would set the same sampling rate and duration for your comparison runs and compare the number of samples taken. That could serve as an estimate of the relative execution rate.
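A minimal sketch of the sample-count comparison, assuming both VTune runs used the same sampling interval and that the line executes the same number of times in each run (both are assumptions, not something VTune guarantees). The helper names `cpu_time_ms` and `estimated_speedup` are hypothetical.

```python
# Hypothetical helpers; the sample counts would be read off two VTune runs.
# Assumption: both runs use the SAME sampling interval and execute the line
# the same number of times, so samples are proportional to CPU time.


def cpu_time_ms(samples: int, interval_ms: float) -> float:
    """Each sample stands for roughly one sampling interval of CPU time."""
    return samples * interval_ms


def estimated_speedup(samples_before: int, samples_after: int) -> float:
    """Ratio > 1 means the 'after' version spent fewer samples on the line."""
    if samples_after <= 0:
        raise ValueError("need at least one sample for the new version")
    # The sampling interval is identical in both runs, so it cancels out.
    return samples_before / samples_after
```

Because the interval cancels in the ratio, only the raw per-line sample counts are needed for the relative estimate; `cpu_time_ms` is there only to convert back to an absolute time.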

Peter_W_Intel
Employee

In my view, first of all you might compare the INST_RETIRED.ANY event counts of the two versions to make sure they have the same (or similar) workload, then use the CPU_CLK_UNHALTED.THREAD event to measure execution time. If the INST_RETIRED.ANY counts differ, the smaller one indicates the better algorithm (expression).
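The two-step comparison above can be sketched as follows. The 5% tolerance for "similar" instruction counts is an arbitrary, hypothetical threshold, and `same_workload` / `faster_version` are illustrative names, not part of any VTune API; the counts themselves would come from the INST_RETIRED.ANY and CPU_CLK_UNHALTED.THREAD events.

```python
# Hypothetical decision helper following the advice above:
# 1) if INST_RETIRED.ANY is similar, compare CPU_CLK_UNHALTED.THREAD;
# 2) otherwise, the version retiring fewer instructions is preferred.


def same_workload(inst_a: int, inst_b: int, rel_tol: float = 0.05) -> bool:
    """Treat instruction counts within rel_tol of each other as 'similar'."""
    return abs(inst_a - inst_b) <= rel_tol * max(inst_a, inst_b)


def faster_version(inst_a: int, clk_a: int, inst_b: int, clk_b: int,
                   rel_tol: float = 0.05) -> str:
    """Return 'A', 'B' or 'tie' for the two versions being compared."""
    if same_workload(inst_a, inst_b, rel_tol):
        a, b = clk_a, clk_b    # similar work: fewer unhalted cycles wins
    else:
        a, b = inst_a, inst_b  # different work: fewer instructions wins
    if a == b:
        return "tie"
    return "A" if a < b else "B"
```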

Bernard
Valued Contributor I

As an addition to Peter's comment: if you are interested in FP performance, you can look at the SIMD FP metrics and compare both versions of the code. If your code was successfully vectorized, look at the event counts for packed 64-bit FP and packed 32-bit FP operations:

FP_COMP_OPS_EXE.SSE_SINGLE_PRECISION

FP_COMP_OPS_EXE.SSE_DOUBLE_PRECISION

The next step is to calculate the CPI (cycles per instruction) and compare it between the two versions of the code.
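A sketch of that calculation, assuming the four raw event counts named in this thread have already been collected for each version. The `Counts` class and both helper functions are hypothetical. CPI is taken as CPU_CLK_UNHALTED.THREAD divided by INST_RETIRED.ANY; the single-precision share is only a sanity check that the 32-bit rewrite actually shifted FP work from double to single precision.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Raw event counts for one version of the code (hypothetical container)."""
    cpu_clk_unhalted_thread: int  # CPU_CLK_UNHALTED.THREAD
    inst_retired_any: int         # INST_RETIRED.ANY
    fp_sse_single: int            # FP_COMP_OPS_EXE.SSE_SINGLE_PRECISION
    fp_sse_double: int            # FP_COMP_OPS_EXE.SSE_DOUBLE_PRECISION


def cpi(c: Counts) -> float:
    """Cycles per instruction: unhalted core cycles / retired instructions."""
    return c.cpu_clk_unhalted_thread / c.inst_retired_any


def single_precision_share(c: Counts) -> float:
    """Fraction of the counted SSE FP operations that were single precision."""
    total = c.fp_sse_single + c.fp_sse_double
    return c.fp_sse_single / total if total else 0.0
```

Note that CPI alone does not rank the versions: a version with a lower CPI but more retired instructions can still take more total cycles, so CPI is best read together with the CPU_CLK_UNHALTED.THREAD totals.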
