Hi All,
I'm working on optimizing a Fortran application and observing the effects in VTune Amplifier XE 2013 alongside. I'm experimenting with the precision of the operations in one of the lines of my code.
For instance,
abc(1:BATCH_SIZE) = exp (-r_arr(1:BATCH_SIZE) * (1.d0/(4.d0 * ri * r1_arr(outer_j:jend))))
contains all 64-bit operands. Now, if I downgrade the operands to 32-bit and rewrite my expression as:
abc(1:BATCH_SIZE) = exp (real(-r_arr(1:BATCH_SIZE)) * (1.0/(4.0 * real(ri) * real(r1_arr(outer_j:jend)))))
I get some reduction in the CPU Time shown by the per-line counter in VTune (both cases profiled for the same elapsed time). Another metric that should change (increase) is the execution count of this line, because the 32-bit code will execute quicker than the 64-bit code.
Which metric should I be looking for through VTune for comparison?
Regards,
Apaar
Guessing at your goal, I suppose you would set the same sampling rate and duration in your comparison runs and compare the number of samples taken. That could be an estimate of relative execution rate.
In my view, first of all, you might compare the INST_RETIRED.ANY event counts of the two versions to ensure they have the same (or a similar) workload, then use the CPU_CLK_UNHALTED.THREAD event to measure execution time. If the INST_RETIRED.ANY event counts differ, the version with the smaller count has the better algorithm (expression).
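As a rough sketch of that comparison in Python: the helper name `compare_runs`, the 5% tolerance, and the event counts in the usage lines are all hypothetical placeholders, not measurements from this thread. It first checks that the retired-instruction counts are close, then reports the clocktick ratio.

```python
def compare_runs(inst_a, clk_a, inst_b, clk_b, tolerance=0.05):
    """Compare two profiling runs from their hardware event counts.

    inst_a, inst_b: INST_RETIRED.ANY counts for run A and run B.
    clk_a, clk_b:   CPU_CLK_UNHALTED.THREAD counts for run A and run B.
    tolerance:      assumed threshold for calling the workloads "similar".

    Returns (similar_workload, clock_ratio), where clock_ratio = clk_b / clk_a.
    """
    # Relative difference in retired instructions between the two runs.
    rel_diff = abs(inst_a - inst_b) / max(inst_a, inst_b)
    similar_workload = rel_diff <= tolerance
    # A ratio below 1.0 means run B spent fewer unhalted cycles than run A.
    return similar_workload, clk_b / clk_a


# Placeholder counts for a 64-bit run (A) and a 32-bit run (B).
similar, ratio = compare_runs(1000, 2000, 1000, 1500)
print(similar, ratio)
```

If the workloads are not similar, the clocktick ratio alone says little, since the two runs did different amounts of work.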
In addition to Peter's comment: if you are interested in FP performance, you can look at the SIMD FP metrics and compare both versions of the code. If your code was successfully vectorized, you should look at the event counts for 32-bit packed and 64-bit packed FP operations:
FP_COMP_OPS_EXE.SSE_SINGLE_PRECISION
FP_COMP_OPS_EXE.SSE_DOUBLE_PRECISION
The next step will be a CPI calculation and a comparison between the two versions of the code.
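For reference, CPI (cycles per instruction) is unhalted clockticks divided by retired instructions. A minimal Python sketch, assuming the counts come from the CPU_CLK_UNHALTED.THREAD and INST_RETIRED.ANY events; the function name `cpi` and the numbers below are hypothetical placeholders, not results from either version of the code:

```python
def cpi(clockticks, instructions):
    """Cycles per instruction from two event counts.

    clockticks:   CPU_CLK_UNHALTED.THREAD count.
    instructions: INST_RETIRED.ANY count.
    """
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return clockticks / instructions


# Placeholder counts for the two versions of the expression.
cpi_double = cpi(3000, 2000)  # 64-bit version
cpi_single = cpi(2400, 2000)  # 32-bit version
print(cpi_double, cpi_single)
```

A lower CPI suggests the instructions are executing more efficiently, but it should be read together with the instruction counts, since a version can lower its CPI while retiring more instructions.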
