Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

Export memory bandwidth data?

Tristan_J_1
Beginner
641 Views

I'm using Intel VTune Amplifier XE 2013 to gather memory bandwidth usage for a particular application, and I was wondering whether there is some way to export this data for additional analysis with other tools. In particular, I'd love to be able to export a capture to an Excel file so that I can calculate things like the average and standard deviation of the results over time. Right now I only seem to be able to browse the data in the VTune application and drill down on individual points.

I haven't been able to find anything in the UI that would enable this. Maybe there is a command-line option somewhere?

Many thanks!

0 Kudos
11 Replies
Peter_W_Intel
Employee

Please try this from the command line:

# amplxe-cl -collect snb-bandwidth -- ./program

# amplxe-cl -report hw-events -format csv -csv-delimiter=","

Notes:
1. You can redirect the output to a file and open it in Excel.
2. The output is summary data for hardware events only (LLC misses, local or remote, that caused a memory access); it contains no timestamp information.
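If Excel is not a requirement, the redirected CSV text can also be read directly with a script. The following is only a minimal Python sketch: the sample text and the column names ("Function", "Event Count") are hypothetical placeholders, not the real headers of the hw-events report, so substitute whatever headers your own output contains. Because the report is summary data, the statistics are taken across rows, not over time.

```python
import csv
import io
import statistics


def numeric_column(csv_text, column):
    """Return one column as floats, skipping blank or non-numeric cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = []
    for row in reader:
        cell = (row.get(column) or "").strip()
        try:
            values.append(float(cell))
        except ValueError:
            continue  # header fragments, separators, empty cells
    return values


# Hypothetical sample standing in for the redirected report text.
sample = "Function,Event Count\nfoo,100\nbar,300\n"

counts = numeric_column(sample, "Event Count")
print("mean :", statistics.mean(counts))
print("stdev:", statistics.stdev(counts))  # sample standard deviation
```

For a real run, read the redirected file into a string (for example with open(path).read()) and pass it in place of the sample.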

Tristan_J_1
Beginner

Thanks for the reply, Peter.

I tried that, but the command only appears to output the aggregate data from the run, which is what appears at the bottom of a memory bandwidth analysis. The data I'm most interested in exporting is the observed memory bandwidth over time, which is the part graphed at the top of a memory bandwidth analysis.

I've attached an image from the VTune Amplifier UI that highlights the data that I'm looking to export.

Thanks!

Peter_W_Intel
Employee

I understand that you want the data over time, but exporting it is not currently possible. As I said in my last post, only summary data can be exported; from it you can find hot functions and see how frequently they access local or remote DRAM.

Tristan_J_1
Beginner

Thanks. I would definitely vote for making the bandwidth-over-time data exportable in a future update. I would find real value in using that data for more in-depth analysis.

Peter_W_Intel
Employee

I have escalated this feature request to the development team. I will update this thread if the feature becomes available.

Surya_Narayanan_N_

Is this feature ready yet?

Peter_W_Intel
Employee

It seems the timeline report is not yet available from the command line.

Surya_Narayanan_N_

OK. I would like to know how the bandwidth is computed when using knc-bandwidth. The summary looks like this:

CPU
---
Parameter          bw_org_2
-----------------  -----------------------------
Frequency          1052000000
Logical CPU Count  240
Name               Intel(R) Xeon(R) E5 processor

Summary
-------
Elapsed Time:  2.984
CPU Usage:     2.893

Event summary
-------------
Hardware Event Type  Hardware Event Count:Self  Hardware Event Sample Count:Self  Events Per Sample
-------------------  -------------------------  --------------------------------  -----------------
CPU_CLK_UNHALTED     9140000000                 914                               10000000

Uncore Event summary
--------------------
Hardware Event Type            Hardware Event Count:Self
-----------------------------  -------------------------
UNC_F_CH0_NORMAL_WRITE[UNIT0]  10103918
UNC_F_CH0_NORMAL_WRITE[UNIT1]  10109727
UNC_F_CH0_NORMAL_WRITE[UNIT2]  10095707
UNC_F_CH0_NORMAL_WRITE[UNIT3]  10102520
UNC_F_CH0_NORMAL_WRITE[UNIT4]  10095936
UNC_F_CH0_NORMAL_WRITE[UNIT5]  10100786
UNC_F_CH0_NORMAL_WRITE[UNIT6]  10109940
UNC_F_CH0_NORMAL_WRITE[UNIT7]  10100599
UNC_F_CH0_NORMAL_READ[UNIT0]   8574334
UNC_F_CH0_NORMAL_READ[UNIT1]   8588694
UNC_F_CH0_NORMAL_READ[UNIT2]   8562949
UNC_F_CH0_NORMAL_READ[UNIT3]   8611755
UNC_F_CH0_NORMAL_READ[UNIT4]   8566964
UNC_F_CH0_NORMAL_READ[UNIT5]   8573352
UNC_F_CH0_NORMAL_READ[UNIT6]   8590854
UNC_F_CH0_NORMAL_READ[UNIT7]   8589209
UNC_F_CH1_NORMAL_WRITE[UNIT0]  10100922
UNC_F_CH1_NORMAL_WRITE[UNIT1]  10101943
UNC_F_CH1_NORMAL_WRITE[UNIT2]  10105199
UNC_F_CH1_NORMAL_WRITE[UNIT3]  10100272
UNC_F_CH1_NORMAL_WRITE[UNIT4]  10106579
UNC_F_CH1_NORMAL_WRITE[UNIT5]  10123764
UNC_F_CH1_NORMAL_WRITE[UNIT6]  10115382
UNC_F_CH1_NORMAL_WRITE[UNIT7]  10100624
UNC_F_CH1_NORMAL_READ[UNIT0]   8576649
UNC_F_CH1_NORMAL_READ[UNIT1]   8566361
UNC_F_CH1_NORMAL_READ[UNIT2]   8592849
UNC_F_CH1_NORMAL_READ[UNIT3]   8591451
UNC_F_CH1_NORMAL_READ[UNIT4]   8577494
UNC_F_CH1_NORMAL_READ[UNIT5]   8624924
UNC_F_CH1_NORMAL_READ[UNIT6]   8615670
UNC_F_CH1_NORMAL_READ[UNIT7]   8585177
amplxe: Executing actions 100 % done  

But when I load the result file in the GUI, I see:

Average Bandwidth
Package Bandwidth, GB/sec
package_0  6.414

How is this 6.414 GB/sec computed?

Peter_W_Intel
Employee

This is an internal analysis type, and an internal formula computes the average bandwidth from the counters. For details, see the file vtune_amplifier_xe_2013/config/query_library/uncore_metrics.cfg.

Note that you should not modify this file; doing so may cause unexpected results.

Surya_Narayanan_N_
Thank you. I also have a question about how the core bandwidth and uncore bandwidth numbers are validated. I ran just the copy part of the STREAM benchmark. On Xeon Phi it usually reaches a maximum of about 140 GB/sec with 240 threads. I get similar results with the core calculation, but the uncore bandwidth shows more than 250 GB/sec for 128 threads, which does not match the claim that both methods of bandwidth measurement give similar numbers.

Peter_W_Intel
Employee

The .cfg file I mentioned previously contains entries for different processor types, e.g. snb/ivybridge, haswell, snbep/ivytown, core7b, knc, etc.

knc - DataWrittenGB,

  <valueEval><![CDATA[ ( ( query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_WRITE[UNIT0]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_WRITE[UNIT0]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_WRITE[UNIT1]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_WRITE[UNIT1]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_WRITE[UNIT2]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_WRITE[UNIT2]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_WRITE[UNIT3]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_WRITE[UNIT3]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_WRITE[UNIT4]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_WRITE[UNIT4]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_WRITE[UNIT5]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_WRITE[UNIT5]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_WRITE[UNIT6]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_WRITE[UNIT6]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_WRITE[UNIT7]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_WRITE[UNIT7]]") ) * 64 ) / 1000000000 ]]></valueEval>

knc - DataReadGB,

  <derivedQuery idToOverwrite="DataReadGB">
          <valueEval><![CDATA[ ( ( query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_READ[UNIT0]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_READ[UNIT0]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_READ[UNIT1]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_READ[UNIT1]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_READ[UNIT2]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_READ[UNIT2]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_READ[UNIT3]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_READ[UNIT3]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_READ[UNIT4]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_READ[UNIT4]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_READ[UNIT5]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_READ[UNIT5]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_READ[UNIT6]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_READ[UNIT6]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_READ[UNIT7]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_READ[UNIT7]]") ) * 64 ) / 1000000000 ]]></valueEval>

knc - DataTransferredGB,

<derivedQuery idToOverwrite="DataTransferredGB">
          <valueEval><![CDATA[ ( ( query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_READ[UNIT0]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_READ[UNIT0]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_READ[UNIT1]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_READ[UNIT1]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_READ[UNIT2]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_READ[UNIT2]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_READ[UNIT3]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_READ[UNIT3]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_READ[UNIT4]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_READ[UNIT4]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_READ[UNIT5]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_READ[UNIT5]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_READ[UNIT6]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_READ[UNIT6]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_READ[UNIT7]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_READ[UNIT7]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_WRITE[UNIT0]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_WRITE[UNIT0]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_WRITE[UNIT1]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_WRITE[UNIT1]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_WRITE[UNIT2]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_WRITE[UNIT2]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_WRITE[UNIT3]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_WRITE[UNIT3]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_WRITE[UNIT4]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_WRITE[UNIT4]]") + 
query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_WRITE[UNIT5]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_WRITE[UNIT5]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_WRITE[UNIT6]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_WRITE[UNIT6]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH0_NORMAL_WRITE[UNIT7]]") + query("/UncoreEventCount/UncoreEventType[UNC_F_CH1_NORMAL_WRITE[UNIT7]]") ) * 64 ) / 1000000000 ]]></valueEval>
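The expressions above can be checked against the numbers from the knc-bandwidth summary earlier in this thread. Below is a minimal Python sketch of that arithmetic. The counter values and the elapsed time are copied from that summary; the variable names are made up for illustration. One step is an assumption: the .cfg excerpt only shows how the GB totals are formed, so dividing the total by Elapsed Time to get an average bandwidth is a guess at the final step, not something the excerpt confirms.

```python
# Uncore event counts from the "Uncore Event summary" posted earlier
# in this thread (UNIT0..UNIT7 for each channel).
CH0_WRITE = [10103918, 10109727, 10095707, 10102520,
             10095936, 10100786, 10109940, 10100599]
CH0_READ = [8574334, 8588694, 8562949, 8611755,
            8566964, 8573352, 8590854, 8589209]
CH1_WRITE = [10100922, 10101943, 10105199, 10100272,
             10106579, 10123764, 10115382, 10100624]
CH1_READ = [8576649, 8566361, 8592849, 8591451,
            8577494, 8624924, 8615670, 8585177]

BYTES_PER_EVENT = 64          # the "* 64" factor in the valueEval expressions
BYTES_PER_GB = 1_000_000_000  # the "/ 1000000000" divisor (decimal GB)
ELAPSED_SECONDS = 2.984       # "Elapsed Time" from the same summary


def to_gb(event_count):
    """Convert a raw event count to GB the way the expressions do."""
    return event_count * BYTES_PER_EVENT / BYTES_PER_GB


data_written_gb = to_gb(sum(CH0_WRITE) + sum(CH1_WRITE))
data_read_gb = to_gb(sum(CH0_READ) + sum(CH1_READ))
data_transferred_gb = data_written_gb + data_read_gb

# Assumption: average bandwidth = total GB transferred / elapsed seconds.
average_bandwidth = data_transferred_gb / ELAPSED_SECONDS

print(f"DataWrittenGB     = {data_written_gb:.3f}")
print(f"DataReadGB        = {data_read_gb:.3f}")
print(f"DataTransferredGB = {data_transferred_gb:.3f}")
print(f"Average bandwidth = {average_bandwidth:.3f} GB/sec")
```

Under that assumption the result rounds to 6.414 GB/sec, which agrees with the package_0 value shown in the GUI for this run.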

If you have concerns about the VTune results, please submit a ticket to Intel Premier Support and include your result directory so that it can be investigated.
