According to the tutorials, bandwidth analysis can be performed in two ways: 1. knc-custom analysis (from core events) 2. knc-bandwidth (uncore events only)
http://www.youtube.com/watch?v=vnOqpyzui_s
I would like to do it the first way, using the formula, and I have a few questions about it.
Given formula:
Read bandwidth (bytes/clock) = (L2_DATA_READ_MISS_MEM_FILL + L2_DATA_WRITE_MISS_MEM_FILL + HWP_L2MISS) * 64 / CPU_CLK_UNHALTED
Write bandwidth (bytes/clock) = (L2_VICTIM_REQ_WITH_DATA + SNP_HITM_L2) * 64 / CPU_CLK_UNHALTED

I run my multi-threaded application from a script which does some environment setup before calling the application, like this:
amplxe-cl -collect-with runsa-knc -knob event-config=CPU_CLK_UNHALTED:sa=10000, (other events with their sampling frequency) -- ssh mic0 "./script.sh"
Q1: Are these the statistics of the script, or of all the processes running while the collection is active? How can I get the statistics for just the application that my script started?
I get an event summary like this:
Event summary
-------------
Hardware Event Type            Hardware Event Count:Self  Hardware Event Sample Count:Self  Events Per Sample
-----------------------------  -------------------------  --------------------------------  -----------------
HWP_L2MISS                     91000                      13                                1000
CPU_CLK_UNHALTED               49840000                   712                               10000
L2_DATA_READ_MISS_CACHE_FILL   336000                     48                                1000
L2_DATA_READ_MISS_MEM_FILL     714000                     102                               1000
L2_DATA_WRITE_MISS_CACHE_FILL  0                          0                                 1000
L2_DATA_WRITE_MISS_MEM_FILL    0                          0                                 1000
L2_VICTIM_REQ_WITH_DATA        0                          0                                 1000
SNP_HITM_L2                    0                          0                                 1000
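To make the calculation concrete, this is a minimal sketch of how I plug the summary into the given formula. It assumes the "Hardware Event Count:Self" column is the right input (which is part of what I am asking), and the function names are just my own helpers, not anything from VTune.

```python
# Event counts copied from the "Hardware Event Count:Self" column of the summary.
counts = {
    "HWP_L2MISS": 91000,
    "CPU_CLK_UNHALTED": 49840000,
    "L2_DATA_READ_MISS_MEM_FILL": 714000,
    "L2_DATA_WRITE_MISS_MEM_FILL": 0,
    "L2_VICTIM_REQ_WITH_DATA": 0,
    "SNP_HITM_L2": 0,
}

CACHE_LINE_BYTES = 64  # the "* 64" factor in the given formula


def read_bandwidth(c):
    """Read bandwidth in bytes/clock, following the given formula."""
    fills = (c["L2_DATA_READ_MISS_MEM_FILL"]
             + c["L2_DATA_WRITE_MISS_MEM_FILL"]
             + c["HWP_L2MISS"])
    return fills * CACHE_LINE_BYTES / c["CPU_CLK_UNHALTED"]


def write_bandwidth(c):
    """Write bandwidth in bytes/clock, following the given formula."""
    evictions = c["L2_VICTIM_REQ_WITH_DATA"] + c["SNP_HITM_L2"]
    return evictions * CACHE_LINE_BYTES / c["CPU_CLK_UNHALTED"]


if __name__ == "__main__":
    print("read  bytes/clock:", read_bandwidth(counts))
    print("write bytes/clock:", write_bandwidth(counts))
```

With these counts the read bandwidth comes out at roughly 1.03 bytes/clock and the write bandwidth at 0, since both write-side events are 0 in the summary.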
Then I view my result using:
amplxe-cl -report hw-events -format=csv -csv-delimiter=comma -report-output=output.csv -show-as=sample -r /home//bandwidth2/ -call-stack-mode=user-only -cumulative-threshold-percent=loop -group-by=process
Q2: Again, does this report the statistics of the script? They are very different from the summary. How can I get the statistics of just the application I am interested in, which was spawned by the script?
Q3: Since I collected the samples with a given number of events per sample (e.g. sa=1000), when I calculate the bandwidth, should I multiply "Hardware Event Sample Count:L2_DATA_WRITE_MISS_MEM_FILL:Self" by the "sa" value to get the correct bandwidth?
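To show what I mean in Q3, here is a small sketch that multiplies each sample count by its "Events Per Sample" value and compares the product with the reported "Hardware Event Count:Self". The helper names are my own. With the numbers from my summary, the reported count is 7 times the product for every nonzero event; I do not know what that factor represents.

```python
# (reported event count, sample count, events per sample) from the event summary.
rows = {
    "HWP_L2MISS": (91000, 13, 1000),
    "CPU_CLK_UNHALTED": (49840000, 712, 10000),
    "L2_DATA_READ_MISS_CACHE_FILL": (336000, 48, 1000),
    "L2_DATA_READ_MISS_MEM_FILL": (714000, 102, 1000),
}


def estimated_count(samples, events_per_sample):
    """Naive estimate: each sample stands for `events_per_sample` events."""
    return samples * events_per_sample


def reported_to_estimated(rows):
    """Ratio of the reported count to samples * events-per-sample, per event."""
    return {
        name: reported / estimated_count(samples, per_sample)
        for name, (reported, samples, per_sample) in rows.items()
    }


if __name__ == "__main__":
    for name, ratio in reported_to_estimated(rows).items():
        print(name, ratio)
```

Because the factor is the same for the L2 events and for CPU_CLK_UNHALTED in this data, it cancels in the bytes/clock formula, so both ways of counting give the same bandwidth here.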
Q4: For collection there is a parameter "cpu-mask". If I set it to 0, does that mean hardware events are monitored only on core 0? If I set it to "all", does it monitor all 240 cores? If so, won't my statistics be distorted by events from applications other than my multi-threaded application? I would like to know how to use this parameter.