I want to get bandwidth readings on an Intel Atom processor using Intel VTune. On a Xeon machine, I used VTune's bandwidth analysis with this command:
/opt/intel/vtune_amplifier_xe/bin64/amplxe-cl -collect snb-bandwidth -target-duration-type=long -- ./peclat
Now, on the Atom machine, the analysis types VTune offers under CPU-specific analysis are Intel Core 2, Nehalem/Westmere, Sandy Bridge, Haswell, and Knights Corner for the Xeon Phi processor.
Could you please advise how to run bandwidth analysis on Intel Atom?
The latest product (I mean VTune Amplifier XE 2013 Update 17) removed bandwidth from the CPU-specific analysis types, so snb-bandwidth no longer exists.
The general-exploration and bandwidth analysis types now cover all Intel microarchitecture processors except Xeon Phi processors. You may use:
amplxe-cl -collect bandwidth -target-duration-type=long -- ./peclat
VTune Amplifier XE currently does not support bandwidth analysis for Atom processors on Windows and Linux. Stay tuned.
BTW, what platform are you working with? And is your interest memory bandwidth?
Thanks & Regards, Dmitry
Sorry for the late reply. I was running some experiments that I could not interrupt. I have now checked, and
amplxe-cl -collect bandwidth -target-duration-type=long -- ./peclat fails as well, with this error:
amplxe: Fatal error: This analysis type is not applicable to the current machine microarchitecture.
I am working on Ubuntu 13.10 with Intel VTune 2013, and I want to measure the DRAM bandwidth of a particular application.
This is a hardware-dependent issue: bandwidth analysis is ONLY supported on 2nd generation Core(TM) processors and later, that is, Sandy Bridge, Ivy Bridge, and Haswell processors. Atom processors are NOT supported!
Don't forget Xeon Phi!
VTune has a "knc-bandwidth" option that can read the memory controller performance counters there.
The total traffic VTune reports matches my expectations for a large version of the STREAM benchmark. Results are about 1.1% higher than expected for writes and 1.4% higher than expected for reads. These differences are in the right direction: the kernel has to instantiate the data pages (I don't know the exact number of additional memory references that requires), and running the code generates some TLB miss traffic (this was run with a 6GiB footprint, so I was definitely beyond the bounds of TLB mapping).
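For anyone who wants to repeat this kind of sanity check, here is a rough sketch of how an "expected" STREAM traffic figure could be computed and compared against a measured total. It is not necessarily the accounting used for the numbers above. It assumes the four standard STREAM kernels (Copy, Scale, Add, Triad) on 8-byte doubles, and the names `expected_traffic` and `percent_excess`, the iteration count, and the three-equal-arrays split of the 6GiB footprint are all illustrative assumptions. Whether stores cause extra read traffic (write-allocate) depends on the hardware and compiler, so it is left as a flag.

```python
BYTES_PER_DOUBLE = 8

# (arrays read, arrays written) per element for each standard STREAM kernel
STREAM_KERNELS = {
    "copy": (1, 1),   # c[i] = a[i]
    "scale": (1, 1),  # b[i] = q * c[i]
    "add": (2, 1),    # c[i] = a[i] + b[i]
    "triad": (2, 1),  # a[i] = b[i] + q * c[i]
}


def expected_traffic(n_elements, n_iterations, write_allocate=False):
    """Return (read_bytes, write_bytes) for n_iterations passes over all four kernels.

    If write_allocate is True, each written array is also counted as read once,
    which models caches that fetch a line before overwriting it (an assumption).
    """
    reads = 0
    writes = 0
    for arrays_read, arrays_written in STREAM_KERNELS.values():
        reads += arrays_read
        writes += arrays_written
        if write_allocate:
            reads += arrays_written
    bytes_per_array_pass = n_elements * BYTES_PER_DOUBLE * n_iterations
    return reads * bytes_per_array_pass, writes * bytes_per_array_pass


def percent_excess(measured, expected):
    """Percentage by which the measured traffic exceeds the expected traffic."""
    return 100.0 * (measured - expected) / expected


if __name__ == "__main__":
    # Hypothetical setup: 6 GiB footprint split across three equal arrays,
    # 10 iterations (the iteration count is made up for illustration).
    n = (6 * 2**30) // (3 * BYTES_PER_DOUBLE)
    read_bytes, write_bytes = expected_traffic(n, n_iterations=10)
    print("expected reads (bytes): ", read_bytes)
    print("expected writes (bytes):", write_bytes)
```

Feeding the read and write totals from the profiler into `percent_excess` alongside these figures gives the kind of percentage quoted above. Page instantiation and TLB miss traffic are not modeled here, so a small positive excess is what one would expect to see.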
@ John D.
Thanks for your input. Yes, the current Update 17 already supports bandwidth analysis on Xeon Phi: simply use "amplxe-cl -collect knc-bandwidth ..." or run it from the GUI.
You are right. I compared the cost of the Linux kernel on Xeon Phi and on a traditional Xeon, and the execution time on Xeon Phi is higher (I used the same test case and built native code for Xeon Phi). I don't know the reason: maybe it requires more communication among the many cores, or there are more TLB misses, or something else?
You will also see that the cost of the OpenMP library is higher on Xeon Phi than on a traditional Xeon...
However, the performance results from Xeon Phi are encouraging if you run HPC programs; if your program is mostly idle, don't use Xeon Phi for it.