I have the following question.
When we collect hardware events for a multi-threaded program on Xeon Phi, are the statistics for each hardware event reported on a cumulative basis or a per-thread basis? For example, does CPU_CLK_UNHALTED, like the CPU time reported by the Linux 'time' command, give the cumulative clock cycles used by the application across the specified number of cores? Is this correct?
How are cache_fill and other hardware events reported? Is it an accumulation of events from all cores, or only from the one core specified in the -collect cpu-mask in general?
CPU clocks are shared among the hardware (HW) threads of each core. The description of CPI per core versus CPI per thread in http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-2-understanding attempts to explain this behavior.
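To make the per-core versus per-thread distinction concrete, here is a minimal sketch, not taken from the article or from any profiler output. It assumes that every HW thread on a core observes the same core clock count, so cycles should not be summed across threads of one core while instructions should. The `samples` data and the function names are hypothetical and the numbers are made up for illustration.

```python
from collections import defaultdict

# Hypothetical per-thread samples: (core_id, hw_thread_id, cycles, instructions).
# These values are invented for illustration; they are not measured data.
samples = [
    (0, 0, 1_000_000, 250_000),
    (0, 1, 1_000_000, 250_000),
    (1, 0, 1_000_000, 500_000),
]


def cpi_per_thread(samples):
    """Cycles per instruction as seen by each individual HW thread."""
    return {
        (core, thread): cycles / instructions
        for core, thread, cycles, instructions in samples
    }


def cpi_per_core(samples):
    """Cycles per instruction for each core as a whole.

    Assumption: threads on one core share the core clock, so the core's
    cycle count is taken once (the maximum seen), not summed per thread.
    Instructions are summed over all threads of the core.
    """
    cycles_by_core = {}
    instructions_by_core = defaultdict(int)
    for core, _thread, cycles, instructions in samples:
        cycles_by_core[core] = max(cycles_by_core.get(core, 0), cycles)
        instructions_by_core[core] += instructions
    return {
        core: cycles_by_core[core] / instructions_by_core[core]
        for core in cycles_by_core
    }


if __name__ == "__main__":
    print(cpi_per_thread(samples))
    print(cpi_per_core(samples))
```

With these made-up numbers, each thread on core 0 shows a CPI of 4.0 while the core as a whole shows 2.0. Summing the clock counts of all threads would therefore overstate the cycles the core actually spent, if the shared-clock assumption holds.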