Hi,
I am working on a research project at my university and would like help quantifying the thread life cycle. I am running PARSEC on an Intel Broadwell NUMA machine and collecting statistics. Could you please let me know how to determine which node accesses the memory and which node the memory resides on?
Thank you in advance,
Deepthi
The details are going to depend on both the programming model and the hardware platform.
For OpenMP codes, I use environment variables to precisely control the mapping of threads to logical processors. Each thread can then use the OFFCORE_RESPONSE events of the programmable core counters to measure load (or RFO) responses from local or remote memory. These events have bugs in some processors that cause undercounting, but I can't remember the specifics. I was definitely seeing undercounting on Xeon E5 v3 (Haswell) with some of these events just last week. The relevant erratum on Xeon E5 v3 (HSE108 in document 330785-010) does not appear in the Xeon E5 v4 (Broadwell) Specification Update (document 333811-003), so it is probably worth testing these events on your system.
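To attribute a thread's counts to a NUMA node, you first need to know which node each logical processor belongs to. The following is a minimal sketch, assuming Linux and its sysfs layout under `/sys/devices/system/node`; the function names are hypothetical and chosen only for illustration. The reply above does not name the environment variables it uses; for OpenMP, the standard `OMP_PLACES` and `OMP_PROC_BIND` variables are one way to fix the thread placement before running a script like this.

```python
import os


def parse_cpulist(text):
    """Expand a Linux cpulist string such as '0-3,8-11' into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpu_to_node_map(sysfs_root="/sys/devices/system/node"):
    """Return {cpu_id: node_id} read from sysfs, or {} if the directory is absent."""
    mapping = {}
    if not os.path.isdir(sysfs_root):
        return mapping
    for entry in os.listdir(sysfs_root):
        # Node directories are named node0, node1, ...
        if entry.startswith("node") and entry[4:].isdigit():
            path = os.path.join(sysfs_root, entry, "cpulist")
            try:
                with open(path) as f:
                    for cpu in parse_cpulist(f.read()):
                        mapping[cpu] = int(entry[4:])
            except OSError:
                continue
    return mapping


def pin_to_cpu(cpu):
    """Restrict the caller to one logical CPU (Linux only; pid 0 means the caller)."""
    os.sched_setaffinity(0, {cpu})
```

Calling `pin_to_cpu(c)` and then looking up `cpu_to_node_map()[c]` gives the node that issues the accesses for that thread, so any per-thread local or remote counts can be labeled with the accessing node.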
The bulk counts at the memory controller are accurate on all of the processors I have tested, but these cannot usually be tied back to specific cores. There are some exceptions -- for example, the CBo (L3) Uncore events in the Xeon E5 v3 (Haswell) can be filtered by physical core, but this does not provide full generality: it allows counting for one core or for the sum of all cores, but does not allow counting individually for each core in a single run. I have not tested this mechanism, so I don't know if it has idiosyncrasies.
The QPI/UPI data traffic counters have different bugs in each processor generation, but I have been able to find at least one set of events that gives the expected counts in each processor generation. Again, these cannot be tied back to individual cores.
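The counters discussed above measure traffic. For the other half of the original question (which node the memory resides on), Linux also exposes page placement per process in `/proc/<pid>/numa_maps`, where each mapping carries fields of the form `N<node>=<pages>`. The sketch below is an assumption-laden illustration rather than part of the method described in this thread: it reports where pages currently live, not how often they are accessed, and the function names are hypothetical.

```python
import re
from collections import Counter

# Matches per-node page counts such as "N0=6" or "N1=128".
NODE_FIELD = re.compile(r"^N(\d+)=(\d+)$")


def pages_per_node(numa_maps_text):
    """Sum the N<node>=<pages> fields over all mappings in numa_maps text.

    Counts are in pages of each mapping's own page size, so mappings that
    use huge pages are not directly comparable with 4 KiB mappings.
    """
    totals = Counter()
    for line in numa_maps_text.splitlines():
        # The first two fields are the start address and the memory policy.
        for field in line.split()[2:]:
            match = NODE_FIELD.match(field)
            if match:
                totals[int(match.group(1))] += int(match.group(2))
    return dict(totals)


def read_numa_maps(pid="self"):
    """Return the numa_maps text for a process, or '' if it cannot be read."""
    try:
        with open(f"/proc/{pid}/numa_maps") as f:
            return f.read()
    except OSError:
        return ""
```

For example, `pages_per_node(read_numa_maps())` summarizes the placement of the calling process; combined with a known thread-to-node pinning, it indicates whether a thread's data is expected to be local or remote.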
Greetings for the day, John McCalpin,
Thank you so much for your swift response. I am running applications that use both pthreads and OpenMP.
I will try using the OFFCORE_RESPONSE events.
Thanks, Deepthi
