Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Thread scheduling on NUMA machines



I am working on a research project at my university and would like help quantifying the thread life cycle. I am running PARSEC on an Intel Broadwell NUMA machine and collecting statistics. Could you please tell me how to determine which node issues a memory access and which node holds the memory being accessed?

Thank you in advance,



2 Replies

The details are going to depend on both the programming model and the hardware platform.

For OpenMP codes, I use environment variables to precisely control the mapping of threads to logical processors. Each thread can use the programmable core counter OFFCORE_RESPONSE events to measure load (or RFO, read-for-ownership) responses from local or remote memory. This event has bugs in some processors that cause undercounting, but I can't remember the specifics. I was definitely seeing undercounting on Xeon E5 v3 (Haswell) with some of these events just last week. This erratum on Xeon E5 v3 (HSE108 in document 330785-010) does not appear in the Xeon E5 v4 (Broadwell) Specification Update document (333811-003), so it is probably worth testing these events on your system.
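As a rough illustration of the thread-mapping step, the sketch below builds OpenMP affinity settings that give each thread its own logical processor. The reply does not say which environment variables were used; the standard OpenMP variables OMP_PLACES and OMP_PROC_BIND are assumed here, and the helper names parse_cpulist and omp_pinning_env are hypothetical. The input string follows the range format that Linux prints in /sys/devices/system/node/nodeN/cpulist, so the same helper can be used to look up which logical processors belong to which NUMA node.

```python
def parse_cpulist(text):
    """Expand a Linux cpulist string such as "0-3,8-11" into a list of CPU ids.

    Only plain ids and "lo-hi" ranges are handled (assumption: no stride syntax).
    """
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def omp_pinning_env(cpus):
    """Build OpenMP settings with one place per logical CPU, in the given order.

    Assumption: with OMP_PROC_BIND=close and as many threads as places, the
    runtime is expected to put consecutive threads on consecutive places.
    """
    places = ",".join("{%d}" % cpu for cpu in cpus)
    return {
        "OMP_NUM_THREADS": str(len(cpus)),
        "OMP_PLACES": places,
        "OMP_PROC_BIND": "close",
    }


if __name__ == "__main__":
    # Example: pin four threads to the CPUs listed for one (hypothetical) node.
    node0_cpus = parse_cpulist("0-3")
    for key, value in omp_pinning_env(node0_cpus).items():
        print("export %s=%s" % (key, value))
```

The printed export lines can be pasted into the shell before launching the benchmark. Whether a given thread's traffic then counts as local or remote depends on which node's cpulist contains the logical processor it was pinned to.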

The bulk counts at the memory controller are accurate on all of the processors I have tested, but these cannot usually be tied back to specific cores. There are some exceptions: for example, the CBo (L3) uncore events in the Xeon E5 v3 (Haswell) can be filtered by physical core, but this does not provide full generality. It allows counting for one core or for the sum of all cores, but does not allow counting individually for each core in a single run. I have not tested this mechanism, so I don't know if it has idiosyncrasies.

The QPI/UPI data traffic counters have different bugs in each processor generation, but I have been able to find at least one set of events that gives the expected counts in each processor generation.  Again, these cannot be tied back to individual cores.


Greetings for the day, John McCalpin,

Thank you so much for your swift response. I am running applications that use both pthreads and OpenMP.

I shall try using the OFFCORE_RESPONSE events.

Thanks,
Deepthi