Intel PCM: FLT_SENT, NULL_IDLE Events?

tim_kiefer
Beginner
Hi,
I am currently running experiments on a 4-socket (Westmere-EX) HP machine. I am especially interested in QPI link usage and utilization, which I am trying to measure with the Performance Counter Monitor. After studying the source code, I am puzzled about two events in particular: what they count and how they are used.

I am trying to understand the events FLT_SENT (Flit Sent) and NULL_IDLE (Null Idle Flit Sent). The tool uses, for example, FLT_SENT events together with InvariantTSCs to compute the maximum QPI link speed (cpucounters.cpp, PCM::computeQPIspeed(int core_nr)). That makes me think that one flit (8 bytes?) can be sent per cycle (TSC) and that one FLT_SENT event occurs per cycle?!

In another method in the tool (cpucounters.h, getOutgoingQPILinkUtilization), NULL_IDLE flits are counted and used together with UncoreTSCs to estimate the QPI link utilization. I am totally puzzled about how that works. I was also pretty confused when I counted FLT_SENT and NULL_IDLE and realized that there were more NULL_IDLE events than FLT_SENT events.

Can anybody shed light on what the FLT_SENT and NULL_IDLE events actually count? I would also be interested in the difference between InvariantTSC and UncoreTSC, which are used to compute different measures (see above).

Thanks a lot!
- tim

Roman_D_Intel
Employee
Hi Tim,

one flit is 80 bits (64 bits of which are the data payload = 8 bytes). In each QPI link cycle a "phit" is sent (20 bits = 16 data bits + 4 other system bits). That means two payload data bytes can be sent per QPI link cycle (or per QPI transfer). You need four QPI link cycles to transfer one flit.

Note that the QPI link clock frequency is not equal to the core TSC frequency (InvariantTSC). In PCM::computeQPISpeed, the InvariantTSC is used just as a timer function to pause the program for a certain amount of time and count the number of flits sent during this period (FLT_SENT). Please also see more details on QPI in this whitepaper.

On Westmere/Nehalem-EX the NULL_IDLE event internally runs/increments with the uncore clock frequency, therefore it is normalized with the UncoreTSC.
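The phit/flit arithmetic above can be written out as a small sketch. The sizes come from this post; the 6.4 GT/s link rate and all function names are assumptions chosen only for illustration, and none of this is PCM code.

```python
# Sizes as stated in the post above.
PHIT_BITS = 20            # bits moved per QPI transfer (16 data + 4 other)
FLIT_BITS = 80            # total bits per flit
FLIT_PAYLOAD_BITS = 64    # data payload bits per flit (8 bytes)

# Number of QPI transfers (phits) needed to move one flit.
PHITS_PER_FLIT = FLIT_BITS // PHIT_BITS


def flits_per_second(transfers_per_second: float) -> float:
    """Flit rate of one link direction for a given transfer rate."""
    return transfers_per_second / PHITS_PER_FLIT


def payload_bytes_per_second(transfers_per_second: float) -> float:
    """Payload bandwidth of one link direction in bytes per second."""
    return flits_per_second(transfers_per_second) * FLIT_PAYLOAD_BITS / 8


if __name__ == "__main__":
    link_rate = 6.4e9  # assumed example rate in transfers/second
    print("phits per flit:", PHITS_PER_FLIT)
    print("flits per second:", flits_per_second(link_rate))
    print("payload bytes per second:", payload_bytes_per_second(link_rate))
```

Under these assumptions the payload rate works out to two bytes per transfer, which matches the statement above.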

Regarding different TSC/frequencies:
InvariantTSC increments with the nominal core frequency
UncoreTSC is different from InvariantTSC: it runs at the frequency of the processor uncore
QPI link clock frequency is different from both InvariantTSC and UncoreTSC. In other terms, it is the QPI GTransfers/second.

Best regards,
Roman

tim_kiefer
Beginner
Hi Roman,

thanks for your answer. I think I understand most of it now! I've read the whitepaper and know about phits and flits. I think I also understand the difference between the core and uncore frequencies. Just to confirm: FLT_SENT increases with the QPI link clock frequency, no matter whether actual "data" is transmitted or not? Hence it can be used to compute the maximum QPI link speed?!

The one thing I still don't understand is the NULL_IDLE event. I can accept that it runs at the uncore clock frequency and is therefore normalized with the UncoreTSC. But when is the counter incremented? I am also still trying to figure out how it can be used to estimate the QPI link utilization...

I will appreciate any clarification. Thanks!

- tim

Roman_D_Intel
Employee
Tim,

you are correct with the first statement: FLT_SENT runs at "QPI link frequency/4" and does not halt, regardless of whether actual "data" is transmitted or not.
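Since FLT_SENT ticks at a fixed rate, the link speed can be estimated by counting flits over a timed interval, with the InvariantTSC serving only as the clock. The following is a rough sketch of that idea, not PCM's actual computeQPISpeed code; the function names, parameters, and sample numbers are hypothetical.

```python
FLIT_PAYLOAD_BYTES = 8  # 64 payload bits per flit


def elapsed_seconds(tsc_before: int, tsc_after: int, nominal_hz: float) -> float:
    """Convert an InvariantTSC delta to seconds using the nominal core frequency."""
    return (tsc_after - tsc_before) / nominal_hz


def link_speed_bytes_per_second(flits_before: int, flits_after: int,
                                tsc_before: int, tsc_after: int,
                                nominal_hz: float) -> float:
    """Estimate payload bytes/second from FLT_SENT counted over a timed interval."""
    seconds = elapsed_seconds(tsc_before, tsc_after, nominal_hz)
    if seconds <= 0:
        raise ValueError("TSC interval must be positive")
    return (flits_after - flits_before) * FLIT_PAYLOAD_BYTES / seconds


if __name__ == "__main__":
    # Made-up sample readings: 0.5 s at an assumed 2.0 GHz nominal frequency,
    # during which 800 million FLT_SENT events were counted.
    speed = link_speed_bytes_per_second(
        flits_before=0, flits_after=800_000_000,
        tsc_before=0, tsc_after=1_000_000_000,
        nominal_hz=2.0e9,
    )
    print("estimated link speed (bytes/s):", speed)
```

The estimate does not depend on link load, because FLT_SENT keeps counting whether or not useful data is being sent.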

I will try to get you a more precise answer for your second question soon.

--
Roman
Roman_D_Intel
Employee
Hi Tim,

after asking around I got this explanation for the NULL_IDLE event behavior:

"Whenever QPI has no useful data to send, it sends out NULL packets. This event counts the number of clocks (in uncore clocks) when NULL packets are being sent. For a system running at 2.4GHz uncore freq with QPI 100% idle, you will see this event counting 2.4 billion in a second. For example, if the QPI utilization (useful data being sent) is about 20%, that means 80% of the time, null flits are being sent. So, this event will count 1.92 Billion per second (2.4*.8)"
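The quoted explanation suggests a simple way to turn the two counters into a utilization figure: the share of uncore clocks without NULL packets. This is a minimal sketch under that reading, assuming both counter deltas were taken over the same interval; the function name and parameters are hypothetical and this is not PCM's getOutgoingQPILinkUtilization implementation.

```python
def qpi_outgoing_utilization(null_idle_delta: float, uncore_tsc_delta: float) -> float:
    """Fraction of uncore clocks in which useful data (not NULL packets) was sent."""
    if uncore_tsc_delta <= 0:
        raise ValueError("uncore TSC delta must be positive")
    # Clamp to 1.0 in case the two counters were not sampled at exactly the same time.
    idle_fraction = min(null_idle_delta / uncore_tsc_delta, 1.0)
    return 1.0 - idle_fraction


if __name__ == "__main__":
    # Numbers from the quote: 2.4 GHz uncore clock, one second of measurement.
    print("fully idle link:", qpi_outgoing_utilization(2.4e9, 2.4e9))
    print("1.92 billion NULL_IDLE counts:", qpi_outgoing_utilization(1.92e9, 2.4e9))
```

With the quoted numbers, 1.92 billion NULL_IDLE counts against 2.4 billion uncore clocks gives an idle fraction of 0.8, so the utilization is about 20%.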

Best regards,
Roman
tim_kiefer
Beginner
Hi Roman,

that certainly clears things up! Thanks for your quick reply.

- tim