Software Tuning, Performance Optimization & Platform Monitoring

Baytrail SoC Uncore Event Groups

Simon_O_
Beginner

I'm trying to monitor the SoC uncore event groups detailed in:
  
https://software.intel.com/en-us/articles/baytrail-uncore-performance-monitoring-events

VTune Amplifier XE 2015 Update 1 doesn't seem to list those event groups or the individual events shown within them. The VTune events do match those detailed in section 18.6.2 of the Intel 64 and IA-32 Architectures Software Developer’s Manual (Volume 3B, Part 2), but they don't provide the same monitoring capabilities as the SoC groups.

How can I monitor the Baytrail SoC group events as shown in the link above?

I'm also using the Intel Performance Counter Monitor (pcm-* tools) on an embedded system running Yocto Linux. However, even the latest v2.8 only supports monitoring uncore events on Jaketown/Ivytown/Haswell processors.

Is there an update in the pipeline, or are the additional uncore MSR details available somewhere so that I may add them to my version? Or is there an alternative tool that I should be using instead?

Thanks,

Simon

A_T_Intel
Employee

VTune Amplifier 2015 Update 2 should show you a tab for SoC Bandwidth that counts requests from all agents (processors, graphics, IO). This version also includes SEP/EMON, which is probably the tool you want for counting uncore events on Baytrail. I would recommend running EMON with the -? option to list all available events; it should list several "UNC_SOC" events. If the event you want is listed, you can count it with "EMON -t 1 -C EVENT_NAME".

Simon_O_
Beginner

Thanks for the details, and the new VTune update.  I'll give EMON a try first to confirm it's showing the expected details.

I'm not sure if I'll need it yet, but is a matching Update 2 for System Studio due, with the additional events?

Simon_O_
Beginner

Most of the module-based SOC event groups are working well with EMON, but the memory counters are always zero.  The following groups are not working at all for me:

   UNC_VISA_DDR_Self_Refresh
   UNC_VISA_Memory_DDR0_BW
   UNC_VISA_Memory_DDR1_BW
   UNC_VISA_Memory_DDR_BW

The Bandwidth analysis test within VTune (update 2 added support for Silvermont) is also showing zero bytes/s for all tests.

I noticed there is a UNC_VISA_LowSpeedPF_BW group for the low-speed fabric, but shouldn't there also be a UNC_VISA_HighSpeedPF_BW group for the high-speed fabric (PCI Express)?

Finally, are the register values for the SOC event group counters available, so I can monitor them from my own code?

A_T_Intel
Employee

Good! I'm glad to hear you were able to get some counts from some of the events with EMON. 

Regarding the memory event counts being zero: this means that the performance counters are powered off by your BIOS/firmware settings. There is likely no way to change this, but there might be a BIOS menu option to set the debug option to "PerfMode". You can get a very good idea of total memory bandwidth with the event UNC_VISA_All_Reqs. This event gives the total request count from each agent (CPUs, IO, GFX). Sum them up, multiply by 64 bytes and divide by the number of seconds sampled to get bytes per second.
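The bandwidth rule of thumb in this reply can be written as a short calculation. This is a minimal Python sketch, assuming each counted request moves one 64-byte line and that all per-agent counts were collected over the same known interval. The dictionary keys reuse the UNC_VISA_All_Reqs sub-event names mentioned elsewhere in this thread, and the counts in the example are made-up numbers, not measurements.

```python
from typing import Mapping

# Assumption: every request reported by UNC_VISA_All_Reqs transfers one
# 64-byte cache line, as in the rule of thumb above.
CACHE_LINE_BYTES = 64


def estimate_bandwidth(agent_counts: Mapping[str, int], seconds: float) -> float:
    """Estimate total memory bandwidth in bytes per second.

    agent_counts: request counts per agent (CPU modules, GFX, IO, ...)
    seconds: length of the sampling interval the counts were taken over
    """
    if seconds <= 0:
        raise ValueError("sampling interval must be positive")
    total_requests = sum(agent_counts.values())  # sum over all agents
    return total_requests * CACHE_LINE_BYTES / seconds


if __name__ == "__main__":
    # Hypothetical counts for a 2-second sample (illustrative only).
    counts = {
        "Mod0_Reqs": 1_000_000,
        "Mod1_Reqs": 500_000,
        "GFX_Reqs": 250_000,
        "LowSpeedPF_Reqs": 250_000,
    }
    bw = estimate_bandwidth(counts, seconds=2.0)
    print(f"{bw / 1e6:.1f} MB/s")  # prints 64.0 MB/s
```

Summing across agents gives total traffic only; keeping the per-agent dictionary lets you also report each agent's share separately if needed.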

Regarding LowSpeed vs. HighSpeed: Baytrail only has a low-speed peripheral fabric and no high-speed fabric. All IO traffic goes through the low-speed fabric, including USB3, PCIe, SATA, GbE and so on. Only the Rangeley micro-server product connects PCIe through a high-speed fabric.

A_T_Intel
Employee

If your BIOS does expose the PerfMode option, it will likely be under a "Debug Configuration" menu, with the specific option named "PDM/DFX Setting".

Simon_O_
Beginner

Thanks for the updates, and for confirming that PCIe is part of the low-speed fabric on this BayTrail-I. I'd seen the Rangeley diagrams and assumed it would be the same if it was present.

I have found UNC_VISA_All_Reqs to be useful for CPU-related measurements, though the Disp_Reqs / Imaging_Reqs / VED_Reqs sub-event counters within it are still always zero for me. I certainly expected to see something in VED_Reqs while actively decoding video through VAAPI. The other sub-event counters (Mod0_Reqs / Mod1_Reqs / GFX_Reqs / LowSpeedPF_Reqs) are working normally.

I do already have the "PDM/Dfx" BIOS setting set to "Perf", as without it I couldn't access any MSR values.  Do you know if there are different levels to the perf mode that might need to be enabled, or should it be all or nothing?  I can chase up the BIOS vendor (Insyde) if it's possible that additional bits need to be set somewhere.
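One quick way to confirm that MSR access works at all on a Linux target, such as the Yocto system mentioned earlier, is to read a register through the kernel's msr driver, which exposes /dev/cpu/N/msr with the file offset equal to the MSR address. This is a minimal Python sketch, assuming the `msr` module is loaded (`modprobe msr`) and the program runs as root. The address 0x38F (IA32_PERF_GLOBAL_CTRL, an architectural core PMU register) is only an illustration; the Baytrail SoC uncore counters discussed in this thread are not documented here and may not be reachable as MSRs at all.

```python
import os
import struct
from typing import Optional

# Illustrative architectural core PMU MSR; not a Baytrail SoC uncore register.
IA32_PERF_GLOBAL_CTRL = 0x38F


def read_msr(address: int, cpu: int = 0, path: Optional[str] = None) -> int:
    """Read one 64-bit MSR via the Linux msr driver.

    The driver maps MSR addresses to file offsets, so an 8-byte read at
    offset `address` returns the register contents (little-endian).
    `path` can override the device file, e.g. for testing.
    """
    if path is None:
        path = f"/dev/cpu/{cpu}/msr"
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, 8, address)
    finally:
        os.close(fd)
    if len(data) != 8:
        raise OSError(f"short read from {path} at 0x{address:x}")
    return struct.unpack("<Q", data)[0]
```

For example, `read_msr(IA32_PERF_GLOBAL_CTRL, cpu=0)` run as root should return the register value; a permission error, a missing device file, or an I/O error from the read suggests MSR access is blocked or the register is not implemented.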

A_T_Intel
Employee

There are different levels to the perf mode with regard to the uncore events, but "perf mode" is the correct setting to enable both the system agent events and the memory controller events. Are you still seeing zero bandwidth on DDR? If so, you could try setting PDM/DFx to "on".

For UNC_VISA_All_Reqs I would expect you to see counts for Disp_Reqs if your monitor/screen is turned on, and they should increase if you attach additional monitors. Imaging_Reqs should be zero unless the camera is active. VED can be hard to enable and requires a driver; if the correct driver is not enabled, the encode/decode will be done by the CPUs.

Richard_S_1
Beginner

Thanks for the update. Simon is on holiday this week.

We have tried the different permutations of PDM/DFx in the BIOS; "PDM On", "Perf Mode" and "Debug Reserved" and get the same results in all cases. Using EMON, the majority of the counters are working but none of the UNC_VISA_DDR* or UNC_VISA_MEMORY_DDR* are working.

VED is also not working. We know that we are using the video decode engine to decode, using open-source Linux graphics drivers and VAAPI. Interestingly the GFX_Reqs / GFX_Read64B / GFX_Write64B counters do show activity when we are decoding, even if we are not displaying the decoded images or using the GPU to transfer them in any way.

Fundamentally, we would like to know how the performance counters are enabled and what we can read to verify that they have been enabled correctly. We can then present our findings to our BIOS supplier and get the BIOS changed if required. We have an RSNDA in place.

On a related note, we are interested in performance counters related to the PCIe interface, do such counters exist?

Is there a document that lists the Uncore counters in more detail?

A_T_Intel
Employee

Hi Richard,

For any NDA and BIOS setting specific conversations we would need to move to a private email or phone conversation. 

Regarding PCIe interface bandwidth, there are no publicly available counters for specific blocks in the south cluster. The closest metric is the aggregate bandwidth of all south cluster traffic.

On the VED topic, it sounds like the driver is offloading the encode/decode to the GFX unit rather than the VED unit.  My understanding is that the Baytrail VED unit only supports VP8.  Is that the format you are testing?

Richard_S_1
Beginner

Hello Perry,

Thanks for getting back to us so promptly.

Our FAE has opened up a Premier Support issue (ID 6000090586) now that we have an RSNDA. Is it possible for you to reply via that or do we need to open up another one?

We understand about the VED now. We were looking for metrics that differentiate between the GPU "proper" and the MFX (H.264 video decoder), and thought that VED was another acronym for that rather than just the VP8-specific decode unit. We can see that there are registers internal to the MFX that give some performance metrics, but it would be great to get a list of all of the uncore performance counters.

A_T_Intel
Employee

OK, I am not on the team that handles Premier issues, but I'll contact that team to help support the request.

Thanks,

Perry
