
VTune AXE grid scrolling is very slow

antonpegushin
Beginner
1,387 Views
Hello,
the results grid for a General Exploration (Nehalem) analysis scrolls very slowly (and I mean VERY), so I ran a second AXE instance in parallel, attached it to the first one, and collected General Exploration on it as well. I'm seeing that 90+% of the time is spent executing the XLATEOBJ_cGetPalette function (see screenshot). Is it a bug on my system, or is it a performance optimization opportunity for VTune AXE :)?
6 Replies
Peter_W_Intel
Employee
Is it possible that you ran the General Exploration analysis on your application for a long time? Because the result files are bigger, opening and displaying the report will be VERY SLOW. Please set a reasonable duration ("Automatically stop collection after (sec):" in Project Properties), for example 60s, so that the result file is displayed quickly. This does not harm data collection as long as 60s covers the main functions in your application.

The hot function XLATEOBJ_cGetPalette() in win32k.sys seems to retrieve colors from the palette; it usually shows up when your application paints (updates) the graphical interface. You can also use Hotspots analysis: the result includes the callers of XLATEOBJ_cGetPalette (call stack info), so you can verify whether all the calls are necessary or can be reduced.

Regards, Peter
anton_pegushin1
Beginner
Hello Peter,
the size of the resulting tb6 and the amount of data collected are irrelevant in this case. Yes, it takes VTune some time to open the result and show the grid, but that's not the problem. The problem is how long it takes VTune to scroll the grid left and right when I inspect values in different columns.
I looked more into this and I believe I found the root cause of the issue. Scrolling starts lagging when I switch my Viewpoint to "Hardware Event Counts" and unfold several columns to view per-core data. It looks like a custom control (screenshot below), so I guess there's a bug in it related to an incorrect repaint implementation that causes excessive calls to XLATEOBJ_cGetPalette().
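For illustration only, here is a minimal Python sketch of viewport culling, one common way for a grid control to avoid repainting columns that are not on screen. Nothing here is taken from the VTune AXE source: the function name `visible_columns`, the column widths, and the pixel values are all hypothetical, and the sketch assumes non-negative column widths.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def visible_columns(widths, scroll_x, viewport_width):
    """Return (first, last) column indices intersecting the viewport.

    `last` is exclusive, so the columns to repaint are range(first, last).
    All names and units (pixels) are hypothetical, for illustration only.
    """
    # right_edges[i] is the x coordinate where column i ends.
    right_edges = list(accumulate(widths))
    # left_edges[i] is the x coordinate where column i starts.
    left_edges = [r - w for r, w in zip(right_edges, widths)]
    view_left = scroll_x
    view_right = scroll_x + viewport_width
    # First column whose right edge lies beyond the left side of the viewport.
    first = bisect_right(right_edges, view_left)
    # One past the last column whose left edge is before the right side.
    last = bisect_left(left_edges, view_right)
    return first, max(first, last)


# 50 unfolded columns of 100 px each, a 300 px wide viewport scrolled to x=250.
first, last = visible_columns([100] * 50, scroll_x=250, viewport_width=300)
print(f"repaint columns {first}..{last - 1} ({last - first} of 50)")
```

With culling like this, the per-scroll paint cost depends on how many columns fit on screen rather than on how many columns are unfolded. Whether the grid in question already does something similar is unknown.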
Kirill_R_Intel
Employee
Hi Anton,

I tried General Exploration on Nehalem, but couldn't reproduce your issue. Does it occur on comparatively short runs? If so, can you provide us with your results so that we can try to reproduce it?

Also could you please provide some more details:
- OS version
- Do you see the issue when analysing a particular application? Can you reproduce it with Intel samples or system profiling?
- If it's application-specific, did you try running the analysis on other machines (not Nehalem)? Does it behave the same way?

Regards,
Kirill
anton_pegushin1
Beginner
Hi Kirill,

ok, I checked it and the issue is 100% reproducible on both my desktop (64bit WS 2008) and laptop (64bit Windows 7). Instructions:

  1. go to VTune AXE samples directory, C++\matrix
  2. build a Release Win32 binary and run it. On my laptop the app creates 5 threads, utilizes 100% of the CPU (all 4 cores) and takes ~87 seconds to complete.
  3. start VTune AXE GUI, create a project that runs system-wide analysis for 60 seconds. In that project create a new General Exploration (for Nehalem) analysis.
  4. start the matrix multiplication (I assume it is) app and then start the VTune analysis from (3).
  5. when VTune finalizes the results, switch to the 'Hardware Event Counts' viewpoint and, when the data loads, go to 'Bottom-up'
  6. In the Bottom-up grid unfold at least 10 high-level columns all the way down to per-core data (the way it's shown on the screenshot in my previous comment). Choose neighboring columns, because this is when scrolling lags the most: when I try to slowly scroll an area showing unfolded columns left or right.
  7. Now scroll slowly to the right using the handle of the horizontal scroll bar (slowly as if you're actually looking at the displayed data). BTW, while scrolling with the handle lags, scrolling with left-right buttons on the scroll bar is completely unusable.
I attached to the VTune AXE process that was slowly scrolling my data left and right and ran Hotspots analysis for 20 seconds. The hotspot is shown on the screenshot below.
Interestingly, the rendering thread consumes 90-100% of the CPU when I scroll the grid. In comparison, if you export all of the data from the grid into a CSV file, open it in Microsoft Excel and scroll the view there, not only does it not lag, but the rendering thread peaks at 15% for a moment when you start scrolling, and then the utilization drops to 6-8% no matter how long you move the grid around. I know Excel and VTune are two completely different pieces of software, but a data viewer is still just a data viewer: it's columns, rows and cells with numbers in them, and from what I can deduce the total amount of data displayed easily fits in a 5MB buffer.
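One way to sanity-check the buffer-size estimate is to count the cells in the exported CSV. The Python sketch below is a rough illustration under stated assumptions: the helper name `estimate_grid_bytes`, the figure of 8 bytes per cell, and the tiny inline sample (made-up function names and counts) are hypothetical and do not come from an actual VTune export.

```python
import csv
import io


def estimate_grid_bytes(csv_file, bytes_per_cell=8):
    """Count rows and cells in an exported grid and estimate its raw size.

    bytes_per_cell=8 assumes each value could be stored as one 64-bit
    number; this is an assumption, not how any particular viewer stores data.
    """
    rows = 0
    cells = 0
    for record in csv.reader(csv_file):
        if not record:  # skip blank lines
            continue
        rows += 1
        cells += len(record)
    return rows, cells, cells * bytes_per_cell


# Hypothetical stand-in for an exported grid (header plus two data rows).
sample = io.StringIO(
    "Function,CPU_CLK_UNHALTED.THREAD,INST_RETIRED.ANY\n"
    "func_a,100,200\n"
    "func_b,3,4\n"
)
rows, cells, size = estimate_grid_bytes(sample)
print(f"{rows} rows, {cells} cells, about {size} bytes")
```

For a real export, pass an open file instead of the inline sample, for example `open("export.csv", newline="")`, where the file name is a placeholder.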
Kirill_R_Intel
Employee
Thanks for the details,

I've followed exactly the same steps and still can't reproduce the issue; scrolling works fine, both vertical and horizontal. I ran Nehalem General Exploration for the whole system for 60 seconds, with the Matrix sample running at the same time. I switched to the "Hardware Event Counts" viewpoint and expanded the columns for per-core info.

There is probably some environment-specific issue. Can you gather your system info with the command below, so I can compare it with my setup:
$ amplxe-feedback.exe --create-bug-report=report.txt

Also, just to make sure: are you using Amplifier XE update 5 or another version?

Regards,
Kirill
anton_pegushin1
Beginner
Hello Kirill,
thank you for looking into this. Yes, I'm running Update 5 on all my machines, but I think we've seen similar behavior with previous updates as well.
It's very unlikely that this is a problem with my HW/OS setup, because I can reproduce this on two completely different machines:
  1. my laptop in a production network (anti-virus running, etc.), which is a Core i5 with 4GB of RAM and Windows 7 64-bit
  2. my desktop, which is in a lab network (no anti-virus, clean system); it's a dual-socket Intel Xeon 5680 with 96 GB of RAM and it runs Windows Server 2008 64-bit.
I will send you a report.txt file from my laptop as an attachment to the next reply, which I'll make private.
It's a pity you can't reproduce the issue. Sometimes the scrolling lag makes viewing results so uncomfortable that I export the data to Microsoft Excel and view it there. Could the screenshot I attached to my previous post help at all? Maybe it would give the GUI developer an idea of why the GUI is using so much CPU?
Either way, thanks for your support!