General Exploration for Nehalem grid results scrolled very slowly (and I mean VERY), so I ran another AXE instance in parallel, attached it to the first one and collected GE on it as well. I'm seeing that 90+% of the time is spent executing the XLATEOBJ_cGetPalette function (see screenshot). Is it a bug on my system, or is it a performance optimization opportunity for VTune AXE :)?
Is it possible that you ran the General Exploration analysis with the application for a long time? Since the result files are bigger, opening and displaying the report will be VERY SLOW. Please set a reasonable duration ("Automatically stop collection after (sec):") in Project Properties, for example 60s, so that the result file is displayed quickly. This does not harm data collection, as long as 60s covers the main functions in your application.
The hot function XLATEOBJ_cGetPalette() in win32k.sys seems to retrieve colors from the palette; it is usually called when your application paints (updates) the graphical interface. You can also use Hotspots analysis: the result includes the callers of XLATEOBJ_cGetPalette (call stack info), so you can verify whether all calls are necessary or can be reduced.
The size of the resulting tb6 and the amount of data collected are irrelevant in this case. Yes, it takes VTune some time to open the result and show the grid, but that's not the problem. The problem is how long it takes VTune to scroll the grid left and right when I inspect the values in different columns.
I looked more into this and I believe I found the root cause of the issue. Scrolling starts lagging when I switch my viewpoint to "Hardware Event Counts" and unfold several columns to view per-core data. It looks like a custom control (screenshot below), so I guess there's a bug in it related to an incorrect repaint implementation that causes excessive calls to XLATEOBJ_cGetPalette().
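To illustrate the kind of repaint behavior being guessed at here: a grid control normally repaints only the columns that intersect the viewport, so the per-frame cost should not grow with the number of unfolded columns. Below is a minimal sketch of that idea in Python. It is not VTune's actual implementation; the function name `visible_columns` and the pixel widths are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def visible_columns(widths, scroll_x, viewport_width):
    """Return (first, last) column indices intersecting the viewport.

    `last` is exclusive. `widths` are column widths in pixels,
    `scroll_x` is the horizontal scroll offset in pixels.
    All names here are hypothetical, not part of any VTune API.
    """
    if not widths or viewport_width <= 0:
        return (0, 0)
    # right_edges[i] is the x coordinate where column i ends.
    right_edges = list(accumulate(widths))
    # left_edges[i] is the x coordinate where column i starts.
    left_edges = [0] + right_edges[:-1]
    # First column whose right edge lies past the viewport's left side.
    first = bisect_right(right_edges, scroll_x)
    # First column that starts at or past the viewport's right side.
    last = bisect_left(left_edges, scroll_x + viewport_width)
    return (first, last)


# Hypothetical grid: 50 unfolded per-core columns, 100 px each.
widths = [100] * 50
first, last = visible_columns(widths, scroll_x=250, viewport_width=300)
# Only columns first..last-1 need to be repainted on this scroll step.
columns_to_paint = last - first
```

With these made-up numbers only 4 of the 50 columns intersect the viewport, so a repaint that walks all 50 on every scroll event does more than ten times the necessary drawing work. Whether the control in question actually does that can only be confirmed from the call stacks.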
I tried General Exploration on Nehalem, but couldn't reproduce your issue. Does it occur on comparatively short runs? If so, can you provide us with your results so we can try to reproduce it?
Also, could you please provide some more details:
- OS version
- Do you see the issue when analysing a particular application? Can you reproduce it with Intel samples or system profiling?
- If it's application-specific, did you try to run the analysis on other machines (not Nehalem)? Does it behave the same way?
OK, I checked it, and the issue is 100% reproducible on both my desktop (64-bit Windows Server 2008) and laptop (64-bit Windows 7). Instructions:
1. Go to the VTune AXE samples directory, C++\matrix.
2. Build the Release Win32 binary and run it. On my laptop the app creates 5 threads, utilizes 100% of the CPU (all 4 cores) and takes ~87 seconds to complete.
3. Open the VTune AXE GUI and create a project that runs system-wide analysis for 60 seconds.
4. In that project create a new General Exploration (for Nehalem) analysis.
5. Start the matrix multiplication (I assume it is) app and then start the VTune analysis.
6. When VTune finalizes the results, switch to the 'Hardware Event Counts' viewpoint and, when the data loads, go to 'Bottom-up'.
7. In the Bottom-up grid unfold at least 10 high-level columns all the way down to per-core data (the way it's shown on the screenshot in my previous comment). Choose neighboring columns, because this is when scrolling lags the most: when I try to slowly scroll an area showing unfolded columns left or right.
8. Scroll slowly to the right using the handle of the horizontal scroll bar (slowly, as if you're actually looking at the displayed data). BTW, while scrolling with the handle lags, scrolling with the left-right buttons on the scroll bar is completely unusable.
I attached to the VTune AXE process that was slowly scrolling my data left and right and ran Hotspots analysis for 20 seconds. The hotspot is shown on the screenshot below.
Interestingly, the rendering thread consumes 90-100% of the CPU when I scroll the grid. In comparison, if you export all of the data from the grid into a CSV file, open it in Microsoft Excel and scroll the view there, not only does it not lag, but the rendering thread peaks at 15% for a moment when you start scrolling, and then the utilization drops to 6-8% no matter how long you move the grid around. I know Excel and VTune are two completely different pieces of software, but a data viewer is still just a data viewer: it's columns, rows and cells with numbers in them, and from what I can deduce the total amount of data displayed easily fits in a 5MB buffer.
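As a rough sanity check on the 5MB estimate, here is a back-of-the-envelope calculation in Python. The row count, column count and 8 bytes per cell are assumptions made up for illustration; they are not measurements from the actual result.

```python
def grid_bytes(rows, columns, bytes_per_cell=8):
    """Raw size of a dense numeric grid, ignoring any per-cell overhead."""
    return rows * columns * bytes_per_cell


# Hypothetical dimensions: 2000 bottom-up rows, 40 event columns
# each unfolded into 8 per-core sub-columns, one 64-bit counter per cell.
rows = 2000
columns = 40 * 8
size = grid_bytes(rows, columns)
print(f"{size} bytes, about {size / 2**20:.2f} MiB")
```

Even with dimensions this generous the raw numbers stay around 5MB, so the volume of data alone does not explain the scrolling cost.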
I've followed exactly the same steps and still can't reproduce the issue; scrolling works fine, both vertical and horizontal. I ran Nehalem General Exploration for the whole system for 60 seconds, with the Matrix sample running at the same time. I switched to the "Hardware Event Counts" viewpoint and expanded the columns for per-core info.
There is probably some environment-specific issue. Can you gather your system info so I can compare it with my setup? $ amplxe-feedback.exe --create-bug-report=report.txt
Also, just to be sure: are you using Amplifier XE Update 5 or another version?
Thank you for looking into this. Yes, I'm running Update 5 on all my machines, but I think we've seen similar behavior with previous updates as well.
It's very unlikely that this is a problem with my HW/OS setup, because I can reproduce this on two completely different machines:
- my laptop in a production network (anti-virus running, etc.), which is a Core i5 with 4GB of RAM and Windows 7 64-bit
- my desktop in a lab network (no anti-virus, clean system), which is a dual-socket Intel Xeon 5680 with 96 GB of RAM running Windows Server 2008 64-bit
I will send you a report.txt file from my laptop as an attachment to the next reply, which I'll make private.
It's a pity you can't reproduce the issue. Sometimes the scrolling lag makes viewing results so uncomfortable that I export the data to Microsoft Excel and view it there. Could the screenshot I attached to my previous post help at all? Maybe it would give the GUI developer an idea of why the GUI is using so much CPU?