VTune 9.1 runs slowly under Linux

cr_ea · 03-23-2009

I've used VTune reasonably extensively on Windows and recently started using it on Linux.

The problem I'm having is that VTune itself is horrifically slow. On a Pentium D 900 (2.7 GHz, 4 GB RAM) I can successfully run a sampling profile session, but analyzing the results is another matter: when I click into the modules of interest and drill down into my app (with maybe 80K clocktick samples, as an example), it takes a very long time to reach the final level of function call listings. By "a very long time" I mean more than 20-30 minutes the first time I query results for each session. It does this whether I run locally or collect remotely from a Windows box. Is that expected behavior? CPU usage for the VTune app/server stays at 100% that entire time. Just curious: have you guys VTune'd VTune itself? (Java is an "interesting" choice for implementing VTune, BTW.)

Callgraph mode is a complete non-option. I expect it to be invasive, but the overhead of that collection mode is so huge that my (client/server) application can't run fast enough to perform even its most basic operations, even with VTPause and VTResume wrapped only around the most critical areas I care about. That's a big disappointment.

New hardware is one possible route to a solution here, but that seems questionable since I need to be able to profile my app on lower-end hardware, and I'm not yet convinced the investment would be worth the result.

Thoughts? Are there any newer versions of VTune coming that perform better? In the meantime I'm having to use oprofile.

A few details on my config:
com.intel.vtune.analyzer (9.1.0)
os.version=2.6.18-92.1.22.el5
CentOS 5.2

Thanks

TimP · 03-23-2009

You haven't said much about your choice of test run time and Sample After values. For test runs of several minutes or more, you would increase the Sample After values to control the number of samples and the size of the collection file.
Callgraph isn't intended (in my experience) to give accurate quantifiable data, even to the extent that a gprof call graph would. If you are successful in getting a call graph with oprofile, congratulations.
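
To make the Sample After arithmetic concrete, here is a rough sketch rather than anything taken from VTune itself. It assumes one busy core whose clocktick event fires once per cycle at the nominal frequency, so the sample count is roughly the event count divided by the Sample After value. The 2.7 GHz figure comes from the original post; the 300-second run time, the 20,000-sample target, and the function names are illustrative assumptions.

```python
def estimated_samples(clock_hz, busy_seconds, sample_after):
    """Approximate clocktick samples from one busy core.

    Assumes one clocktick event per cycle, so samples are roughly
    (cycles executed) / (Sample After value).
    """
    return (clock_hz * busy_seconds) // sample_after


def sample_after_for(clock_hz, busy_seconds, target_samples):
    """Smallest Sample After value that keeps the count at or below the target."""
    total_events = clock_hz * busy_seconds
    # Integer ceiling division, so the estimate never exceeds the target.
    return -(-total_events // target_samples)


if __name__ == "__main__":
    clock_hz = 2_700_000_000   # 2.7 GHz, from the original post
    busy_seconds = 300         # assumed five-minute run, fully CPU-bound
    target_samples = 20_000    # assumed budget, well under the 80K mentioned

    sav = sample_after_for(clock_hz, busy_seconds, target_samples)
    print("Sample After value:", sav)
    print("Estimated samples:", estimated_samples(clock_hz, busy_seconds, sav))
```

A real session will differ, since idle time, multiple cores, and frequency changes all shift the event count, so treat the result as a starting point for calibration.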

David_A_Intel1 · 03-28-2009

The key is where you say, "80K clocktick samples". Please check out the Release Notes. There is a known issue when you have that many samples in a module. I suggest you reduce the number of samples by either reducing the sampling rate or reducing the workload/runtime.

GaryBC · 04-03-2009

Quoting - MrAnderson (Intel)

The key is where you say, "80K clocktick samples". Please check out the Release Notes. There is a known issue when you have that many samples in a module. I suggest you reduce the number of samples by either reducing the sampling rate or reducing the workload/runtime.

I've looked through the release notes for VTune Linux update 1 and don't see any mention of that. I ran my own experiment: if I use my own collector and import the samples using the TBRW framework, it scales very well. I just tried a session with 30 million samples in one module and it worked fine. The import took a few minutes, but after that, navigation between source files seemed on par with sessions where I have far fewer samples.