Analyzers
Support for Analyzers (Intel VTune™ Profiler, Intel Advisor, Intel Inspector)

Help with understanding the VTune results.

Dharma
Beginner
Hello,
I ran a Lightweight Hotspots analysis on my code and have attached the result as a CSV file. Can you please give me some pointers on what I can do now to improve the speed of the program? Most of the time is spent in zgemm3m for matrix multiplications and in matrix inversion using zgesv (or getrf and getri). I am not able to understand the timing information obtained.

My computer has dual quad-core processors (E5240) at 2.493 GHz.
Peter_W_Intel
Employee
Thanks for sharing your Lightweight Hotspots results. Usually you can identify a performance issue from the CPI (cycles per instruction) value of the top hot functions: the smaller, the better. However, functions that use SSE3/SSE4/AVX instructions can have a large CPI value, and that is reasonable, since each SIMD (single instruction, multiple data) instruction does more work.

So you may investigate the source lines that cause a high CPI value (few instructions retired, many CPU cycles spent). MKL functions are already well tuned for performance; you only need to make sure you are using them in the right way.
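As a rough sketch of the CPI check described above: CPI is cycles spent divided by instructions retired, so given per-function event counts from a hotspots export you can rank functions by CPI. The event column names in such an export (for example CPU_CLK_UNHALTED.CORE and INST_RETIRED.ANY) are an assumption and depend on the processor and VTune version. The `HotFunction` class, the `flag_high_cpi` helper, and the 1.0 threshold are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class HotFunction:
    """Hypothetical record for one row of a hotspots export."""
    name: str
    unhalted_cycles: int       # assumed: unhalted CPU cycle event count
    instructions_retired: int  # assumed: instructions retired event count

    @property
    def cpi(self) -> float:
        """Cycles per instruction: cycles spent / instructions retired."""
        if self.instructions_retired == 0:
            return float("inf")  # cycles spent but nothing retired
        return self.unhalted_cycles / self.instructions_retired


def flag_high_cpi(functions, threshold=1.0):
    """Return functions whose CPI exceeds `threshold`, worst first.

    The default threshold is an arbitrary illustration value,
    not a cutoff defined by VTune.
    """
    high = [f for f in functions if f.cpi > threshold]
    return sorted(high, key=lambda f: f.cpi, reverse=True)


if __name__ == "__main__":
    # Made-up counts for hypothetical functions, not real measurements.
    rows = [
        HotFunction("func_a", 3000, 1000),
        HotFunction("func_b", 1000, 2000),
        HotFunction("func_c", 4000, 1000),
    ]
    for f in flag_high_cpi(rows):
        print(f"{f.name}: CPI = {f.cpi:.2f}")
```

Keep in mind that a high CPI in SIMD-heavy library code such as zgemm3m is not necessarily a problem, as noted above; the ranking only tells you where to look first.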

You may use Concurrency analysis to learn about the parallelism of your program, the work balance across threads, core utilization, etc.

You may use Locks and Waits analysis to find wait time, which may cause stalls between threads.

Regards, Peter
Peter_W_Intel
Employee
"...I am not able to understand the timing information obtained." The time shown in the report is calculated with this formula:
Time = "CPU Unhalted Cycles" event count / CPU frequency
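The conversion above is simple arithmetic; here is a minimal sketch of it, assuming you already have the unhalted-cycles count for a function and use the nominal clock frequency (2.493 GHz on the poster's machine). The function name `cpu_time_seconds` is hypothetical.

```python
def cpu_time_seconds(unhalted_cycles: float, frequency_hz: float) -> float:
    """Convert a 'CPU Unhalted Cycles' event count to seconds of CPU time.

    Follows the formula time = unhalted cycles / CPU frequency.
    Profiling overhead is not subtracted.
    """
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    return unhalted_cycles / frequency_hz


# Nominal clock of the poster's machine: 2.493 GHz.
FREQ_HZ = 2.493e9

if __name__ == "__main__":
    # Hypothetical count: 4.986e9 unhalted cycles is 2.0 seconds at 2.493 GHz.
    print(cpu_time_seconds(4.986e9, FREQ_HZ))
```

If the cycle counts are summed over all cores, the result is total CPU time across cores, which can be larger than the elapsed wall-clock time of a multithreaded run.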

I think the profiling overhead was not taken into account.

Dharma
Beginner
Thanks Peter,
I think I need to take another look at the algorithm I am using. As you suggested, I will run the other analyses and see whether there are any bottlenecks.

Thanks
Reddy