Hello,
I ran a lightweight hotspot analysis on my code and have attached the result as a CSV file. Can you please give me some pointers on what I can do now to improve the speed of the program? The major portion of the time is spent in zgemm3m for matrix multiplications and in matrix inversion using zgesv (or getrf and getri). I am not able to understand the timing information obtained.
My computer has dual quad-core processors (E5240) at 2.493 GHz.
3 Replies
Thanks for your lightweight hotspots results. Usually you can identify performance issues from the CPI (cycles per instruction) value of the top hot functions: the smaller, the better. However, functions that use SSE3/SSE4/AVX instructions will have a large CPI value, which is reasonable (single instruction, multiple data).
So you may investigate the source lines that cause a high CPI value (few instructions retired, many CPU cycles spent). MKL functions are already well tuned for performance; you only need to make sure you are using them in the right usage mode.
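To make the CPI reading concrete, here is a minimal sketch of how the ratio is formed from the two event counts. The function names and the counts in `samples` are hypothetical and are not taken from the attached CSV; they only illustrate why a vectorized routine can show a higher CPI without being slow.

```python
def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction: unhalted CPU cycles / instructions retired."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


# Hypothetical per-function event counts (illustrative only).
samples = {
    # Scalar code: many simple instructions, low CPI.
    "scalar_loop": {"cycles": 4_000_000, "instructions": 8_000_000},
    # SIMD code: fewer instructions, each doing more work, so CPI is higher
    # even though the total cycle count is lower.
    "simd_kernel": {"cycles": 3_000_000, "instructions": 1_500_000},
}

for name, counts in samples.items():
    value = cpi(counts["cycles"], counts["instructions"])
    print(f"{name}: CPI = {value:.2f}")
```

In this made-up case the SIMD routine has the larger CPI but spends fewer cycles overall, so CPI should be read together with the cycle count rather than on its own.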
You may use Concurrency Analysis to learn about the parallelism of your program, the work balance across threads, core utilization, etc.
You may use Locks and Waits Analysis to find wait time, which may cause stalls between threads.
Regards, Peter
"..I am not able to understand the timing information obtained." - The time shown in the report is calculated using this formula:
"CPU Unhalted Cycles" Event / CPU Frequency
The profiling overhead is not taken into account, I guess.
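As a rough worked example of that formula, the sketch below converts an unhalted-cycle count into seconds. The 2.493 GHz frequency comes from the original post; the cycle count and the function name `cpu_time_seconds` are hypothetical, chosen only to show the arithmetic.

```python
def cpu_time_seconds(unhalted_cycles: int, frequency_hz: float) -> float:
    """Time = "CPU Unhalted Cycles" event count / CPU frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    return unhalted_cycles / frequency_hz


FREQ_HZ = 2.493e9          # CPU frequency quoted in the original post
cycles = 24_930_000_000    # hypothetical event count for one function

print(f"CPU time: {cpu_time_seconds(cycles, FREQ_HZ):.2f} s")
```

Note that this is CPU time summed over whatever the event count covers, so for a multithreaded run it can exceed the wall-clock time.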
Thanks Peter,
I think I need to take another look at the algorithm I am using. As you suggested, I will run the other analyses and see whether there are any issues that may be bottlenecks.
Thanks,
Reddy