Hello, I ran a lightweight hotspot analysis on my code and got the result attached as a CSV file. Can you please give me some pointers on what I can do now to improve the speed of the program? A major portion of the time is spent in zgemm3m for matrix multiplications and in matrix inversion using zgesv (or getrf and getri). I am not able to understand the timing information obtained.
Thanks for your lightweight-hotspots results. Usually you can identify performance issues from the CPI (cycles per instruction) value of the top hot functions; the smaller the better. However, some functions that use SSE3/SSE4/AVX instructions will have a high CPI value, which is reasonable: these are SIMD (single instruction, multiple data) instructions, so each one processes several data elements.
So you may investigate the source lines that caused a high CPI value (few instructions retired, many CPU cycles spent). MKL functions are already well tuned for performance; you only need to make sure that you are using them in the right usage mode.
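To make the CPI reading concrete, here is a minimal sketch of how the metric is derived and how functions could be ranked by it. CPI is CPU cycles divided by instructions retired. The sample numbers and the `user_loop` entry are made up for illustration and are not taken from the attached CSV; the `FunctionSample` class and `rank_by_cpi` helper are hypothetical names, not part of any profiler API.

```python
from dataclasses import dataclass


@dataclass
class FunctionSample:
    """One row of a hotspot report (hypothetical layout, not the real CSV schema)."""
    name: str
    cycles: int        # CPU cycles attributed to the function
    instructions: int  # instructions retired in the function

    @property
    def cpi(self) -> float:
        # CPI = cycles / instructions retired; guard against empty samples.
        if self.instructions == 0:
            return float("inf")
        return self.cycles / self.instructions


def rank_by_cpi(samples):
    """Return samples sorted from highest to lowest CPI."""
    return sorted(samples, key=lambda s: s.cpi, reverse=True)


# Made-up numbers, for illustration only.
samples = [
    FunctionSample("zgemm3m", cycles=9_000_000, instructions=6_000_000),
    FunctionSample("zgesv", cycles=4_000_000, instructions=5_000_000),
    FunctionSample("user_loop", cycles=3_000_000, instructions=1_000_000),
]

for s in rank_by_cpi(samples):
    print(f"{s.name:10s} CPI = {s.cpi:.2f}")
```

A high CPI in your own code (like the hypothetical `user_loop` here) is the first place to look at the source lines; a moderately high CPI in a vectorized library routine is not necessarily a problem.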
You may use Concurrency Analysis to learn about the parallelism of your program, the work balance across threads, core utilization, etc.
You may use Locks and Waits Analysis to find wait time, which may cause stalls between threads.