- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
Using Vtune Amplifier concurrency analysis on an example code of dgemm (link here), the overhead and spin time surprisingly covered almost 100% of the CPU usage bar! (reported here). I tried VTune concurrency profiling tool for sparse matrix by vector multiplication kernel mkl_dcsrsymv as well, and similar result was obtained. Since in the examples mentioned here, a very high performance is achieved, the large overhead reported seems irrelevant. I initially asked for an explanation in VTune Amplifier forum (here) and I was advised to ask the question in this forum.
Do you have any explanation for the large overhead and spin time?
Cheers,
note: Vtune Amplifier update 11, Intel Composer XE 2013 are used.
Link Copied
- « Previous
-
- 1
- 2
- Next »
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
MKL function was treated as system function in Vtune. The overhead and spin times are actually from MKL computing kernel function. it is computing time, but was marked as overhead time.
Thanks Ying, that totally makes sense!!
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page
- « Previous
-
- 1
- 2
- Next »