I am trying to implement a recursive matrix multiplication on Xeon Phi. I have two implementations. The first is my own implementation of Strassen, and it works fine: when I call it with more than one level of recursion, the time decreases as I increase the recursion level. To speed up my algorithm, I used the MKL cblas_dgemm function for the submatrix multiplications, calling it from the Strassen algorithm. The problem is that the time now increases when I increase the level of recursion. What is the problem?
Without seeing your code, whatever I say will, at best, be an educated guess.
As I recall, Strassen will multithread but not vectorize. The MKL routines try to both vectorize and multithread where possible. Perhaps, as you increase the number of Strassen levels and then call dgemm at each leaf, you are creating too many threads. But, as I say, that is just a guess.
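To make the "too many threads" guess concrete, here is a rough back-of-the-envelope sketch in C. It only does arithmetic and does not call MKL. It assumes that every leaf product of the recursion runs concurrently and that each dgemm call starts its own team of threads; neither is confirmed by the original post. The function names and the figure of 240 hardware threads are hypothetical, chosen for illustration.

```c
#include <stdio.h>

/* Number of leaf products after `levels` levels of Strassen recursion:
   each level replaces one product with 7 half-size products. */
long strassen_leaf_products(int levels) {
    long leaves = 1;
    for (int i = 0; i < levels; ++i) {
        leaves *= 7;
    }
    return leaves;
}

/* Dimension of each leaf submatrix for an n x n input.
   Assumes n is divisible by 2^levels. */
long strassen_leaf_dim(long n, int levels) {
    return n >> levels;
}

/* Worst-case number of software threads, ASSUMING all leaves run at the
   same time and each dgemm call uses threads_per_call threads. */
long worst_case_threads(int levels, int threads_per_call) {
    return strassen_leaf_products(levels) * threads_per_call;
}

/* Print one row per recursion level so the growth is easy to see.
   hw_threads is the number of hardware threads on the device (assumed). */
void print_oversubscription_table(long n, int max_levels,
                                  int threads_per_call, int hw_threads) {
    printf("%6s %8s %10s %12s %8s\n",
           "levels", "leaves", "leaf_dim", "sw_threads", "ratio");
    for (int lv = 0; lv <= max_levels; ++lv) {
        long threads = worst_case_threads(lv, threads_per_call);
        printf("%6d %8ld %10ld %12ld %8.1f\n",
               lv, strassen_leaf_products(lv), strassen_leaf_dim(n, lv),
               threads, (double)threads / hw_threads);
    }
}
```

Calling `print_oversubscription_table(8192, 3, 240, 240)` from `main` shows the trend: under these assumptions each extra level multiplies the thread count by 7 while halving the matrix size each dgemm call works on. If this is what is happening, one thing to try is capping the threads MKL uses per call, for example with `mkl_set_num_threads` or the `MKL_NUM_THREADS` environment variable, and comparing timings.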
You might want to try Intel(r) VTune(r) Amplifier XE; the article at http://software.intel.com/en-us/ARTICLES/OPTIMIZATION-AND-PERFORMANCE-TUNING-FOR-INTEL-XEON-PHI-COPROCESSORS-PART-2-UNDERSTANDING might give you some idea of what to look for. Alternatively, post some sample code for us to try out.
