
AndrewC

New Contributor I


05-25-2004
01:15 AM


MKL 6.1 - slow BLAS (zgemm) performance with small matrices

I understand that MKL will be much faster when the matrices get larger (and in other parts of my code it does help quite a lot), but perhaps the MKL engineers could look for a way to avoid the time-consuming setup, etc., and skip to simple serial code when the matrices are 'small'.
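A minimal sketch of the dispatch suggested above, written in C as an illustration and not as anything MKL actually does internally. Assumptions: column-major storage as in the Fortran BLAS, no transpose options, a hypothetical cutoff `SMALL_N` of 16, and hypothetical names `zgemm_naive`, `zgemm_dispatch`, and `zgemm_fn`. The large-matrix path is a caller-supplied function pointer (which could wrap an optimized BLAS `zgemm`), so the sketch compiles without MKL.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical cutoff: if every dimension is at or below this, use the
   simple serial loop instead of the optimized (blocked) library path. */
#define SMALL_N 16

/* Signature for the large-matrix path, e.g. a thin wrapper around an
   optimized BLAS zgemm. Assumes column-major storage and no transposes. */
typedef void (*zgemm_fn)(size_t m, size_t n, size_t k, double complex alpha,
                         const double complex *a, size_t lda,
                         const double complex *b, size_t ldb,
                         double complex beta, double complex *c, size_t ldc);

/* Plain triple loop computing C = alpha*A*B + beta*C (column-major).
   No blocking, no workspace allocation, no threading. */
static void zgemm_naive(size_t m, size_t n, size_t k, double complex alpha,
                        const double complex *a, size_t lda,
                        const double complex *b, size_t ldb,
                        double complex beta, double complex *c, size_t ldc)
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double complex sum = 0.0;
            for (size_t p = 0; p < k; ++p)
                sum += a[i + p * lda] * b[p + j * ldb];
            /* As in the reference BLAS, C is not read when beta == 0. */
            c[i + j * ldc] = (beta == 0.0) ? alpha * sum
                                           : alpha * sum + beta * c[i + j * ldc];
        }
    }
}

/* Route small problems, or any call made without a large-matrix routine,
   to the serial loop; larger problems go to `large`. */
static void zgemm_dispatch(size_t m, size_t n, size_t k, double complex alpha,
                           const double complex *a, size_t lda,
                           const double complex *b, size_t ldb,
                           double complex beta, double complex *c, size_t ldc,
                           zgemm_fn large)
{
    if (large == NULL || (m <= SMALL_N && n <= SMALL_N && k <= SMALL_N))
        zgemm_naive(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
    else
        large(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
}
```

The cutoff is a single size check made once per call, so the small case pays almost nothing for the dispatch; where the real crossover lies would have to be measured on the target machine.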


2 Replies

Gregory_H_Intel

Employee


05-25-2004
04:46 PM


It is true that our focus is on much larger matrices. We have made a very significant effort in MKL 7.0 (which should be available shortly) to address the small matrix case as well, where we tend to use different and simpler algorithms. For the most part, we use hybrid algorithms that include special cases for when the matrices are too small. However, we usually try matrix multiply strategies in DGEMM before we try them in ZGEMM. I suspect that DGEMM in 7.0 will respond better to 6x6 matrices. We would certainly like to find the best solution for all cases.

The best algorithms for large matrices tend to have enormous overheads for small matrices. On a 6x6 matrix, for example, the interface itself seems to consume half the time (that is, one could do significantly better than netlib BLAS simply by inlining the 6x6x6 loops). Having a malloc and some of the other tricks we use certainly doesn't help.
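To make the 6x6 point concrete, here is a fixed-size kernel in C in which the loop trip counts are compile-time constants, there are no alpha/beta, transpose, or leading-dimension arguments to check, and nothing is allocated. The name `zmul6` and the row-major flat layout are illustrative assumptions; this is not how MKL or netlib implements anything, and whether it is actually faster depends on the compiler and hardware.

```c
#include <complex.h>

#define N6 6

/* C = A * B for 6x6 complex matrices stored row-major in flat arrays of
   36 elements. Fixed bounds let the compiler unroll or vectorize freely;
   there is no argument checking and no heap allocation. */
static inline void zmul6(const double complex *a,
                         const double complex *b,
                         double complex *c)
{
    for (int i = 0; i < N6; ++i) {
        for (int j = 0; j < N6; ++j) {
            double complex sum = 0.0;
            for (int p = 0; p < N6; ++p)
                sum += a[i * N6 + p] * b[p * N6 + j];
            c[i * N6 + j] = sum;
        }
    }
}
```

A general zgemm has to decide at run time, from its arguments, how to block, whether to allocate workspace, and which kernel to use; here all of that is settled at compile time.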

I think your idea is a good one. I will bring it to the attention of the rest of the developers. Unfortunately, it is too late for 7.0, and possibly 7.0.1. I honestly do not know when we will address your specific concern, but I can assure you that we will continue to improve the small matrix cases. It is very much a topic of our attention, as you will see by comparing small DGEMMs between 6.1 and 7.0 when it is released.

Thank you again for your suggestion.

- Greg Henry

AndrewC

New Contributor I


05-25-2004
05:18 PM


I am thrilled to hear that I will see improvements in MKL 7.0 for small matrices. I will download the latest 7.0 beta and try it out.

Thanks for your reply.

Andrew
