- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I posted my question in Fortran forum and then I realized probably I should post it here. Any inputs are welcome. Thanks.
Link Copied
7 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
There are some test results for sqrt calculations in:
Forum Topic: Performance of sqrt
Web-link: software.intel.com/en-us/forums/topic/364460
and take a look at SqrtTestApp project.
Regarding performance of matrix multiplication:
It can not be improved significantly by applying some instruction set, like SSE, SSE2, AVX, AVX2, etc, or with a sophisticated cache management. A high performance matrix multiplication should rely on:
- Advanced algorithm, like
Strassen - O( n^2.8070 )
Strassen-Winograd - O( n^2.8070 )
Kronecker based ( Tensor Product ) - I don't have an exact asymptotic complexity, it is about ~O( n^2.5 ) ( really fast! )
Coppersmith-Winograd - O( n^2.3760 )
Virginia Vassilevska Williams - O( n^2.3727 )
- Fastest instruction set available on a given CPU
- Multi-threading
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
>>...A high performance matrix multiplication should rely on:
>>
>>- Advanced algorithm, like
>>
>>Strassen - O( n^2.8070 )
This is a follow up and I'd like to provide results of some tests that demonstrate how performance of the algorithm is improving ( it is still CPU-bound! ) when one parameter ( Matrix Size Threshold ) is changed:
[ Strassen HBC ( Heap Based Complete ) - All Optimizations disabled ]
Matrix Size : 4096 x 4096
Matrix Size Threshold: 2048 x 2048
Matrix Partitions : 8
Strassen HBC - Pass 1.1 - Completed: 1057.00098 secs
Strassen HBC - Pass 1.2 - Completed: 1056.40698 secs
Strassen HBC - Pass 1.3 - Completed: 1056.48596 secs
Matrix Size : 4096 x 4096
Matrix Size Threshold: 1024 x 1024
Matrix Partitions : 57
Strassen HBC - Pass 2.1 - Completed: 641.61700 secs
Strassen HBC - Pass 2.2 - Completed: 641.00800 secs
Strassen HBC - Pass 2.3 - Completed: 641.16400 secs
Matrix Size : 4096 x 4096
Matrix Size Threshold: 512 x 512
Matrix Partitions : 400
Strassen HBC - Pass 3.1 - Completed: 267.86899 secs
Strassen HBC - Pass 3.2 - Completed: 266.73001 secs
Strassen HBC - Pass 3.3 - Completed: 266.74701 secs
Matrix Size : 4096 x 4096
Matrix Size Threshold: 256 x 256
Matrix Partitions : 2801
Strassen HBC - Pass 4.1 - Completed: 165.18900 secs
Strassen HBC - Pass 4.2 - Completed: 162.69299 secs
Strassen HBC - Pass 4.3 - Completed: 162.69400 secs
Matrix Size : 4096 x 4096
Matrix Size Threshold: 128 x 128
Matrix Partitions : 19608
Strassen HBC - Pass 5.1 - Completed: 147.88901 secs
Strassen HBC - Pass 5.2 - Completed: 142.99001 secs
Strassen HBC - Pass 5.3 - Completed: 142.99100 secs
Matrix Size : 4096 x 4096
Matrix Size Threshold: 64 x 64
Matrix Partitions : 137257
Strassen HBC - Pass 6.1 - Completed: 129.74600 secs
Strassen HBC - Pass 6.2 - Completed: 119.79310 secs
Strassen HBC - Pass 6.3 - Completed: 119.79300 secs Note: Best time A ( BtA )
Matrix Size : 4096 x 4096
Matrix Size Threshold: 32 x 32
Matrix Partitions : 960800
Strassen HBC - Pass 7.1 - Completed: 797.85199 secs Note: Virtual Memory bound / Performance decreased
Strassen HBC - Pass 7.2 - Completed: 127.11000 secs
Strassen HBC - Pass 7.3 - Completed: 122.94400 secs
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
A C++ compiler optimizations could significantly improve performance and here is an example:
[ Strassen HBC ( Heap Based Complete ) - All Optimizations disabled - /Od ]
Matrix Size : 4096 x 4096
Matrix Size Threshold: 64 x 64
Matrix Partitions : 137257
Strassen HBC - Pass 6.1 - Completed: 129.74600 secs
Strassen HBC - Pass 6.2 - Completed: 119.79310 secs
Strassen HBC - Pass 6.3 - Completed: 119.79300 secs Note: Best time A ( BtA )
[ Strassen HBC ( Heap Based Complete ) - Optimization Maximize Speed enabled - /O2 ]
Matrix Size : 4096 x 4096
Matrix Size Threshold: 64 x 64
Matrix Partitions : 137257
Strassen HBC - Pass 1.1 - Completed: 45.75500 secs
Strassen HBC - Pass 1.2 - Completed: 35.78600 secs Note: Best time B ( BtB ) and it is ~3.4x faster than BtA!
Strassen HBC - Pass 1.3 - Completed: 35.78700 secs
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@Sergey
Is it possible to post the source code for Strassen multiplication algorithm?
Thanks in advance.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
No because Strassen Heap Based Complete ( HBC ) is a proprietary algorithm. If you search the Internet you could find sources in C, Java, PHP languages for Strassen classic ( Stack Based ) algorithm.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
>>>If you search the Internet you could find sources in C>>>
I have found already a plenty of code examples.

Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page