Zaiwen

Beginner


02-09-2014
04:44 AM

124 Views

A'*B using mkl_dcscmm

I tried mkl_dcscmm to compute both A*B and A'*B using a Matlab mex file (64-bit Linux, Matlab 2013a and 2013b) similar to the code posted in

http://software.intel.com/en-us/forums/topic/472320

MKL is faster than Matlab's own implementation for A*B. Strangely, MKL is slower than Matlab's version for A'*B, and the two results differ slightly.

(In each cpu pair below, the first time is from Matlab's implementation and the second from MKL.)

seed: 76080079, A*B: err 0.00e+00, cpu (0.91, 0.44), A'*B: err 1.43e-09, cpu (0.76, 0.71)

seed: 66432737, A*B: err 0.00e+00, cpu (0.91, 0.43), A'*B: err 1.43e-09, cpu (0.75, 0.79)

seed: 90643494, A*B: err 0.00e+00, cpu (0.92, 0.45), A'*B: err 1.43e-09, cpu (0.77, 0.88)

seed: 75317986, A*B: err 0.00e+00, cpu (0.94, 0.46), A'*B: err 1.45e-09, cpu (0.75, 0.82)

seed: 31023079, A*B: err 0.00e+00, cpu (0.92, 0.42), A'*B: err 1.43e-09, cpu (0.75, 0.80)

seed: 86467634, A*B: err 0.00e+00, cpu (0.94, 0.48), A'*B: err 1.44e-09, cpu (0.76, 0.86)

seed: 19834911, A*B: err 0.00e+00, cpu (0.93, 0.61), A'*B: err 1.42e-09, cpu (0.78, 0.76)

seed: 79273667, A*B: err 0.00e+00, cpu (0.93, 0.48), A'*B: err 1.43e-09, cpu (0.75, 0.82)

seed: 11976366, A*B: err 0.00e+00, cpu (0.93, 0.45), A'*B: err 1.42e-09, cpu (0.78, 0.89)

seed: 16420430, A*B: err 0.00e+00, cpu (0.92, 0.40), A'*B: err 1.43e-09, cpu (0.75, 0.80)

My code is attached. It can be compiled with

mex -O -largeArrayDims -output sfmult mkl-sfmult-v1.cpp

A*B and A'*B can be computed as sfmult(A, B, 1) and sfmult(A, B, 2), respectively.

Although A'*B can also be computed as sfmult(A', B, 1) by forming the transpose explicitly first, it is better to pass A itself and use the transpose flag inside mkl_dcscmm.

Any suggestion or comment is welcome. Thanks!


3 Replies

TimP

Black Belt


02-09-2014
09:53 AM


Performance differences look small enough that they could be due to any of several factors:

1) Apparently you didn't invoke auto-vectorization; even gprof ought to show whether that makes a difference.

2) differences (possibly accidental) in data alignment or total cache usage

.....

Zaiwen

Beginner


02-10-2014
05:20 PM


*) Why does correctness depend on auto-vectorization, data alignment, or total cache usage?

*) The error of A'*B can become larger as the matrix size increases, but the results of A*B are identical to those computed by Matlab.

*) Since these two operations are called tens to hundreds of times in my application, the performance differences can add up. Hence I hope to first figure out the cause on simple random examples.

I am wondering if there is a bug in mkl_dcscmm when the transpose of A is used.

VipinKumar_E_Intel

Employee


03-05-2014
09:20 PM


Please refer to the article on the MKL feature called Conditional Numerical Reproducibility (CNR) for more details on the causes of such result differences.

From MKL 11.1 onwards, we also support CNR mode on unaligned data. Can you try MKL 11.1 and see if you still see the problem?
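For reference, Conditional Numerical Reproducibility can be requested without code changes via the MKL_CBWR environment variable (the full set of accepted values is listed in the MKL documentation):

```shell
# Ask MKL to pick reproducible code paths across runs and CPU types.
export MKL_CBWR=COMPATIBLE
```

With this set, runs on different processors of the same run-to-run input should produce bit-identical results, at some performance cost.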

--Vipin

