I have tried implementing this calculation manually, by multiplying row vectors/blocks of A' by A and storing the results in the corresponding blocks of B. However, depending on the block size, the overhead of the multiple calls can even decrease performance (for very small blocks); otherwise the gain in performance is less than 50%.

Alternatively, what would the optimal block size be to reduce the overhead of making multiple calls and spinning up threads? Is any information available on how the algorithm partitions the data across multiple threads internally?

1 Solution

Could you please check whether the DSYRK function in MKL BLAS will work for you? Here is an excerpt from the MKL Reference Manual:

The ?syrk routines perform a matrix-matrix operation using symmetric matrices. The operation is defined as

C := alpha*A*A' + beta*C,

or

C := alpha*A'*A + beta*C
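
To make the second form concrete, here is a minimal reference sketch in plain C of what the update computes when only the upper triangle of C is requested. It assumes row-major storage, with A of size m x n and C of size n x n. The names `syrk_upper_ref` and `symmetrize_from_upper` are hypothetical helpers for illustration only; they are not MKL functions and say nothing about how MKL partitions the work internally.

```c
#include <assert.h>
#include <stddef.h>

/* Unoptimized reference for C := alpha*A'*A + beta*C that writes only the
 * upper triangle of C. Hypothetical helper, not part of MKL.
 * A is m x n and C is n x n, both row-major. */
void syrk_upper_ref(size_t m, size_t n, double alpha, const double *A,
                    double beta, double *C)
{
    assert(A != NULL && C != NULL);
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i; j < n; ++j) {      /* j >= i: upper triangle only */
            double sum = 0.0;
            for (size_t p = 0; p < m; ++p)    /* dot product of columns i and j of A */
                sum += A[p * n + i] * A[p * n + j];
            /* As in BLAS, the old C is not read when beta is zero. */
            double old = (beta == 0.0) ? 0.0 : beta * C[i * n + j];
            C[i * n + j] = alpha * sum + old;
        }
    }
}

/* Copy the upper triangle into the lower one when the full matrix is needed. */
void symmetrize_from_upper(size_t n, double *C)
{
    assert(C != NULL);
    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j)
            C[j * n + i] = C[i * n + j];
}

/* With MKL's CBLAS interface (mkl.h), the corresponding call for the same
 * layout would look roughly like this (not compiled here):
 *
 *   cblas_dsyrk(CblasRowMajor, CblasUpper, CblasTrans,
 *               n, m, alpha, A, n, beta, C, n);
 */
```

Because only the entries with j >= i are computed, the routine does roughly half the arithmetic of a general matrix-matrix multiply. The strictly lower triangle of C is left untouched, so copy the upper triangle across afterwards if the full matrix B is needed.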

Thank you,

Efe

3 Replies

You're right, the documentation could be more descriptive. Thank you for the input.