Hi all
I'm having trouble with matrix multiplication in Sparse BLAS. I am trying to multiply two huge matrices and compare multithreaded and single-threaded performance on a quad-core AMD Phenom II 940 with 4 GB of DDR3 RAM.
I am using mkl_scsrmm. In my benchmark, I repeat the call to mkl_scsrmm a hundred times and measure the total time in seconds. The matrices have 700,000 (dense) and 20,000 (sparse) elements.
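For readers less familiar with the routine: mkl_scsrmm computes C := alpha*A*B + beta*C (or the transposed variant), where A is a sparse matrix in CSR format and B and C are dense. The sketch below is a plain-Python illustration of that operation, not MKL's implementation; the function names `csr_matmul` and `time_repeated`, the zero-based indexing and the list-of-rows storage are all assumptions made for the example. It also shows a repeated-call timing loop that measures wall-clock time, since a CPU-time clock sums the time of all threads and can hide a multithreaded speedup.

```python
import time


def csr_matmul(n_rows, values, col_idx, row_ptr, B, alpha=1.0, beta=0.0, C=None):
    """Return alpha*A*B + beta*C.

    Hypothetical helper: A has n_rows rows and is stored in zero-based CSR
    form (values, col_idx, row_ptr); B and C are dense lists of rows.
    """
    n_out = len(B[0])
    if C is None:
        C = [[0.0] * n_out for _ in range(n_rows)]
    result = []
    for i in range(n_rows):
        # Start the output row from beta * C[i].
        acc = [beta * c for c in C[i]]
        # Each stored nonzero A[i, k] adds alpha * A[i, k] * B[k, :] to the row.
        for p in range(row_ptr[i], row_ptr[i + 1]):
            a = alpha * values[p]
            b_row = B[col_idx[p]]
            for j in range(n_out):
                acc[j] += a * b_row[j]
        result.append(acc)
    return result


def time_repeated(fn, repeats=100):
    """Total wall-clock seconds for `repeats` calls of fn().

    perf_counter measures elapsed time; process_time would add up the CPU
    time of every thread and so would not drop when more cores are used.
    """
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return time.perf_counter() - start


# A = [[1, 0, 2], [0, 3, 0]] in CSR form, multiplied by a dense 3x2 B.
values, col_idx, row_ptr = [1.0, 2.0, 3.0], [0, 2, 1], [0, 2, 3]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(csr_matmul(2, values, col_idx, row_ptr, B))  # [[11.0, 14.0], [9.0, 12.0]]
print(time_repeated(lambda: csr_matmul(2, values, col_idx, row_ptr, B)))
```

Rows of the output are independent, so the outer loop is the natural axis to split across threads. Each stored nonzero contributes only one multiply-add per output column while streaming a full row of B, so kernels of this shape tend to be limited by memory bandwidth rather than arithmetic.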
The problem is that the multithreaded run is only 5% faster than the single-threaded, sequential run.
What is happening?
Ashade
With such benchmarks, the devil is in the details. You may be interested in what Amdahl's Law says about your runs.
If you want more specific help, you will have to document what you did, and state reasons why the results are not what was expected.
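To make the Amdahl's Law remark concrete: if a fraction p of the run time parallelizes perfectly over n cores, the ideal speedup is S = 1 / ((1 - p) + p / n). Solving for p gives p = (1 - 1/S) / (1 - 1/n). The sketch below (the function names are made up for illustration) applies this to the figures in the question, taking "5% better" to mean a speedup of 1.05 on 4 cores. Taken literally, the model then implies that only about 6% of the run is parallel; in practice, saturated memory bandwidth can produce the same symptom, so this is a diagnostic hint, not a measurement.

```python
def amdahl_speedup(p, n):
    """Ideal speedup on n cores when a fraction p of the run time parallelizes perfectly."""
    return 1.0 / ((1.0 - p) + p / n)


def implied_parallel_fraction(speedup, n):
    """Invert Amdahl's law: the parallel fraction p that would explain an observed speedup on n cores."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


p = implied_parallel_fraction(1.05, 4)
print(round(p, 4))                        # 0.0635
print(round(amdahl_speedup(0.95, 4), 3))  # 3.478, for comparison: a 95%-parallel run
```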
Ashade,
Do you mean that you see only 5% scaling with 4 cores vs. 1 core?
What are the matrix sizes?