Intel Community / Software Development SDKs and Libraries / Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library


Donghyuk_S_

Beginner


02-27-2014 04:07 PM

Symmetric sparse matrix - dense matrix multiplication

I need to multiply a symmetric sparse matrix A with a dense matrix X (Y = A*X) using multiple threads/cores. The matrices I'm using are adjacency matrices of graphs with a large number of nodes (up to 2 million).

I have tried two approaches:

- mkl_dcsrmm() with matdescra[0] set to 's'.
- mkl_dcsrsymv() in a for-loop, looping over the column vectors of X. Below is the code I used.

```cpp
#pragma omp parallel for schedule(static)
for (int i = 0; i < n; i++) {
    // multiply A by the i-th column of X (columns of X and Y are
    // stored contiguously, i.e. column-major with leading dimension m)
    mkl_dcsrsymv(&matdescra[1], &m, values, rowIndex, columns,
                 &X[i * m], &Y[i * m]);
}
```

Initially, I thought that the first option (Sparse BLAS level 3) would be faster than the second. But I'm getting the opposite timing results.

Below are timings for a symmetric sparse matrix A with about 1.7M rows/columns and 42M non-zero entries, and a dense matrix X with the same number of rows and 100 columns, with the number of threads set to 2, 4, and 8, respectively:

- option1: 19.17sec, 9.38sec, 5.20sec
- option2: 13.26sec, 6.83sec, 3.84sec

Is there any particular reason for this, or am I missing something? It seems that mkl_dcsrmm() should be doing things more efficiently than my for-loop.

I compiled the code with the following command:

```
icpc -mkl=parallel -I$(MKLROOT)/include -O3 -openmp -o test test.cpp \
    -L$(MKLROOT)/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lpthread -lm
```

4 Replies


Victor_Gladkikh

New Contributor I


02-28-2014 02:38 AM

Hi

I have two questions.

Could you provide your hardware information?

Do you store the full matrix (lower + upper parts), or do you store only the lower or only the upper part of the matrix?

Best regards,

Victor


Donghyuk_S_

Beginner


02-28-2014 06:30 AM

The hardware I'm running on is a server with two Xeon E5-2680 processors (2.7 GHz, 8 cores / 16 threads each) and 32 GB of memory.

I only store the upper triangular part of the matrix.


Ying_H_Intel

Employee


03-03-2014 07:04 PM

Hi Donghyuk,

Thank you for the reply. It may be a bug; I have escalated it to our developer team and will post here when there are any updates.

If it is convenient for you, a test case (including test vectors) would be helpful. You can send it to the author as a private message if the test case is confidential.

Thanks

Ying


Ying_H_Intel

Employee


04-26-2015 06:46 PM

Hi Donghyuk,

I'm glad to let you know that we added a new API and modified some of the initialization code for the sparse matrix functions in MKL 11.3 beta and MKL 11.2.3. The performance should now be consistent. You are welcome to try them.

I will send the beta invitation letter to you by private message.

Thanks

Ying

For more complete information about compiler optimizations, see our Optimization Notice.