I need to multiply a symmetric sparse matrix A with a dense matrix X (Y = A*X) using multiple threads/cores. The matrices I'm using are adjacency matrices of graphs with a large number of nodes (up to 2 million).

I have tried two approaches:

- mkl_dcsrmm() with matdescra[0] set to 's'.
- mkl_dcsrsymv() in a for-loop, looping over the column vectors of X. Below is the code I used.

    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        // each thread multiplies A with one column of X (column-major storage),
        // so the i-th column starts at offset i*m
        mkl_dcsrsymv(&matdescra[1], &m, values, rowIndex, columns, X + i * m, Y + i * m);
    }

Initially, I thought that the first option (Sparse BLAS level 3) should be faster than the second one. But, I'm getting the opposite timing results.

Below is an example with a symmetric sparse matrix A with about 1.7M rows/columns and 42M non-zero entries, and a dense matrix X with the same number of rows and 100 columns. Timings with the number of threads set to 2, 4, and 8, respectively:

- option1: 19.17sec, 9.38sec, 5.20sec
- option2: 13.26sec, 6.83sec, 3.84sec

Is there any particular reason for this, or am I missing something? It seems that mkl_dcsrmm() should be doing things at least as efficiently as my for-loop.

I compiled the code with the following command:

    icpc -mkl=parallel -I$(MKLROOT)/include -O3 -openmp -o test test.cpp -L$(MKLROOT)/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lpthread -lm

Hi

I have two questions.

Could you provide your hardware information?

Do you store the full matrix (lower + upper parts), only the lower part, or only the upper part of the matrix?

Best regards,

Victor

The hardware I'm running on is a server with two Xeon E5-2680 processors (2.7GHz 8 cores/16 threads) and 32GB of memory.

I only store the upper triangular part of the matrix.

Hi Donghyuk,

Thank you for the reply. It may be a bug; I have escalated it to our developer team and will update this thread when there is news.

And if it is convenient for you, a test case (including test vectors) would be helpful. You can send it via a private message to the author if the test case is private or protected.

Thanks

Ying

Hi Donghyuk,

I'm glad to notify you that we have added a new API and modified some initialization code in the sparse matrix functions in MKL 11.3 beta and MKL 11.2.3. The performance should now be consistent. You are welcome to try them.

I will send the beta invitation letter to you by private message.

Thanks

Ying
