Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Intel Community
- Software
- Software Development SDKs and Libraries
- Intel® oneAPI Math Kernel Library
- multiple right hand side

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

apocalx

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-22-2011
09:40 AM

280 Views

multiple right hand side

When I cmpare time to solve Ax=b with Pardiso, I get slower time with multiple right hand side than in single rhs.

I made many tests with different matrix size and number of non zero values and I get always same behavior.

Matrix is symmetrical

Matrix and vector B are fill with complex number

Matrix size : 10000x10000

Number of non zero value per row : 3

All non zero values are on the diagonal and near the diagonal

Vector B is fill with 1+j0 at each index. (to simplify the test)

Reordoring and Factorisation are runned only 1 time.

Time to solve 10000 times the system with 1 rhs : 4.485 sec.

Time to solve 10000/8 times the system with 8 rhs: 15.89 sec.

Time to solve 10000/32 times the system with 32 rhs: 7.672 sec.

Time to solve 10000/128 times the system with 128 rhs: 6.89 sec.

Is it normal that the faster result is always wit 1rhs no matter the matrix size, the number of non zero and values in B vector ?

I'm on windows XP, visual studio 2005 and 2010

Intel Core 2 Quad CPU Q9550 @ 2.83GHz 3.25 GB RAM

Link Copied

21 Replies

Chao_Y_Intel

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-23-2011
01:39 AM

254 Views

Hello,

From the tests, the result seems fine. The test repeated on the same data set. The memory access performance is important on the solver stage. If you only have one right hand date, this data will be easier to keep into the cache after the first run, which will have better performance, than it has more multiple right hands data. I am checking with the function owner for more details on this.

Thanks,

Chao

apocalx

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-27-2011
10:19 PM

254 Views

In fact, my problem is the same than Bosun Hwang posted last year (http://software.intel.com/en-us/forums/showthread.php?t=76809&o=a&s=lr) but with MKL version 10.3 update 3.

So, is it normal than many right hand side was slower than 1 rhs in mkl 10.3 update 3?

Thanks?

MarcB

Sergey_Solovev__Inte

New Contributor I

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-28-2011
04:31 AM

254 Views

SergeyS.

apocalx

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-28-2011
08:26 AM

254 Views

n solve * (1 rhs) is always faster thann/k solve * (k rhs)

This logical is true with k <= 256. Bigger than that, I have some memory problems.

where"k" is the number of right hand side

In my mind, multiple rhs should be faster than 1 rhs, no?

MarcB

apocalx

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-01-2011
08:25 AM

254 Views

Because, with my conclusion of multiple rhs, I don't see interest to use it instead of many single rhs.

Thanks

MarcB

Sergey_Solovev__Inte

New Contributor I

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-04-2011
04:13 AM

254 Views

apocalx

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-04-2011
01:40 PM

254 Views

Times seem better with a big number of non zero values.

Here are some benchmark result with symmetrical matrix and complex number :non zero values per line = number of non zero values in diagonal and upper triangle per line

Matrix size 5000 x 5000 with 2 non zero values per line :

rhs = 1 => 1.172 sec.

rhs = 2 => 10.515sec.

rhs = 4 => 5.718sec.

rhs = 8 => 3.000sec.

rhs = 16 => 2.188sec.

rhs = 32 => 1.593sec.

rhs = 64 => 1.469sec.

rhs = 128 => 1.563sec.

Conclusion : Slower with multiple rhs than single rhs no matter number of rhs

Matrix size 5000 x 5000 with 10 non zero values per line :

rhs = 1 => 9.734 sec.

rhs = 2 => 33.140 sec.

rhs = 4 => 19.734 sec.

rhs = 8 => 15.687 sec.

rhs = 16 => 13.203 sec.

rhs = 32 => 12.406 sec.

rhs = 64 => 12.531 sec.

rhs = 128 => 15.734 sec.

Conclusion : Slower with multiple rhs than single rhsno matter number of rhs

Matrix size 5000 x 5000 with 100 non zero values per line :

rhs = 1 => 116.248sec.

rhs = 2 => 126.904sec.

rhs = 4 => 89.264sec.

rhs = 8 => 72.514sec.

rhs = 16 => 67.468sec.

rhs = 32 => 60.061sec.

rhs = 64 => 58.186sec.

rhs = 128 => 90.030sec.

rhs = 256 => 117.185sec.

Conclusion : Faster with multiple rhs than single rhs, but multiple rhs must be <= 64 and != 2

Matrix size 20000 x 20000 with 50 non zero values per line :

rhs = 1 => 1143.416sec.

rhs = 1 => 1143.416sec.

rhs = 2 => 1451.363sec.

rhs = 4 => 955.747sec.

rhs = 8 => 789.063sec.

rhs = 16 => 695.283sec.

rhs = 32 => 632.784sec.

rhs = 64 => 626.066sec.

rhs = 128 => 831.969sec.

Conclusion : Faster with multiple rhs than single rhs, but multiple rhs must be <= 64and != 2

The more nnz values per line are in matrix, faster is multiple rhs solving.

Unfortunately, in my case all my matrices are highly sparse. (Between 3 and 10 non zero values per line no matter the matrix size) and I have to solve it with many different B, so I need good performance with multiple right hand side with small number of nnz per line.

Thanks

MarcB

Sergey_Solovev__Inte

New Contributor I

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-04-2011
09:06 PM

254 Views

Could you provide us with info about you? We will create request and add this info into this request.

It would be nice if you provide us with your test case. We will add it in request as well.

Sergey Solovev.

apocalx

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-06-2011
07:18 AM

254 Views

Name: Marc Belletete

Company: Cyme

Email: marc.belletete@cyme.com

Envioronment: Visual Studio 2010

I attached a benchmark .cpp file.

Here is another subject :

All our matrices are highly sparse (between 3 and 10 nnz values per line) and almost all nnz values are on diagonal or very close of diagonal.

With this kind of matrix, I saw that KLU solver ( non symmetrical ) is, maybe, 1.5x faster than intel mkl pardiso (symmetrical).

However, KLU have problems that Pardiso don't have (problem to solve singular matrix).

I would like your opinion about that.

Note: KLU is a free solver.

Thanks

MarcB

Gennady_F_Intel

Moderator

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-06-2011
08:57 AM

254 Views

Which comparative resultsyou have withthese solvers (KLU vs Pardiso)?

apocalx

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-06-2011
11:33 AM

254 Views

My matrices were symmetrical, so I used Pardiso symmetrical and KLU non-symmetrical (not sure that KLU symmetrical exist). The number of non zero values were between 3 and 10 per line). Almost all my non zero values were on the diagonal and close to the diagonal.

MarcB

Sergey_Solovev__Inte

New Contributor I

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-07-2011
02:54 AM

254 Views

Thanks for detailed info!

We added it into our request.

We added it into our request.

Andrey_Kuzmin__Intel

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

08-25-2011
03:31 AM

254 Views

Hello Marc,

I did investigation of the problem and found much better instrument for this kind of matrices.

The reason of relatively poor performance of multiple RHS solve is simple.

We have two types of parallelism inside PARDISO solving step. The first is used for problems with 1 RHS (the work for the only RHS is split between cores) and the second is used for systems with many RHS (many RHS are solved in parallel, but the sequential algorithm is used for each of them). There is no mixed version for the problems with few RHS. For instance, if we have 2 right hand sides and 4 cores, the problem will be solved in parallel using 2 cores, but 2 cores will remain idle.

The second reason of poor performance for such matrices is that amount of work for each RHS is too small. We dont even compensate the threading overhead. Indeed, the sequential version of PARDISO is faster than parallel for such problems (set OMP_NUM_THREADS=1 to check it). If we increase the number of RHS or nnz per line, the parallel version shows better performance as you mentioned.

In fact, matrices of size 10000x10000 are considered to be very small for PARDISO. If you always have all non-zero elements very close to diagonal, I would recommend you to use MKL LAPACK solver with band matrix storage scheme (zgbtrf and zgbtrs stay for factorization and solve correspondingly). Unfortunately, there is no storage scheme in LAPACK that takes advantage of both banded structure and symmetry (the latter has to be sacrificed), but the memory consumption seems not to be critical here. I rewrote the benchmark so that it does exactly the same computations, but uses these two functions instead of PARDISO (see attachment). It demonstrates much higher performance (at least few times faster, but sometimes even an order of magnitude faster). In fact there is the same issue with 2 RHS on 4 cores, but overall results situation is much better.

Best regards,

Andrey Kuzmin.
Andrey_Kuzmin__Intel

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

08-25-2011
03:34 AM

254 Views

apocalx

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

09-06-2011
12:28 PM

254 Views

In my case, maybe 90% of non zeros values are on diagonal or near the diagonal. The other 10% are anywhere in the upper triangle. (number of non zero values is still about 7 per lines)

Is it correct if I use LAPACK equivalent functions, with Packed storage or Full Storage?

With this kind of storage (instead of Band storage), do multiple rhs will give me interesting time? (Faster the pardiso and comparable to band storage with multiple rhs)?

Thanks

Marc

Andrey_Kuzmin__Intel

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

09-07-2011
10:46 PM

254 Views

Marc,

I see your question andwill investigate it soon.

Sincerely,

Andrey.

apocalx

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

09-13-2011
06:12 AM

254 Views

With MKL, I want to solve Ax=b with many b with the same A where A is a higly sparse matrix (symmetrical or not).

1- I can't use MKL LAPACK functions because LAPACK don't have a schematic for sparse matrix, and it will take too much memory to have a schematic of all values in my matrix (including all zero values)

2- I have to use Pardiso to factorize only 1 time de matrix and to solve it many time aftre that. (work fine but slow)

3- I can use PARIDSO multiple right hand side to solve but it will be slower than solving 1 right hand side a time for reason explain in this post.

4- In my case, I only need some values in x solution vector (position that I need change for each solve). I can use PARDISO partial solution, but it will be slower than solve all values in x vector because I need to re-factorize the matrix for each solve if I use partial solution.

So, I don't know if there is another solution for my kind of situation?

I try some other solver andI don't know how KLU solver work, but if I try to solve the same system :

1 factorization

solve many B vector

1 right hand side a time

Using only 1 processor (no parallelisme)

KLU is 3 times faster then Pardiso for this king of matrix. (http://www.cise.ufl.edu/research/sparse/klu/)

I'm waiting your comments

Thanks a lot for your time

Marc

basel

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

09-13-2011
07:01 AM

254 Views

>4- In my case, I only need some values in x solution vector (position that I need change for each solve). I can >use PARDISO partial solution, but it will be slower than solve all values in x vector because I need to re-factorize >the matrix for each solve if I use partial solution.

You should try to use the release 4.1.3 from www.pardiso-project.org.

We have recently implemented a feature that exactly is a good solution for (4).

Olaf

Alexander_K_Intel2

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

09-13-2011
09:12 PM

254 Views

About 4th question: Am I right that numbers of components solution vector that you need to find using PARDISO differ from step to step? I've asked it because if these numbers doesn't change you could use partial solution in pardiso solver with only one call factorization and reordering steps.

With best regards,

Alexander Kalinkin

apocalx

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

09-14-2011
06:01 AM

41 Views

Marc

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

For more complete information about compiler optimizations, see our Optimization Notice.