I am trying to factorize an indefinite matrix using Pardiso with an LDL' factorization. I am solving an eigenvalue problem (structural dynamics) in which Pardiso does the matrix algebra in the reverse communication interface of ARPACK. The LDL' factorization is necessary so that I know how many eigenvalues lie within a predefined range, which is an input to ARPACK. The matrix is indefinite because the problem has constraint equations enforced with Lagrange multipliers, which place zeros on the diagonal; the matrix is partitioned so that the zeros are in the last rows. The iparm[] values and other settings I have used are identical to those in the pardiso_sym_c.c example, except that I set mtype=-2. I am using the ILP64 interface, and pardiso is declared to return an MKL_INT just like in the example.
Every time I try to run the calculation, I receive an exception fault with phase=11. If I set iparm[26]=1 (C indexing), the analysis completes without an exception fault; however, the next phase, phase=22, again quits with an exception fault.
I have run this calculation on the same matrix using Cholmod (in this case without Lagrange multipliers, because Cholmod only handles positive definite matrices), Ma57 from the HSL library, SuperLU, and CXSparse (Univ. of Florida). All of these have worked, so I don't understand what I am doing wrong. It is difficult to troubleshoot a problem like this without some kind of diagnostic information from the MKL.
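For readers following along, the eigenvalue count mentioned here rests on Sylvester's law of inertia: for a shift sigma, the number of negative pivots in the LDL' factor of K - sigma*M gives the number of eigenvalues below sigma (assuming M is positive definite on the constrained space). With full-rank Lagrange multiplier constraints the factor carries a fixed number of extra negative pivots, which cancels when two shifts are subtracted. The sketch below is only an illustration of that arithmetic; the function names are hypothetical, and it assumes a purely diagonal D (1x1 pivots). As far as I recall, MKL Pardiso reports the inertia in iparm(22) and iparm(23) (Fortran numbering) for mtype=-2, but that is worth verifying against the documentation for your version.

```c
#include <assert.h>
#include <stddef.h>

/* Count the negative entries of the diagonal factor D from A = L D L'.
   Assumption: D is purely diagonal (1x1 pivots only). Indefinite solvers
   may also use 2x2 pivot blocks, whose eigenvalue signs must be counted
   separately; that case is not handled here. */
static size_t count_negative_pivots(const double *d, size_t n)
{
    size_t neg = 0;
    for (size_t i = 0; i < n; ++i) {
        if (d[i] < 0.0) {
            ++neg;
        }
    }
    return neg;
}

/* Number of eigenvalues in the interval (lo, hi), given the negative-pivot
   counts of the factorizations of K - lo*M and K - hi*M. A constant offset
   from constraint rows cancels in the difference. */
static size_t eigs_in_interval(size_t neg_at_lo, size_t neg_at_hi)
{
    assert(neg_at_hi >= neg_at_lo);
    return neg_at_hi - neg_at_lo;
}
```

In use, one would factorize at both ends of the range, read off the two negative counts, and pass the difference to ARPACK as the number of requested eigenvalues.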
Any help would be appreciated, Thanks,
john
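Since the post asks for diagnostics, here is a minimal sketch of a pre-flight check one could run on the ia/ja arrays before calling Pardiso with a symmetric mtype. It is not part of MKL (the built-in matrix checker is what iparm[26]=1 enables); the function name, the `idx_t` typedef standing in for MKL_INT under ILP64, and the return codes are all hypothetical. It assumes one-based indexing, upper-triangular storage with ascending column indices in each row, and an explicitly stored diagonal entry in every row (even if zero), which is my recollection of the documented requirements and should be checked against the manual.

```c
#include <assert.h>
#include <stddef.h>

typedef long long idx_t; /* assumption: stand-in for MKL_INT with ILP64 */

/* Returns 0 if the CSR structure looks acceptable, otherwise a nonzero
   code naming the first problem found:
   1 = ia does not start at 1 (e.g. zero-based arrays)
   2 = ia is not nondecreasing or runs past nnz
   3 = row has no stored diagonal entry as its first element
   4 = column index outside 1..n
   5 = entry below the diagonal (not upper-triangular storage)
   6 = column indices not strictly ascending within a row
   7 = ia[n] does not match nnz + 1 */
static int check_sym_csr_one_based(idx_t n, idx_t nnz,
                                   const idx_t *ia, const idx_t *ja)
{
    assert(ia != NULL && ja != NULL && n > 0);
    if (ia[0] != 1) return 1;
    for (idx_t i = 0; i < n; ++i) {
        idx_t row = i + 1;        /* one-based row number */
        idx_t beg = ia[i] - 1;    /* zero-based offsets into ja */
        idx_t end = ia[i + 1] - 1;
        if (end < beg || end > nnz) return 2;
        if (end == beg) return 3;
        for (idx_t k = beg; k < end; ++k) {
            if (ja[k] < 1 || ja[k] > n) return 4;
            if (ja[k] < row) return 5;
            if (k > beg && ja[k] <= ja[k - 1]) return 6;
        }
        if (ja[beg] != row) return 3;
    }
    if (ia[n] != nnz + 1) return 7;
    return 0;
}
```

A zero-based ia array fails immediately with code 1, which is the kind of early, specific message that is hard to get from an exception fault in phase 11.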
Thanks again for your reply. In the meantime I have managed to get Pardiso running. It turns out that the problems were:
(1) I was using 0-based ia, ja array indexing instead of 1-based.
(2) I assemble my FE matrices in column format, hence I have to convert them to row format for Pardiso. I still maintain that there is a problem with the MKL conversion routine mkl_dcsrcsc(). I have written my own for the time being. If you are interested in the algorithm, let me know.
PS: I have been varying some of the parameters to fine-tune the performance of the Pardiso solve, and I have compared the performance to ma57 (HSL). The factor times are roughly equivalent between the two; however, the solve time with Pardiso makes the overall solution time about 50% longer. The solves occur in the reverse communication loop of ARPACK, where 15 natural frequencies and mode vectors are extracted, with matrix order n ~ 400,000 and about 16 million non-zeros.
I also noticed that the number of threads being used was equal to 5 regardless of what I set iparm(3) to (I have a quad-core Xeon processor with 16 GB RAM, running Windows XP). Do you think this could be a performance issue?
The matrix has zeros on the diagonal as well (symmetric indefinite, mtype=-2). Could the issue have something to do with iterative refinement?
Thanks, john
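The two fixes described in this post (row format and 1-based indices) can be done in one pass. The following is a minimal sketch of a standard count-and-scatter transpose, not John's routine and not a statement about mkl_dcsrcsc(); the function name and the `idx_t` typedef (a stand-in for MKL_INT under ILP64) are hypothetical. It takes zero-based compressed-column arrays and writes one-based compressed-row arrays into caller-allocated output.

```c
#include <assert.h>
#include <stdlib.h>

typedef long long idx_t; /* assumption: stand-in for MKL_INT with ILP64 */

/* Convert zero-based CSC (colptr, rowind, cval) to one-based CSR (ia, ja, a).
   The caller allocates ia[n_rows + 1], ja[nnz] and a[nnz].
   Returns 0 on success, -1 if the work array cannot be allocated. */
static int csc0_to_csr1(idx_t n_rows, idx_t n_cols,
                        const idx_t *colptr, const idx_t *rowind,
                        const double *cval,
                        idx_t *ia, idx_t *ja, double *a)
{
    assert(colptr[0] == 0); /* input must be zero-based */
    idx_t nnz = colptr[n_cols];
    idx_t *next = calloc((size_t)n_rows + 1, sizeof *next);
    if (next == NULL) return -1;

    /* Pass 1: count the entries in each row. */
    for (idx_t k = 0; k < nnz; ++k) next[rowind[k] + 1]++;

    /* Prefix sum turns counts into zero-based row start offsets. */
    for (idx_t i = 0; i < n_rows; ++i) next[i + 1] += next[i];
    for (idx_t i = 0; i <= n_rows; ++i) ia[i] = next[i] + 1; /* one-based */

    /* Pass 2: scatter column by column. Because columns are visited in
       increasing order, column indices end up ascending within each row. */
    for (idx_t j = 0; j < n_cols; ++j) {
        for (idx_t k = colptr[j]; k < colptr[j + 1]; ++k) {
            idx_t dst = next[rowind[k]]++;
            ja[dst] = j + 1; /* one-based column index */
            a[dst] = cval[k];
        }
    }
    free(next);
    return 0;
}
```

The scatter order gives sorted column indices in each row without a separate sort, at the cost of one temporary array of n_rows + 1 integers.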
Thanks very much for your quick response. I am only solving 1 RHS. Additionally, I just linked against mkl_sequential.lib and repeated the calculations I had done using the threading libraries mkl_intel_thread.lib and libiomp5md.lib. I compile using the Intel C compiler, C99, with -MTd and optimizations -O2, -Ot. I was quite surprised to see that with only a single thread the total computation time was 459 secs, while with 5 threads it was 429 secs. I have a quad-core CPU. Is there something I am doing wrong?
Thanks, john
PS: I have to locate the thread about the conversion routine; I'll get back to you later.
I just performed a calculation of the same problem using an LU factorization, mtype=11. This is possible even though the matrix has 12 zeros on the diagonal of ~400,000 equations, because it has an LU factor due to its principal minors (see wiki if necessary). This time I ran the problem on my dual-core machine at home.
The result was a total solution time of 259 sec, and by the way, the calculation ran with only one thread, not with the number of CPUs, so there is a discrepancy with regard to your statement about iparm(3). The link was done with the parallel libraries.
I am of course very happy with this performance; it's even better than what I achieved with ma57. I would definitely like to understand the lack of multi-threading performance on that quad-core CPU with the indefinite LDL' solver.
regs, john
Sorry to overload you with replies. I just realized that my unsymmetric matrix calc was actually done with the sequential libraries. I re-linked with the parallel versions and ran the calc again. This time 3 threads were active on my dual-core CPU. The run took 288 secs, which is actually an increase in spite of using the parallel libraries?
regards,john
Hi John,
In general, if we are talking about the solution phase, the performance should be the same because this phase is not threaded. More precisely, the solution phase is threaded only for many RHS. And these performance results, of course, will not depend on which compiler options were used.
--Gennady
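As a rough cross-check of this explanation against the timings John reported earlier (459 s with mkl_sequential, 429 s with the threaded libraries), Amdahl's law can be inverted to estimate what fraction of the run actually benefits from threading. This is only a back-of-the-envelope sketch: the function name is hypothetical, and it assumes 4 effective cores and that the threaded portion scales ideally.

```c
#include <assert.h>

/* Amdahl's law: speedup S = 1 / ((1 - p) + p / n), where p is the fraction
   of the serial run time that parallelizes and n is the core count.
   Solving for p gives p = (1 - 1/S) / (1 - 1/n). */
static double amdahl_parallel_fraction(double t_serial, double t_parallel,
                                       int n_cores)
{
    assert(t_serial > 0.0 && t_parallel > 0.0 && n_cores > 1);
    double speedup = t_serial / t_parallel;
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / (double)n_cores);
}
```

Calling `amdahl_parallel_fraction(459.0, 429.0, 4)` gives roughly 0.09, i.e. under these assumptions only about 9% of the wall time sits in threaded code. That would be consistent with a run dominated by single-RHS solves inside the ARPACK loop rather than by the factorization.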
John,
What do you mean by the total solution time? Is this the execution time for all calculation phases, i.e. phase 11 + 22 + 33?
--Gennady
john
Hi Gennady, I was just doing some reading in the Pardiso manual (University of Basel) and read the following:
(o) Reproducibility of exact numerical results on multi-core architectures. The solver is now able to compute the exact bit identical solution independent on the number of cores without effecting the scalability. Here are some results for a nonlinear FE model with 500'000 elements.
Intel MKL PARDISO 10.2
1 core - factor: 17.980 sec., solve: 1.13 sec.
2 cores - factor: 9.790 sec., solve: 1.13 sec.
4 cores - factor: 6.120 sec., solve: 1.05 sec.
8 cores - factor: 3.830 sec., solve: 1.05 sec.
U Basel PARDISO 4.0.0:
1 core - factor: 16.820 sec., solve: 1.09 sec.
2 cores - factor: 9.021 sec., solve: 0.67 sec.
4 cores - factor: 5.186 sec., solve: 0.53 sec.
8 cores - factor: 3.170 sec., solve: 0.43 sec.
This method is currently only working for symmetric indefinite matrices.
This seems to be consistent with what I am experiencing. Do we get updated versions of Pardiso from Basel?
regards, john