DSS clarifications

Scott_Mcmichael · ‎11-06-2009

Can anyone help with these DSS questions? I've been searching through the manual with no luck:

- What exactly does each of the message and termination level options mean?

- The "dss_create" documentation specifies that input data and internal arrays must have single precision. Since "dss_solve" accepts an array of double precision values, what exactly does that stipulation mean?

- What conditions can cause the "dss_reorder" function to return a value of MKL_DSS_FAILURE (-3)? This call used to cause my code to crash with the same error code until I tried setting MKL_DSS_TERM_LVL_FATAL. I have been able to get this call to work using simple test matrices and I am calling it using the option MKL_DSS_AUTO_ORDER.

- Relating to the previous question, are there any requirements I might be missing for the matrix format? I am using a CSR formatted symmetric sparse matrix conforming to the sparse matrix storage format in the manual. "dss_define_structure" returns without error and I have checked for several kinds of errors in the row and column specifications.

- Does "dss_delete" need to be called if one of the earlier functions returns an error? The sample C++ DSS code does not do so.

Thanks

Sergey_P_Intel2 · ‎11-10-2009

Dear Scott,

First of all, the MKL manual specifies that by default input data and internal arrays in DSS / PARDISO have double precision. For example:

[...] By default, the DSS routines use double precision for solving systems of linear equations. The precision used by the DSS routines can be set to single mode by specifying the following value:

MKL_DSS_SINGLE_PRECISION.

As for PARDISO, input data and internal arrays are required to have single precision.

The last sentence describes the situation when the DSS interface is used in single precision mode. This part of the documentation probably needs to be rewritten more carefully.

Please check the correspondence between the precision of your input arrays and the DSS option MKL_DSS_SINGLE_PRECISION in your application. Namely, MKL_DSS_SINGLE_PRECISION assumes that the input arrays have single precision; without it, they need to have double precision.

Also, dss_delete() needs to be called in any case because it releases the internal PARDISO and DSS working arrays.

With best regards,
Sergey
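
The advice above, to call dss_delete() on every exit path including error returns, can be sketched with a small scope guard. This is only an illustration under stated assumptions: FakeSolver and run_solver are hypothetical stand-ins so the sketch compiles without MKL, and the comments mark where the real dss_create, dss_reorder, and dss_delete calls would go.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Minimal scope guard: runs the stored cleanup once when it goes out of
// scope, whether the function returns normally or returns early on an error.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> cleanup)
        : cleanup_(std::move(cleanup)) {}
    ~ScopeGuard() { cleanup_(); }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
private:
    std::function<void()> cleanup_;
};

// Hypothetical stand-in for a solver handle (not an MKL type).
struct FakeSolver {
    bool created = false;
    int deletes = 0;
};

// Hypothetical driver. 'fail_at_reorder' simulates the reorder step
// returning an error code of -3.
int run_solver(FakeSolver& s, bool fail_at_reorder) {
    s.created = true;  // real code: dss_create
    // real code: the guard body would call dss_delete on the handle
    ScopeGuard guard([&s] { ++s.deletes; s.created = false; });

    if (fail_at_reorder) {
        return -3;     // early return: the guard still releases the handle
    }
    return 0;          // normal return: the guard releases the handle too
}
```

With this shape, an early return after a failed reorder still releases the handle, which is the case the sample code in the question leaves out.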

Scott_Mcmichael · ‎11-11-2009

I have investigated the "dss_reorder" problem more thoroughly using the Pardiso calls instead of the DSS calls and have determined that the solver is sometimes able to solve the problem successfully but sometimes fails in the reordering step. The input matrix checker is turned on and does not report anything wrong with the input data. I have played around with several of the parameters including iparm[0]=0 and nothing has fixed the problem. I am using MKL version 10.2.2.025 on an Intel PC.
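
For readers who want to repeat the structural checks independently of the built-in matrix checker, here is a minimal sketch. It assumes the symmetric format stores only the upper triangle in CSR with column indices strictly increasing within each row; the function name check_upper_csr and the exact set of conditions are illustrative assumptions, not a reproduction of what the MKL checker does.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns an empty string if the CSR structure looks consistent, otherwise a
// short description of the first problem found.
// 'base' is the index base: 1 for one-based arrays, 0 for zero-based arrays.
std::string check_upper_csr(int n,
                            const std::vector<int>& row_ptr,
                            const std::vector<int>& col_idx,
                            int base) {
    if (n < 0 || row_ptr.size() != static_cast<std::size_t>(n) + 1)
        return "row pointer array must have n+1 entries";
    const int nnz = static_cast<int>(col_idx.size());
    if (row_ptr[0] != base)
        return "first row pointer must equal the index base";
    if (row_ptr[n] - base != nnz)
        return "last row pointer must equal nnz plus the index base";

    for (int i = 0; i < n; ++i) {
        const int begin = row_ptr[i] - base;
        const int end = row_ptr[i + 1] - base;
        if (end < begin) return "row pointers must be non-decreasing";
        if (end > nnz) return "row pointer exceeds number of non-zeros";
        for (int k = begin; k < end; ++k) {
            const int col = col_idx[k] - base;
            if (col < 0 || col >= n) return "column index out of range";
            if (col < i) return "entry below the diagonal in upper storage";
            if (k > begin && col_idx[k] <= col_idx[k - 1])
                return "column indices must be strictly increasing in a row";
        }
    }
    return "";
}
```

Passing these checks only rules out structural mistakes in the index arrays; it does not explain a failure that appears on some runs and not on others with the same input.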

I have attached a zip file containing my C++ test code, the test data (736x736 symmetric matrix with 4664 non-zero values), and an example of the correct output data generated by the code. To run the compiled code put the .dat file into the same folder as the generated .exe file and run the executable.

Here is the output message on failure:

*** Error in PARDISO ( reordering_phase) error_num= -180
*** error PARDISO: reordering, symb. factorization

================ PARDISO: solving a symmetric indef. system ================

Summary PARDISO: ( reorder to reorder )
================

Times:
======
Time fulladj: 0.000256 s
Time reorder: 0.000788 s
Time symbfct: 0.000196 s
Time malloc : 0.020901 s
Time total : 0.022263 s total - sum: 0.000123 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 2
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 736
#non-zeros in A: 4664
non-zeros in A (%): 0.861000

#right-hand sides: 1

< Factors L and U >
#columns for each panel: 80
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 301
size of largest supernode: 58
number of nonzeros in L 16010
number of nonzeros in U 1
number of nonzeros in L+U 16011

ERROR during symbolic factorization: -3

============================================================

Can someone try this out and see if they get the same behavior?

Gennady_F_Intel · ‎11-11-2009

Scott,
I checked the problem with the same version of MKL (Package ID: w_mkl_p_10.2.2.025) that you are using, on win32 with static linking (mkl_intel_c.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib) and an Intel Core2 Duo CPU T7300 @ 2.00GHz. The test you attached passed: the error was 0 for all PARDISO phases, including reordering.

--Gennady

Scott_Mcmichael · ‎11-12-2009

My linking parameters are the same as yours and the CPU looks similar. Did you test the program repeatedly and get a success every time? On my machine it seems to work about 50% of the time overall both in debug and release mode. Is there any additional diagnostic information I could collect on this error to help track down where the inconsistency is coming from?

A workaround I have found is to just re-run the phase 11 (re-order) pardiso call until something besides error code -3 is returned. This usually works within 2 or 3 attempts and obtains the correct answer.
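
The retry workaround described above can be sketched as a bounded loop. This is a hedged illustration, not the attached test code: retry_reorder and RetryResult are hypothetical names, and the callable passed in is assumed to wrap the phase 11 pardiso call and return its error code. Capping the number of attempts keeps the loop from spinning forever if the failure turns out to be persistent.

```cpp
#include <cassert>
#include <functional>

// Outcome of the retry loop: the last error code seen and how many
// attempts were made.
struct RetryResult {
    int error;
    int attempts;
};

// Calls 'run_reorder' until it returns something other than 'retry_code'
// or until 'max_attempts' calls have been made, whichever comes first.
RetryResult retry_reorder(const std::function<int()>& run_reorder,
                          int max_attempts,
                          int retry_code = -3) {
    assert(max_attempts > 0);
    RetryResult result = {retry_code, 0};
    while (result.attempts < max_attempts) {
        result.error = run_reorder();  // real code: pardiso with phase 11
        ++result.attempts;
        if (result.error != retry_code) {
            break;                     // success or a different error
        }
    }
    return result;
}
```

A caller would still inspect the returned error, since the loop also stops on error codes other than -3. Note that this only hides the intermittent failure; it does not identify its cause.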

Gennady_F_Intel · ‎11-12-2009

- Yes, I ran the test you sent many times (9) under VS2005 and also when I built this test with a makefile. All runs passed successfully, and the output files were identical to the expected output (xCorrect.txt).
- This held independently of debug or release mode.
- There is no additional diagnostic beyond what your code already uses (msglvl, the matrix checker, and the error code).