Here is a very small program that solves two linear equations using the MKL DSS interface to Pardiso. First, the test program:
program ptmkl
   use mkl_dss
   implicit none
   TYPE (MKL_DSS_HANDLE) :: handle
   INTEGER opt,dss_err
   INTEGER, PARAMETER :: NEQ=2, NNZM=NEQ*NEQ
   INTEGER :: rowIDX(NEQ+1) = [1,3,5]
   INTEGER :: COL(NNZM) = [1,2, 1,2]
   INTEGER :: i,j,k, n = NEQ, nnz = NNZM, perm(NEQ)
   DOUBLE PRECISION :: A(NNZM) = [1d0, -1d-2, -1d-2, 1d0]
   DOUBLE PRECISION :: B(NEQ) = [1d0, 2d0], X(NEQ)

   opt=MKL_DSS_DEFAULTS
   dss_err = dss_create(handle, opt)
   write(*,10)'Create ',dss_err
   dss_err = dss_define_structure(handle,opt,rowIDX,n,n,COL,nnz)
   write(*,10)'Define ',dss_err
   dss_err = dss_reorder(handle,opt,perm)
   write(*,10)'ReOrder',dss_err
   dss_err = dss_factor_real(handle,opt,A)
   write(*,10)'Factor ',dss_err
   dss_err = dss_solve_real(handle,opt,B,1,X)
   write(*,10)'Solve ',dss_err
10 format(A7,2x,I4)
end program ptmkl
I compile this program with IFort 15.0 IA-32 using the command
ifort /Qmkl /traceback /MD dssbug.f90
When I then run the program repeatedly, it works correctly very often but, once in a while, aborts with a C0000005 or C0000374 error. To track the problem down, I ran the program inside Inspector XE 2015, and the screenshot is attached.
This is a shorter reproducer for the problems reported by another user, see https://software.intel.com/en-us/forums/topic/535430 .
Additional note : I compiled the file mkl_dss.f90 (which is installed in the MKL Include directory) all by itself in order to produce the file mkl_dss.mod, which is used in the reproducer.
The problem is encountered with other versions of MKL and IFort, as well. In fact, I have a modified version that I can build with CVF6.6 and the bug is present in the old CXML library.
Update, Nov. 15: I updated my installation to Fortran Composer 15.0 update 1 last night, which updated the MKL version to 11.2.1 Product Build 20141023. The bug is present in this version, too, and here are more details (from a 32-bit run) to help you with a diagnosis and fix.
The access violation, when it occurs, is always at the same location: in mkl_intel_thread.dll, routine mkl_pds_invs_perm_mod_pardiso() + 0EC2H. The instruction is mov ecx, dword ptr [ecx+edx*4-4], where ecx is set equal to the base of the permutation index array, which is the 9th argument present at the function entry. The memory to which ECX points contains just two entries, with values 1 and 2 (the test problem has n_eq = 2), followed by lots of '0BADFOOD'. All this is fine. However, EDX contains the index (1-based?) into the permutation index array, and when the crash happens it contains different values in different runs, but I have only seen values larger than 0600H. Such large values indicate an array bound error that is responsible for the access violation (remember, the iperm array has only two elements in the test problem, so at the exception point EDX should never have a value greater than 2).
Some of the statements in the last paragraph are speculative, since I have made the statements on the basis of inspecting the disassembly listing in the debugger. I apologize for any incorrect speculation.
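The addressing arithmetic in the disassembly analysis above can be checked with a short sketch. It assumes 4-byte integers and treats EDX as a 1-based index, as the instruction's "-4" displacement suggests; the value 1536 (0600H) is only an illustrative stand-in for the out-of-range values seen in the debugger, and the program name and variables are hypothetical.

```fortran
program perm_offset
  implicit none
  ! Effective address of "mov ecx, dword ptr [ecx+edx*4-4]" relative to the
  ! array base is 4*edx - 4 = 4*(edx - 1): a 1-based index into 4-byte integers.
  integer, parameter :: elem_bytes = 4          ! assumed size of one array entry
  integer, parameter :: n_perm = 2              ! entries in the test problem's array
  integer, parameter :: idx(3) = [1, 2, 1536]   ! 1536 = 0600H, illustrative only
  integer :: k, offset

  do k = 1, size(idx)
     offset = elem_bytes*idx(k) - elem_bytes
     write(*,'(A,I6,A,I6,A,L2)') 'index ', idx(k), '  byte offset ', offset, &
          '  in bounds: ', idx(k) >= 1 .and. idx(k) <= n_perm
  end do
end program perm_offset
```

Under these assumptions, indices 1 and 2 map to byte offsets 0 and 4, inside the 8-byte array, while any index of 0600H or more reads several kilobytes past its end.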
This bug is still present in MKL 11.3. It is no longer elusive.
program dssbug
   use mkl_dss
   implicit none
   TYPE (MKL_DSS_HANDLE) :: handle
   CHARACTER(LEN=198) :: vers
   INTEGER opt,dss_err
   INTEGER, PARAMETER :: NEQ=2, NNZM=NEQ*NEQ
   INTEGER :: rowIDX(NEQ+1) = [1,3,5]
   INTEGER :: COL(NNZM) = [1,2, 1,2]
   INTEGER :: i,j,k, n = NEQ, nnz = NNZM, perm(NEQ)
   DOUBLE PRECISION :: A(NNZM) = [1d0, -1d-2, -1d-2, 1d0]
   DOUBLE PRECISION :: B(NEQ) = [1d0, 2d0], X(NEQ)

   call mkl_get_version_string(vers)
   write(*,*)trim(vers)
   opt=MKL_DSS_DEFAULTS
   dss_err = dss_create(handle, opt)
   write(*,10)'Create ',dss_err
   dss_err = dss_define_structure(handle,opt,rowIDX,n,n,COL,nnz)
   write(*,10)'Define ',dss_err
   dss_err = dss_reorder(handle,opt,perm)
   write(*,10)'ReOrder',dss_err
   dss_err = dss_factor_real(handle,opt,A)
   write(*,10)'Factor ',dss_err
   dss_err = dss_solve_real(handle,opt,B,1,X)
   write(*,10)'Solve ',dss_err
10 format(A7,2x,I4)
end program dssbug
In 32-bit mode, the traceback fails to report the line number, and the access violation always occurs:
Intel(R) Math Kernel Library Version 11.3.0 Product Build 20150730 for 32-bit applications
Create      0
Define      0
ReOrder     0
forrtl: severe (157): Program Exception - access violation
Image              PC        Routine            Line        Source
mkl_intel_thread.  611699C8  Unknown               Unknown  Unknown
libiomp5md.dll     5C6927E5  Unknown               Unknown  Unknown
libiomp5md.dll     5C65FAEC  Unknown               Unknown  Unknown
libiomp5md.dll     5C6313B8  Unknown               Unknown  Unknown
mkl_intel_thread.  611666C2  Unknown               Unknown  Unknown
mkl_core.dll       5D019C8C  Unknown               Unknown  Unknown
mkl_core.dll       5CF2E09C  Unknown               Unknown  Unknown
mkl_core.dll       5CF16F9A  Unknown               Unknown  Unknown
mkl_core.dll       5CECC509  Unknown               Unknown  Unknown
mkl_core.dll       5CEA81EE  Unknown               Unknown  Unknown
mkl_core.dll       5CE676E8  Unknown               Unknown  Unknown
mkl_core.dll       5CE4C428  Unknown               Unknown  Unknown
mkl_core.dll       5CE4C01C  Unknown               Unknown  Unknown
ntdll.dll          777526BB  Unknown               Unknown  Unknown
In 64-bit mode, the line number is reported, but once in a while no traceback is printed and a Windows Error Reporting (WER) action is triggered instead.
Intel(R) Math Kernel Library Version 11.3.0 Product Build 20150730 for Intel(R) 64 architecture applications
Create      0
Define      0
forrtl: severe (157): Program Exception - access violation
Image              PC                Routine            Line        Source
mkl_intel_thread.  00007FFE9B3CBD97  Unknown               Unknown  Unknown
mkl_core.dll       00007FFE99FB3FF7  Unknown               Unknown  Unknown
mkl_core.dll       00007FFE99F94CCC  Unknown               Unknown  Unknown
mkl_core.dll       00007FFE99F1C6CD  Unknown               Unknown  Unknown
mkl_core.dll       00007FFE99EED525  Unknown               Unknown  Unknown
dssbug.exe         00007FF74CC811F9  MAIN__                     21  dssbug.f90
dssbug.exe         00007FF74CC829CE  Unknown               Unknown  Unknown
dssbug.exe         00007FF74CC82D83  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFEC0322D92  Unknown               Unknown  Unknown
ntdll.dll          00007FFEC1FB9F64  Unknown               Unknown  Unknown
In both cases, I compiled with /traceback /MD /Qmkl. If the access violation is caused by an error in the arguments passed or by a wrong sequence of calls, one should like to know the specific error so that it can be avoided. In fact, I suspect that passing opt=MKL_DSS_DEFAULTS in all the DSS calls is probably not correct, and the documentation should make it clear if that is the case. If there is no linear equations case for which MKL_DSS_DEFAULTS is consistently correct, perhaps "DEFAULTS" is not a good choice as a label.
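One way to test the suspicion about MKL_DSS_DEFAULTS would be to keep the full matrix but declare the structure explicitly as non-symmetric. The sketch below is a variant of the reproducer under that assumption; it has not been run against the affected MKL builds. The program name dss_nonsym and the helper check are hypothetical; MKL_DSS_NON_SYMMETRIC, MKL_DSS_SUCCESS and dss_delete come from mkl_dss.f90.

```fortran
program dss_nonsym
  use mkl_dss
  implicit none
  type (MKL_DSS_HANDLE) :: handle
  integer, parameter :: NEQ = 2, NNZM = NEQ*NEQ
  ! Full 2x2 matrix in 1-based CSR form, as in the reproducer.
  integer :: rowIDX(NEQ+1) = [1, 3, 5]
  integer :: COL(NNZM)     = [1, 2, 1, 2]
  integer :: n = NEQ, nnz = NNZM, perm(NEQ) = 0
  double precision :: A(NNZM) = [1d0, -1d-2, -1d-2, 1d0]
  double precision :: B(NEQ)  = [1d0, 2d0], X(NEQ)
  integer :: opt, dss_err

  opt = MKL_DSS_DEFAULTS
  dss_err = dss_create(handle, opt)
  call check('Create')
  ! Only this call differs from the reproducer: the structure option states
  ! that both triangles of the matrix are stored.
  dss_err = dss_define_structure(handle, MKL_DSS_NON_SYMMETRIC, rowIDX, n, n, COL, nnz)
  call check('Define')
  dss_err = dss_reorder(handle, opt, perm)
  call check('ReOrder')
  dss_err = dss_factor_real(handle, opt, A)
  call check('Factor')
  dss_err = dss_solve_real(handle, opt, B, 1, X)
  call check('Solve')
  write(*,*) 'X = ', X
  dss_err = dss_delete(handle, opt)

contains

  ! Hypothetical helper: print the status of a step and stop on failure.
  subroutine check(step)
    character(*), intent(in) :: step
    write(*,'(A,2x,I4)') step, dss_err
    if (dss_err /= MKL_DSS_SUCCESS) stop 1
  end subroutine check

end program dss_nonsym
```

If this variant runs cleanly where the original crashes, that would point to the structure option rather than the call sequence.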
Thanks, mecej4. We missed this problem, and I see a similar issue on my side too. Escalated. --Gennady
Thanks for looking at this bug report (first made eleven months ago). I just tested the program of #3 on Linux with IFort 16.0, and I do not see the error in either 32-bit or 64-bit mode. The final solution in X(:) has the correct values, as well.
Any updates on this issue? I'm having a similar issue with Pardiso (with either the DSS interface or the standard Pardiso interface) where I will get an access violation error when executing numerical factorization (calling dss_factor_real in the dss interface). What is odd is that the exact same code will run just fine on another machine and yield the expected result. A hint to where the problem comes from or a workaround would be very nice.
The bug is still present in Parallel Studio 16 Update 1 with MKL 11.3.1 (64-bit, Windows and Linux).
Yes, the fix will be available in the upcoming Update 2 of MKL 11.3.
Hi all,
The default DSS parameters imply a symmetric matrix, for which only the upper triangle needs to be supplied. After changing the full matrix pattern in the presented reproducer to the upper triangle only, it ran correctly.
Thanks,
Alex
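The change described in this reply can be sketched as follows: the reproducer's matrix is symmetric, so only its upper triangle (three nonzeros instead of four) is passed while MKL_DSS_DEFAULTS is kept in every call. This is an illustration of the suggestion, not the exact code that was tested; the program name dss_upper, the parameter NNZU and the helper check are hypothetical.

```fortran
program dss_upper
  use mkl_dss
  implicit none
  type (MKL_DSS_HANDLE) :: handle
  integer, parameter :: NEQ = 2, NNZU = 3
  ! Upper triangle of [[1, -0.01], [-0.01, 1]] in 1-based CSR form:
  ! row 1 holds columns 1 and 2, row 2 holds column 2 only.
  integer :: rowIDX(NEQ+1) = [1, 3, 4]
  integer :: COL(NNZU)     = [1, 2, 2]
  integer :: n = NEQ, nnz = NNZU, perm(NEQ) = 0
  double precision :: A(NNZU) = [1d0, -1d-2, 1d0]
  double precision :: B(NEQ)  = [1d0, 2d0], X(NEQ)
  integer :: opt, dss_err

  opt = MKL_DSS_DEFAULTS
  dss_err = dss_create(handle, opt)
  call check('Create')
  dss_err = dss_define_structure(handle, opt, rowIDX, n, n, COL, nnz)
  call check('Define')
  dss_err = dss_reorder(handle, opt, perm)
  call check('ReOrder')
  dss_err = dss_factor_real(handle, opt, A)
  call check('Factor')
  dss_err = dss_solve_real(handle, opt, B, 1, X)
  call check('Solve')
  write(*,*) 'X = ', X
  dss_err = dss_delete(handle, opt)

contains

  ! Hypothetical helper: print the status of a step and stop on failure.
  subroutine check(step)
    character(*), intent(in) :: step
    write(*,'(A,2x,I4)') step, dss_err
    if (dss_err /= MKL_DSS_SUCCESS) stop 1
  end subroutine check

end program dss_upper
```

Compared with the reproducer, only rowIDX, COL, A and the nonzero count change; the sequence of DSS calls is the same, with a final dss_delete added to release the handle.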