I need to compute A*P*A' with the MKL function mkl_sparse_d_syprd.
In the example below, A is the identity matrix and P is a matrix whose elements are all one. For small arrays with nstate=100 or 1000, the code runs fine. However, with nstate=50000 or larger, the following error appears: "Exception thrown at 0x00007FFDFB353C57 (mkl_avx2.2.dll) in Console22.exe: 0xC0000005: Access violation reading location 0x0000025986EEA108."
Please help me solve this problem. Thank you very much!
! A = identity matrix
! P = matrix whose elements are all one
program test_spblas
use mkl_spblas
implicit none
integer, parameter :: nstate = 50000
double precision,allocatable, dimension (:,:):: P,APAT
integer, allocatable, dimension (:):: c_A,pB_A,pE_A
double precision, allocatable, dimension (:):: v_A
integer stat,nnz_A,i
type(sparse_matrix_t) :: A_s
nnz_A=nstate
allocate(v_A(nnz_A),c_A(nnz_A),pB_A(nstate),pE_A(nstate))
allocate(P(nstate,nstate),APAT(nstate,nstate))
do i=1,nstate
pB_A(i)=i
pE_A(i)=i+1
c_A(i)=i
enddo
v_A=1d0
P=1d0
APAT=0d0
stat = mkl_sparse_d_create_csr(a_s,sparse_index_base_one,nstate,nstate,pb_a,pe_a,c_a,v_a)
stat = mkl_sparse_d_syprd(sparse_operation_non_transpose, a_s, p, sparse_layout_column_major, nstate, &
1d0, 0d0, apat, sparse_layout_column_major, nstate)
end program test_spblas
Hi,
Thanks for reaching out to us.
Could you please try running the code from Intel oneAPI command prompt and see if it is working there?
Please do let us know the MKL version with which you are working.
Regards,
Vidya.
This is my MKL version:
"Intel(R) oneAPI Math Kernel Library Version 2022.1-Product Build 20220311 for Intel(R) 64 architecture applications"
This is what I got from the oneAPI command prompt:
ifort test_spblas.f90 /Qiopenmp /Qopenmp-targets:spir64 /module:"D:\Fortran\oneAPI\mkl\2021.2.0\include\intel64\ilp64" /DMKL_ILP64 /4I8 -I"D:\Fortran\oneAPI\mkl\2021.2.0\include" /MD /fpp
ifort: command line warning #10148: option '/Qiopenmp' not supported
ifort: command line warning #10148: option '/Qopenmp-targets:spir64' not supported
test.f90(30): error #6633: The type of the actual argument differs from the type of the dummy argument. [PB_A]
stat = mkl_sparse_d_create_csr(a_s,sparse_index_base_one,nstate,nstate,pb_a,pe_a,c_a,v_a)
---------------------------------------------------------------------------^
test.f90(30): error #6633: The type of the actual argument differs from the type of the dummy argument. [PE_A]
stat = mkl_sparse_d_create_csr(a_s,sparse_index_base_one,nstate,nstate,pb_a,pe_a,c_a,v_a)
--------------------------------------------------------------------------------^
test.f90(30): error #6633: The type of the actual argument differs from the type of the dummy argument. [C_A]
stat = mkl_sparse_d_create_csr(a_s,sparse_index_base_one,nstate,nstate,pb_a,pe_a,c_a,v_a)
-------------------------------------------------------------------------------------^
This is my last try. I used this command:
ifort /DMKL_DIRECT_CALL /fpp test.f90 mkl_intel_lp64.lib mkl_core.lib mkl_intel_thread.lib /Qopenmp -I"D:\Fortran\oneAPI\mkl\2021.2.0\include"/include
For nstate=10000, it works.
But for nstate=50000, it reports:
forrtl: severe (157): Program Exception - access violation
Image PC Routine Line Source
test.exe 00007FF7361CC128 Unknown Unknown Unknown
test.exe 00007FF736169AE2 Unknown Unknown Unknown
libiomp5md.dll 00007FFAE04B65D3 Unknown Unknown Unknown
libiomp5md.dll 00007FFAE0409877 Unknown Unknown Unknown
libiomp5md.dll 00007FFAE040B54C Unknown Unknown Unknown
libiomp5md.dll 00007FFAE03C4CE1 Unknown Unknown Unknown
test.exe 00007FF736169477 Unknown Unknown Unknown
test.exe 00007FF736168012 Unknown Unknown Unknown
test.exe 00007FF736151818 Unknown Unknown Unknown
test.exe 00007FF7361D21BE Unknown Unknown Unknown
test.exe 00007FF7361D2584 Unknown Unknown Unknown
KERNEL32.DLL 00007FFB50CB7034 Unknown Unknown Unknown
ntdll.dll 00007FFB51C62651 Unknown Unknown Unknown
Hi nvh10,
It looks like there are a few things going on here, but the main one that causes the failure for nstate=50000 and not for nstate=10000 is the use of lp64. With lp64, pointers are 64-bit addresses, but integers are only 32 bits. Additionally, the internal implementation is written in C, so any 2D Fortran arrays are collapsed and treated like 1D C arrays (you can see this in the module file, where the array arguments are declared with DIMENSION(*)). This normally works out, but it may work against us in this case when dealing with integers and offsets...
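To illustrate the flattening mentioned above, here is a minimal sketch in plain Fortran. It does not call oneMKL; the helper `show_flat` and the array `b` are hypothetical names used only for this illustration. It shows how a 2D array received through a `DIMENSION(*)`-style dummy is addressed with a single column-major offset.

```fortran
program flatten_demo
  implicit none
  integer, parameter :: nrow = 3, ncol = 2
  double precision :: b(nrow, ncol)
  integer :: i, j

  ! Encode the (row, column) pair in each value so it is easy to recognize.
  do j = 1, ncol
    do i = 1, nrow
      b(i, j) = 10d0*i + j
    end do
  end do

  call show_flat(b, nrow, ncol)

contains

  ! Hypothetical helper (not part of oneMKL): x refers to the same memory
  ! as b, but is indexed with one 1-based offset, (j-1)*ld + i.
  subroutine show_flat(x, ld, n)
    double precision, intent(in) :: x(*)
    integer, intent(in) :: ld, n
    integer :: i, j
    do j = 1, n
      do i = 1, ld
        print '(a,i0,a,i0,a,i0,a,f5.1)', 'b(', i, ',', j, ') -> x(', &
              (j-1)*ld + i, ') = ', x((j-1)*ld + i)
      end do
    end do
  end subroutine show_flat

end program flatten_demo
```

The single offset `(j-1)*ld + i` is the quantity that grows with the square of the matrix size, which is why its integer width matters for large dense arrays.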
It turns out that the range of int32 is [-2147483648, 2147483647], and the largest int32 N such that N*N is still in this range is N=46340. So for 50000, if we are doing something like C[row * ldc + col] and row, ldc and col are 32-bit integers, then it is possible (likely) that the result overflows and ends up negative, then is upcast (still negative) in the address offset computation, which results in a seg fault. There are things we can do internally to make sure these addresses are computed using 64-bit integers, which are compatible with the 64-bit addresses, and we will do this more carefully in the product, but otherwise, we need to be careful about this.
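The arithmetic can be checked with a short sketch like the one below. It is only an illustration, not the oneMKL internals: the computation is done in 64-bit integers (the program and variable names are made up here) and the result is compared against the largest int32 value, so nothing actually overflows while running it.

```fortran
program int32_offset_limit
  use iso_fortran_env, only: int32, int64
  implicit none
  ! Largest value a 32-bit signed integer can hold: 2147483647
  integer(int64), parameter :: max32 = int(huge(0_int32), int64)
  integer(int64), parameter :: sizes(3) = [46340_int64, 46341_int64, 50000_int64]
  integer(int64) :: n, nsq
  integer :: k

  do k = 1, size(sizes)
    n = sizes(k)
    ! Computed in 64-bit arithmetic, so the product itself is safe here.
    nsq = n * n
    print '(a,i0,a,i0,a,l1)', 'n = ', n, '  n*n = ', nsq, &
          '  fits in int32: ', nsq <= max32
  end do
end program int32_offset_limit
```

For n = 46340 the product is 2147395600 and still fits; for n = 46341 it is 2147488281 and for n = 50000 it is 2500000000, both beyond the int32 range.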
We are still looking into some other aspects of the ilp64 solution where it appears that the ldb and ldc are sometimes incorrect when they get to our internal kernels. Will update on that once more is understood.
Hope this helps a bit so far
Spencer
Ok, here are the rest of the details. It turns out that there was an additional issue in the mkl_spblas.f90 module file for mkl_sparse_x_syprd which prevented the ilp64 version from working properly. We use ISO_C_BINDING to map the Fortran API you are calling to a C function implemented internally. In the case of mkl_sparse_x_syprd, two input arguments, ldb and ldc, were being mapped incorrectly.
If you change
INTEGER(C_INT) , INTENT(IN) :: ldb
and
INTEGER(C_INT) , INTENT(IN) :: ldc
to
INTEGER, INTENT(IN) :: ldb
and
INTEGER, INTENT(IN) :: ldc
then everything will work as desired. The C_INT kind always maps to a 4-byte integer, but ldb and ldc should be a 4- or 8-byte integer depending on whether the compiler option that makes the default Fortran integer 8 bytes is used (-i8 on Linux/Mac, /4I8 on Windows).
You can make these changes to the module file yourself if you need a fix for another project immediately. The issue will also be fixed in the next oneMKL release (likely oneMKL 2023.1). Thank you for sharing this issue so we could fix it.
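If it helps to see the kind mismatch directly, a small check along these lines can be compiled with and without the 8-byte integer option (-i8 or /4I8). This is just a sketch with made-up variable names, not code from the oneMKL module file.

```fortran
program integer_kind_check
  use iso_c_binding, only: c_int
  implicit none
  integer :: default_int          ! width follows the -i8 or /4I8 option
  integer(c_int) :: c_mapped_int  ! width follows the companion C compiler's int

  ! storage_size is an inquiry function, so the variables need no value.
  print '(a,i0)', 'bits in default INTEGER: ', storage_size(default_int)
  print '(a,i0)', 'bits in INTEGER(C_INT):  ', storage_size(c_mapped_int)

  if (storage_size(default_int) /= storage_size(c_mapped_int)) then
    print '(a)', 'Kinds differ: an INTEGER(C_INT) dummy does not match a default INTEGER actual.'
  else
    print '(a)', 'Kinds match in this build.'
  end if
end program integer_kind_check
```

When the two widths differ, an interface that declares ldb and ldc as INTEGER(C_INT) no longer agrees with default-integer arguments, which is the mismatch described above.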
Hi,
Thanks for sharing the details.
The issue is reproducible from our end as well.
We are working on this issue, we will get back to you soon.
Regards,
Vidya.
Thank you very much for helping me!
Thank you again for raising the issue. The fix will be available in the next oneMKL release.