Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.
7222 Discussions

issue using FEAST with high-dimensional manifolds and OMP

Gagan
Beginner
8,000 Views

whats good guys.

problem: i am giving an N x N sparse matrix as input to FEAST and I am trying to solve for d-dimensional manifolds.

Note that this dimensionality, d, is independent of N.

Now when d isn't too large; say <3000, FEAST works completely fine.

However, if I "up" the value of d to say, 9000, i get errors such as:

Intel MKL Extended Eigensolvers: Size subspace 9001

#Loop | #Eig  |    Trace     | Error-Trace |  Max-Residual

OMP: Error #34: System unable to allocate necessary resources for OMP thread:

OMP: System error #35: Resource temporarily unavailable

OMP: Hint: Try decreasing the value of OMP_NUM_THREADS.

Abort trap: 6

I have tried setting export OMP_STACKSIZE=1024m, and I have also tried using the -stack_size argument in clang++ to specify a pretty large stack size. Neither of these worked. I have also looked at how much memory is consumed when I run FEAST as above with either setting, and the limit was roughly the same. That suggests to me that I'm not setting OMP_STACKSIZE right, or maybe OS X does this differently?

I'm using clang++ with C++11 threading, and MKL for all the math. Any assistance with this would be great.

24 Replies
Ying_H_Intel
Moderator
6,547 Views

Hi Gagan,

Could it be because of too much nested threading in your application? Could you please try export OMP_NUM_THREADS=1 (or 2, 4, etc.) and export KMP_AFFINITY=verbose, and see how many OpenMP threads were created?

Best Regards,
Ying

0 Kudos
Gagan
Beginner
6,547 Views

hi Ying,

i've tried both of your suggestions, however neither solved the problem.

when I set the KMP_AFFINITY=verbose, it didn't seem to print additional output. 

i'll look in the mkl libraries to see if there is a way to set this flag internally, but setting OMP_NUM_THREADS did not solve the issue.

hope this is helpful

0 Kudos
Ying_H_Intel
Moderator
6,547 Views

Hi Gagan,

Could you share some details about your hardware and OS, including how you link MKL?

A test case that reproduces the problem would also be helpful.

Best Regards,

Ying

 

0 Kudos
Gagan
Beginner
6,547 Views

hey man,

sure i'll dump the matrix and row/column indices to three separate files and write a testcase.

hopefully i'll get around to it later today and have it done by the night.

thanks again for your prompt assistance,

gagan

0 Kudos
Gagan
Beginner
6,547 Views

thought there might be an issue with attachment size on a forum post, but _of course_ intel's forum is equipped for larger attachments :P

haha, awesome... BUT I DIGRESS.....

attached is a testcase. i compiled it on a Mac Pro using clang (cuz icpc isn't ready for 10.9 yet) with:

clang++ -g -stdlib=libc++ -std=c++11 -O3 -fimf-arch-consistency=true -vec-guard-write -no-ftz -opt-mem-layout-trans=3 -ansi-alias -fPIC -funroll-all-loops -ipo -mtune=native -o testCase test.c -L/usr/lib -L/usr/local/lib -L/opt/intel/composerxe/mkl/lib -lmkl_intel_lp64 -lmkl_core -lpthread -lznz -lm -liomp5 -lmkl_intel_thread -lz

to execute:

success: ./testCase 900 (works up to 2000ish).

failure: ./testCase 9000 

this issue arises for any data of the form given above; i am hopeful this will help you guys.

I am using FEAST to provide a quantum mechanical solution to a deterministic theory involving geometry; in *theory* a large d should improve the quality of the manifolds returned by FEAST.

NOTE: i am not expecting software like this to be perfect by _any_ means, it is very new. i am just explaining what is going on and why i am trying to abuse your fantastic solver for more dimensions. 

o, and...

much love for the premium membership upgrade thing (i love u intel, i want custom hw soon so be ready!!), i hope this data/testcase is fruitful for future releases :) 

0 Kudos
Ying_H_Intel
Moderator
6,547 Views

Hi Gagan,

Thanks. I can run the code, but it seems to hang when I enter 9000. We will investigate and keep you updated.

Best Regards,
Ying

0 Kudos
Ying_H_Intel
Moderator
6,547 Views

Hi Gagan,

We found that the issue may be the same as the one in http://software.intel.com/en-us/forums/topic/472477, but on Mac OS. Could you please let us know whether you need the fix urgently, or whether it is OK to wait for the next release (in 2-3 months)?

Best Regards,

Ying

 

0 Kudos
Gagan
Beginner
6,547 Views

hi,

the higher dimensions would be very helpful. it seems this sort of issue is on dr polizzi's end then?

while it's not *urgent* by my definition, could you please see if he has planned a fix any time soon? 2-3 months is a very long time and i was hoping it wouldn't take more than a month or so.

i understand that you guys usually roll out big updates, so if you could provide a hotfix/workaround whenever you solve this issue, that would be good. i will just implement those fixes into the mkl code manually to tide me over until an official release is given.

0 Kudos
Gagan
Beginner
6,547 Views

btw i'm running the dense version now and it's at 47.5gb and hasn't given the OMP issue.

when you said the test-case above was hanging, what did you mean? did you mean you got the OMP issue and it crashed? The reason I ask is because this solver should take a while for d=9000. maybe if you weren't getting the OMP issue in sparse mode I was doing something wrong.

i will let you know if the dense version completes or errors out; right now it seems like it is actually working (it should take a while to find 9000 dimensions heh). will keep you posted.

0 Kudos
Ying_H_Intel
Moderator
7,982 Views

Hi Gagan,

Thanks for your explanation. You are right: after several hours I got the same OMP error you reported, so there is no problem with your code. There is a bug in the function. I have sent you a private message with the fix.

Best Regards,
Ying

0 Kudos
Gagan
Beginner
6,547 Views

hey ying,

GagansMacPro-2:tester Gagan$ ./testCase 9000

Intel MKL Extended Eigensolvers: double precision driver

Intel MKL Extended Eigensolvers: List of input parameters fpm(1:64)-- if different from default

Intel MKL Extended Eigensolvers: fpm(1)=1

Intel MKL Extended Eigensolvers: fpm(2)=12

Intel MKL Extended Eigensolvers: fpm(4)=100

Intel MKL Extended Eigensolvers: fpm(5)=1

Intel MKL Extended Eigensolvers: fpm(6)=1

Search interval [0.000000000000000e+00;1.000000000000000e+35]

Intel MKL Extended Eigensolvers: Size subspace 9001

#Loop | #Eig  |    Trace     | Error-Trace |  Max-Residual

0,9000,4.500986238458544e+05,1.000000000000000e+00,2.159098422692203e-34

WOOP WOOP

thanks homie!!

0 Kudos
Ying_H_Intel
Moderator
6,547 Views

It is great to know it works!

Thanks

Ying

0 Kudos
Gennady_F_Intel
Moderator
6,547 Views

Please check the official fix for this problem in the latest update (MKL v.11.1 Update 1), released last Friday, and let us know the results.

0 Kudos
Gagan
Beginner
6,547 Views

what's good.

Writing sparse matrix to files...III. Computing the d-dimensional manifolds (Eigensolver)...

Intel MKL Extended Eigensolvers: double precision driver

Intel MKL Extended Eigensolvers: List of input parameters fpm(1:64)-- if different from default

Intel MKL Extended Eigensolvers: fpm(1)=1

Intel MKL Extended Eigensolvers: fpm(2)=12

Intel MKL Extended Eigensolvers: fpm(4)=100

Intel MKL Extended Eigensolvers: fpm(5)=1

Intel MKL Extended Eigensolvers: fpm(6)=1

Search interval [0.000000000000000e+00;1.000000000000000e+15]

Intel MKL Extended Eigensolvers: Size subspace 10001

#Loop | #Eig  |    Trace     | Error-Trace |  Max-Residual

0,10001,2.460420846718340e+06,1.000000000000000e+00,1.026947599855742e-13

rest assured, clang++ knows how icpc's ass tastes: https://twitter.com/i3roly/status/395951193559547904/photo/1 ;)

0 Kudos
Gagan
Beginner
6,547 Views

yo guys,

running into another issue involving large callocs.

i've attached the updated matrices that replace the ones i attached above.

if you run this program as ./testCase 34830 the program segfaults at the allocation of the variable "output".

my question is, why? both calloc and malloc don't work. do i need a larger stacksize? this is for sure an issue with allocating a very *large* chunk of memory at once, but it is *imperative* that this occurs.

for calloc i get a segmentation fault: 11, for malloc i get:

reducefMRI(39024,0x7fff79900310) malloc: *** mach_vm_map(size=18446744056558313472) failed (error code=3)

*** error: can't allocate region

*** set a breakpoint in malloc_error_break to debug

any suggestions? i tried changing the compile from lp64 to ilp64 but it doesn't seem to make any difference. 

to be precise, the array i'm trying to allocate has 76123*34830 elements so i don't think it's the ilp64/lp64.

thx

0 Kudos
Ying_H_Intel
Moderator
6,547 Views

Hi Gagan,

One of our developers helped investigate your question; please see his reply:

At first sight, test.c produces an overflow on line 45 with the lp64 interface:

“output = (double*) calloc(nrRows*(d+1),sizeof(double));”

In your example nrRows*(d+1) == 2651364090 > 2147483647 (2^31-1), the maximum value of int with lp64. The output array takes approximately 21 GB of RAM.

To avoid the overflow I used a size variable:

“long int size=0;

size = d+1;

size= size * nrRows;

output = (double*) calloc(size,sizeof(double));”

But I then got error -1 from the EE solver with the lp64 interface.

So I recommend using the ilp64 interface and replacing all int with MKL_INT in test.c (or maybe the compiler option -i8).

But I don't have enough memory for the ilp64 interface on a machine with 33+ GB of RAM.

Could you tell us how much RAM is available on your machine?

Best Regards,

Ying

0 Kudos
Gagan
Beginner
6,547 Views

hey man, 

in the midst of trying these suggestions out and i realized that dlange is bugging out when I use MKL_INT.

furthermore, LAPACKE_dlange doesn't exist in the MKL library for some reason. i would really like to use this function as it's crucial, and i realized now that what is stopping me from using ilp64 is this function not working. can we get this fixed??

thx

0 Kudos
Gagan
Beginner
6,547 Views

update, i tried both dlansy and dlange and they do not work (calling the fortran interface via ilp64), they will cause segfaults.

additionally, as stated above, LAPACKE_dlange is missing from the cblas interface (as is dlansy, and i suspect a few others as well). if you guys could patch and update these functions and toss over the patched dist, that'd be great.

thx

0 Kudos
Gagan
Beginner
6,546 Views

hi, another update.

there is an issue with csrcoo in ilp64 mode. i have attached the test.c that you can use with the matrix files above to reproduce this error. note this only happens in ilp64 mode. i haven't encountered this error otherwise and have spent the day trying to fix it. here is the output just to give you an idea of what the problem is:

Intel MKL ERROR: Parameter 1 was incorrect on entry to MKL_DCSRCOO.

Intel MKL ERROR: Parameter 1 was incorrect on entry to MKL_DCSRCOO.

note that in this testcase i'm trying to convert the CSR to coordinate, whereas in my actual program i'm converting coordinate to csr. thus the issue is independent of whether i'm going from CSR->COO or COO->CSR. i am hoping this is just a minor bug high up in the function.

0 Kudos
Gennady_F_Intel
Moderator
6,381 Views

These symptoms indicate that you compiled this example against the ILP64 libraries without the /DMKL_ILP64 option.

0 Kudos