Solved: issue using FEAST with high-dimensional manifolds and OMP

Gagan · ‎10-08-2013

whats good guys.

problem: i am giving an N x N sparse matrix as input to FEAST and I am trying to solve for d-dimensional manifolds.

Note that this dimensionality, d, is independent of N.

Now when d isn't too large; say <3000, FEAST works completely fine.

However, if I "up" the value of d to say, 9000, i get errors such as:

Intel MKL Extended Eigensolvers: Size subspace 9001

#Loop | #Eig | Trace | Error-Trace | Max-Residual

OMP: Error #34: System unable to allocate necessary resources for OMP thread:

OMP: System error #35: Resource temporarily unavailable

OMP: Hint: Try decreasing the value of OMP_NUM_THREADS.

Abort trap: 6

I have tried setting export OMP_STACKSIZE=1024m, and I have also tried using the -stack_size argument in clang++ to specify a stacksize that's pretty large. Neither of these solutions worked. I have also looked at how much memory is being consumed when I run FEAST as above with either setting, and the limit was roughly the same. It suggests to me that I'm not setting OMP_STACKSIZE right, or maybe OSX does this differently?

Using clang++ with c++11 threading, and also the MKL for all the math. any assistance on this situation would be great.

Ying_H_Intel · ‎10-08-2013

Hi Gagan,

Could it be because there is too much nested threading in your application? Could you please try export OMP_NUM_THREADS=1 [/2/4/*] and export KMP_AFFINITY=verbose and see how many OpenMP threads were created?

Best Regards,
Ying

Gagan · ‎10-08-2013

hi Ying,

i've tried both of your suggestions, however neither solved the problem.

when I set the KMP_AFFINITY=verbose, it didn't seem to print additional output.

i'll look in the mkl libraries to see if there is a way to set this flag internally, but setting OMP_NUM_THREADS did not solve the issue.

hope this is helpful

Ying_H_Intel · ‎10-08-2013

Hi Gagan,

Do you have some details about your hardware and OS, including how you link MKL?

Or a test case that reproduces the problem may be helpful.

Best Regards,

Ying

Gagan · ‎10-09-2013

hey man,

sure i'll dump the matrix and row/column indices to three separate files and write a testcase.

hopefully i'll get around to it later today and have it done by the night.

thanks again for your prompt assistance,

gagan

Gagan · ‎10-09-2013

thought there might be an issue with attachment size on a forum post, but _of course_ intel's forum is equipped for larger attachments :P

haha, awesome... BUT I DIGRESS.....

attached is a testcase. i compiled it on a Mac Pro using clang (cuz icpc isn't ready for 10.9 yet) with:

clang++ -g -stdlib=libc++ -std=c++11 -O3 -fimf-arch-consistency=true -vec-guard-write -no-ftz -opt-mem-layout-trans=3 -ansi-alias -fPIC -funroll-all-loops -ipo -mtune=native -o testCase test.c -L/usr/lib -L/usr/local/lib -L/opt/intel/composerxe/mkl/lib -lmkl_intel_lp64 -lmkl_core -lpthread -lznz -lm -liomp5 -lmkl_intel_thread -lz

to execute:

success: ./testCase 900 (works up to 2000ish).

failure: ./testCase 9000

this issue arises for any data of the form given above; i am hopeful this will help you guys.

I am using FEAST to provide a quantum mechanical solution to a deterministic theory involving geometry; in *theory* a large d should improve the quality of the manifolds returned by FEAST.

NOTE: i am not expecting software like this to be perfect by _any_ means, it is very new. i am just explaining what is going on and why i am trying to abuse your fantastic solver for more dimensions.

o, and...

much love for the premium membership upgrade thing (i love u intel, i want custom hw soon so be ready!!), i hope this data/testcase is fruitful for future releases :)

Ying_H_Intel · ‎10-10-2013

Hi Gagan,

Thanks. I can run the code, but it seems to hang when I enter 9000. We will investigate and keep you updated.

Best Regards,
Ying

Ying_H_Intel · ‎10-11-2013

Hi Gagan,

We found that the issue may be the same as the one in http://software.intel.com/en-us/forums/topic/472477, but for Mac OS. Could you please let us know whether it is urgent for you to get the fix, or whether it is OK to wait for the next release (in 2-3 months)?

Best Regards,

Ying

Gagan · ‎10-11-2013

hi,

the higher dimensions would be very helpful. it seems this sort of issue is on dr polizzi's end then?

while it's not *urgent* by my definition, could you please see if he has planned a fix any time soon? 2-3 months is a very long time and i was hoping more that it wouldn't take more than a month or so?

i understand that you guys usually roll out big updates, so if you could provide a hotfix/workaround whenever you solve this issue, that would be good. i will just implement those fixes into the mkl code manually to tide me over until an official release is given.

Gagan · ‎10-11-2013

btw i'm running the dense version now and it's at 47.5gb and hasn't given the OMP issue.

when you referred to hanging running the test-case above, what did you mean? did you mean you got the OMP issue and it crashed? The reason I ask is because this solver should take a while for d=9000. maybe if you weren't getting the OMP_issue in sparse mode I was doing something wrong.

i will let you know if the dense version completes or errors out; right now it seems like it is actually working (it should take a while to find 9000 dimensions heh). will keep you posted.

Ying_H_Intel · ‎10-11-2013

Hi Gagan,

Thanks for your explanation. Right, after several hours I got the same OMP error as you reported, so there is no problem in your code. There is a bug in the function. I have sent you a private message with the fix.

Best Regards,
Ying

Gagan · ‎10-13-2013

hey ying,

GagansMacPro-2:tester Gagan$ ./testCase 9000

Intel MKL Extended Eigensolvers: double precision driver

Intel MKL Extended Eigensolvers: List of input parameters fpm(1:64)-- if different from default

Intel MKL Extended Eigensolvers: fpm(1)=1

Intel MKL Extended Eigensolvers: fpm(2)=12

Intel MKL Extended Eigensolvers: fpm(4)=100

Intel MKL Extended Eigensolvers: fpm(5)=1

Intel MKL Extended Eigensolvers: fpm(6)=1

Search interval [0.000000000000000e+00;1.000000000000000e+35]

Intel MKL Extended Eigensolvers: Size subspace 9001

#Loop | #Eig | Trace | Error-Trace | Max-Residual

0,9000,4.500986238458544e+05,1.000000000000000e+00,2.159098422692203e-34

WOOP WOOP

thanks homie!!

Ying_H_Intel · ‎10-13-2013

It is great to know it works!

Thanks

Ying

Gennady_F_Intel · ‎10-29-2013

Please check the official fix for this problem in the latest update (MKL v.11.1 Update 1), released last Friday, and let us know the results.

Gagan · ‎10-31-2013

what's good.

Writing sparse matrix to files...III. Computing the d-dimensional manifolds (Eigensolver)...

Intel MKL Extended Eigensolvers: double precision driver

Intel MKL Extended Eigensolvers: List of input parameters fpm(1:64)-- if different from default

Intel MKL Extended Eigensolvers: fpm(1)=1

Intel MKL Extended Eigensolvers: fpm(2)=12

Intel MKL Extended Eigensolvers: fpm(4)=100

Intel MKL Extended Eigensolvers: fpm(5)=1

Intel MKL Extended Eigensolvers: fpm(6)=1

Search interval [0.000000000000000e+00;1.000000000000000e+15]

Intel MKL Extended Eigensolvers: Size subspace 10001

#Loop | #Eig | Trace | Error-Trace | Max-Residual

0,10001,2.460420846718340e+06,1.000000000000000e+00,1.026947599855742e-13

rest assured, clang++ knows how icpc's ass tastes: https://twitter.com/i3roly/status/395951193559547904/photo/1 ;)

Gagan · ‎11-04-2013

yo guys,

running into another issue involving large callocs.

i've attached the updated matrices that replace the ones i attached above.

if you run this program as ./testCase 34830 the program segfaults at the allocation of the variable "output".

my question is, why? both calloc and malloc don't work. do i need a larger stacksize? this is for sure an issue with allocating a very *large* chunk of memory at once, but it is *imperative* that this occurs.

for calloc i get a segmentation fault: 11, for malloc i get:

reducefMRI(39024,0x7fff79900310) malloc: *** mach_vm_map(size=18446744056558313472) failed (error code=3)

*** error: can't allocate region

*** set a breakpoint in malloc_error_break to debug

any suggestions? i tried changing the compile from lp64 to ilp64 but it doesn't seem to make any difference.

to be precise, the array i'm trying to allocate has 76123*34830 elements so i don't think it's the ilp64/lp64.

thx

Ying_H_Intel · ‎11-06-2013

Hi Gagan,

Our developer helped investigate your question; please see his reply:

At first sight, test.c produces an overflow on line 45 with the lp64 interface:

output = (double*) calloc(nrRows*(d+1),sizeof(double));

In your example nrRows*(d+1) == 2651364090 > 2147483647 (2^31-1) == maximum int value for lp64. The output array takes approximately 21 GB of RAM.

To avoid the overflow I used a size variable:

long int size=0;

…

size = d+1;

size = size * nrRows;

output = (double*) calloc(size,sizeof(double));

But I got a -1 error in the EE solver with the lp64 interface.

So I recommend using the ilp64 interface and replacing all int -> MKL_INT in test.c (or maybe using the compiler option -i8).

But I don't have enough memory for the ilp64 interface on a machine with 33+ GB of RAM.

Could you tell us how much RAM is available on your machine?

Best Regards,

Ying
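
The overflow described in the reply above can be sketched in isolation. This is only an illustration, assuming a 32-bit int as with the lp64 interface; the helper names count_exceeds_int and alloc_output are hypothetical and do not come from test.c. The idea is to widen the operands before multiplying and to check the byte count before calling calloc.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if rows * cols does not fit in an int. */
static int count_exceeds_int(int rows, int cols)
{
    /* Widen before multiplying so the product itself cannot overflow. */
    int64_t n = (int64_t)rows * (int64_t)cols;
    return n > INT_MAX;
}

/* Hypothetical helper: allocates a zeroed rows x cols array of doubles.
 * Returns NULL if the byte count would overflow size_t or if the
 * allocation fails. The caller releases the result with free(). */
static double *alloc_output(size_t rows, size_t cols)
{
    if (cols != 0 && rows > SIZE_MAX / sizeof(double) / cols)
        return NULL;
    return calloc(rows * cols, sizeof(double));
}
```

With the sizes from this thread, count_exceeds_int(76123, 34831) reports that the element count no longer fits in an int, which is why the multiplication has to happen in a wider type before it reaches calloc.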

Gagan · ‎11-07-2013

hey man,

in the midst of trying these suggestions out and i realized that dlange is bugging out when I use MKL_INT.

furthermore, LAPACKE_dlange doesn't exist in the MKL library for some reason. i would really like to use this function as it's crucial, and i realized now that what is stopping me from using ilp64 is this function not working. can we get this fixed??

thx

Gagan · ‎11-07-2013

update, i tried both dlansy and dlange and they do not work (calling the fortran interface via ilp64), they will cause segfaults.

additionally, as stated above, LAPACKE_dlange is missing from the cblas interface (as is dlansy, and i suspect a few others as well). if you guys could patch and update these functions and toss over the patched dist, that'd be great.

thx

Gagan · ‎11-07-2013

hi, another update.

there is an issue with csrcoo in ilp64 mode. i have attached the test.c that you can use with the matrix files above to reproduce this error. note this only happens in ilp64 mode. i haven't encountered this error otherwise and have spent the day trying to fix it. here is the output just to give you an idea of what the problem is:

Intel MKL ERROR: Parameter 1 was incorrect on entry to MKL_DCSRCOO.

Intel MKL ERROR: Parameter 1 was incorrect on entry to MKL_DCSRCOO.

note that in this testcase i'm trying to convert the CSR to coordinate, whereas in my actual program i'm converting coordinate to csr. thus the issue is independent of whether i'm going from CSR->COO or COO->CSR. i am hoping this is just a minor bug high up in the function.

Gennady_F_Intel · ‎11-08-2013

These symptoms indicate that you compiled this example against the ILP64 libraries without the /DMKL_ILP64 option.
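
A rough sketch of why a missing MKL_ILP64 define can produce a "Parameter 1 was incorrect" error: without the define, MKL_INT in the caller's code is a 32-bit int, while an ILP64 library reads 64-bit integers. The code below does not call MKL; first_element_as_64bit and example_job are hypothetical names, and the array contents are made-up values used only to show the width mismatch. How MKL validates its arguments internally is an assumption here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads the first 8 bytes of an array of 32-bit
 * integers as one 64-bit integer, mimicking a callee that expects
 * 64-bit elements where the caller stored 32-bit ones. */
static int64_t first_element_as_64bit(const int32_t *values)
{
    int64_t wide;
    memcpy(&wide, values, sizeof wide); /* memcpy avoids aliasing problems */
    return wide;
}

/* Made-up control array: first element 0, second element 1. */
static const int32_t example_job[6] = {0, 1, 1, 0, 0, 0};
```

The caller meant the first element to be 0, but first_element_as_64bit(example_job) fuses the first two 32-bit values into one number; on a little-endian machine that is 4294967296. A library that checks its first parameter would reasonably reject such a value.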