Hi,
I found a problem with MKL_DCSRSYMV. When I ran the following code
[cpp]
PROGRAM TEST
  INTEGER, PARAMETER :: N = 1000000
  DOUBLE PRECISION X(N)
  DO WHILE (.TRUE.)
    CALL MKL_DCSRSYMV('U', N, SPREAD(1.D0, 1, N), (/1 : N + 1/),&
                      (/1 : N/), SPREAD(1.D0, 1, N), X)
  END DO
END PROGRAM
[/cpp]
compiled with "ifort test.f90 -otest -mkl=parallel" (ifort pro 11.1.038 with the included MKL), memory consumption, as seen in 'top', would keep rising until it drained all physical memory and I killed the process. I tried it on a four-socket Opteron 8350 and a dual-socket Xeon 5530. Memory usage blew up on both machines. Any cure for this?
9 Replies
One comment on coding: the way you call the routine forces the compiler to create 3 temporary arrays (it has to make a copy of these arguments before passing them), an obvious performance degradation. How to eliminate them? Don't pass non-contiguous arrays to routines that don't accept arrays by descriptor.
A.
You see, this is just a test program, and for a test program performance is nonessential. The real problem is that within a few hundred iterations the program consumes more than 10 GB of memory, a nightmare when I use the routine in a Krylov subspace solver, because I only have 12 GB of memory on my machine.
I know. That was just a comment.
I guess that you're on Linux. I have no time to test it there, but on my Windows x64 machine it uses at most 90 MB (I waited until 2,500 iterations had passed) and I don't see any leak.
Did you try replacing those SPREAD calls?
A.
Actually I discovered the problem when calling MKL_DCSRSYMV from C. I also tried local and allocatable arrays. The same massive leaks.
Hi Styc,
If you use the sequential library, what is the result?
The command line is like
ifort test.f -o test -lmkl_intel_lp64 -Wl,--start-group -lmkl_sequential -lmkl_core -Wl,--end-group -lpthread
or
ifort test.f -o test -lmkl_intel_lp64 -Wl,--start-group -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -lpthread
Or would you like to upgrade the compiler to the latest version, which uses the latest MKL 10.2.1?
I just tried Intel Compiler 11.1.046 on a 4-core Xeon machine. The test program seems to run fine, with no memory leak.
<http://software.intel.com/en-us/articles/which-version-of-ipp--mkl--tbb-is-installed-with-intel-compiler-professional-edition/>
Best Regards,
Ying
Well, I'm pretty reluctant to do the upgrade now because 1) it's tricky and 2) I don't have the time.
I tried several possible solutions I could think of and found four ways to make the test program work normally:
1) linking with -mkl=sequential
2) OMP_NUM_THREADS=1/2/3/4/5 (see, you need more than five threads to see it break)
3) MKL_DISABLE_FAST_MM=1
4) setting N <= 5242880 / OMP_NUM_THREADS
It seems that 1), 2) and 4) actually address the same problem: limiting the amount of memory MKL_DCSRSYMV requires, i.e. sizeof(double) * N * OMP_NUM_THREADS, to no more than 40 MB. Apparently MKL_DCSRSYMV will repeatedly allocate workspaces larger than 40 MB but won't bother to deallocate them unless explicitly instructed to by something like 3).
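The 40 MB figure above can be checked with a little arithmetic. The following is only a back-of-the-envelope sketch: it assumes, as the post conjectures and not as documented MKL behavior, that the workspace is sizeof(double) * N * OMP_NUM_THREADS bytes and that the threshold is exactly 40 MiB. The helper names are made up for illustration.

```python
# Back-of-the-envelope check of the conjectured workspace rule.
# Assumption (taken from the post, not from MKL documentation):
# MKL_DCSRSYMV needs sizeof(double) * N * OMP_NUM_THREADS bytes of
# workspace, and the leak only appears when that exceeds 40 MiB.

SIZEOF_DOUBLE = 8                  # bytes per double
THRESHOLD_BYTES = 40 * 1024 ** 2   # 40 MiB = 41,943,040 bytes


def workspace_bytes(n, num_threads):
    """Conjectured workspace size in bytes (hypothetical helper)."""
    return SIZEOF_DOUBLE * n * num_threads


def max_safe_n(num_threads):
    """Largest N that keeps the workspace at or under the threshold."""
    return THRESHOLD_BYTES // (SIZEOF_DOUBLE * num_threads)


def is_safe(n, num_threads):
    """True if the conjectured workspace fits under the threshold."""
    return workspace_bytes(n, num_threads) <= THRESHOLD_BYTES


if __name__ == "__main__":
    n = 1_000_000  # problem size used in the test program
    for threads in range(1, 9):
        # Columns: thread count, largest safe N, whether N = 1000000 is safe
        print(threads, max_safe_n(threads), is_safe(n, threads))
```

For a single thread the bound is 41,943,040 / 8 = 5242880, which is where the constant in 4) comes from. With N = 1000000 the check passes for up to five threads and fails from six threads on, consistent with the thread counts listed in 2).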
Hi Styc,
Good news, I'm able to reproduce the problem with MKL 10.2 and MKL 10.2.1. The problem happens only when the size of the allocated arrays is huge (if the problem size is small, for example N=1000, there is no such problem, right?).
The root cause is a defect in the MKL memory manager. I have escalated it to the MKL engineering team to fix.
At present, the best way to avoid this problem is to set MKL_DISABLE_FAST_MM=1, as you described.
What is your general problem size, N = 1000000?
Best Regards,
Ying
Yes, typically around one million.
Hi Styc,
Thanks for letting me know. The reference number is DPD200084696; we will notify you when the fixed version is released (maybe around Oct.).
Thanks
Ying
