Hi,
I found a problem with MKL_DCSRSYMV. When I ran the following code
[cpp] PROGRAM TEST
INTEGER, PARAMETER :: N = 1000000
DOUBLE PRECISION X(N)
DO WHILE (.TRUE.)
CALL MKL_DCSRSYMV('U', N, SPREAD(1.D0, 1, N), (/1 : N + 1/),&
(/1 : N/), SPREAD(1.D0, 1, N), X)
END DO
END PROGRAM
[/cpp]
compiled with "ifort test.f90 -otest -mkl=parallel" (ifort pro 11.1.038 with the included MKL), memory consumption, as seen in 'top', kept rising until it drained all physical memory and I killed the process. I tried it on a four-socket Opteron 8350 and a dual-socket Xeon 5530. Memory usage blew up on both machines. Any cure for this?
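For readers unfamiliar with the call, here is a minimal plain-C sketch of what the test program asks MKL_DCSRSYMV to compute: y = A*x, where A is the N-by-N identity stored as an upper triangle in CSR format with one-based indices. The names `csr_symv_upper` and `identity_check` are hypothetical and not part of MKL; this is a reference loop under those assumptions, not MKL's implementation.

```c
#include <stdlib.h>

/* Hypothetical reference routine (not part of MKL): y = A*x for a symmetric
 * matrix whose upper triangle is stored in CSR format with one-based
 * indices, the layout the test program passes with uplo = 'U'. */
static void csr_symv_upper(int n, const double *a, const int *ia,
                           const int *ja, const double *x, double *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = 0.0;
    for (int i = 0; i < n; ++i) {
        for (int k = ia[i] - 1; k < ia[i + 1] - 1; ++k) {
            int j = ja[k] - 1;      /* convert to a zero-based column */
            y[i] += a[k] * x[j];
            if (j != i)             /* mirror the off-diagonal entry */
                y[j] += a[k] * x[i];
        }
    }
}

/* Builds the n-by-n identity in CSR form, applies it to a vector of ones,
 * and returns the number of entries where y differs from x
 * (or -1 if an allocation fails). */
static int identity_check(int n)
{
    double *a = malloc((size_t)n * sizeof *a);
    int *ia = malloc(((size_t)n + 1) * sizeof *ia);
    int *ja = malloc((size_t)n * sizeof *ja);
    double *x = malloc((size_t)n * sizeof *x);
    double *y = malloc((size_t)n * sizeof *y);
    int mismatches = -1;

    if (a && ia && ja && x && y) {
        for (int i = 0; i < n; ++i) {
            a[i] = 1.0;         /* SPREAD(1.D0, 1, N) */
            ja[i] = i + 1;      /* (/1 : N/)          */
            ia[i] = i + 1;      /* (/1 : N + 1/)      */
            x[i] = 1.0;
        }
        ia[n] = n + 1;
        csr_symv_upper(n, a, ia, ja, x, y);
        mismatches = 0;
        for (int i = 0; i < n; ++i)
            if (y[i] != x[i])
                ++mismatches;
    }
    free(a); free(ia); free(ja); free(x); free(y);
    return mismatches;
}
```

Note that the arrays are built once and reused, so nothing in the operation itself requires memory to grow from one iteration to the next.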
9 Replies
Quoting - styc
Hi,
I found a problem with MKL_DCSRSYMV. When I ran the following code
[cpp] PROGRAM TEST
INTEGER, PARAMETER :: N = 1000000
DOUBLE PRECISION X(N)
DO WHILE (.TRUE.)
CALL MKL_DCSRSYMV('U', N, SPREAD(1.D0, 1, N), (/1 : N + 1/),&
(/1 : N/), SPREAD(1.D0, 1, N), X)
END DO
END PROGRAM
[/cpp]
compiled with "ifort test.f90 -otest -mkl=parallel" (ifort pro 11.1.038 with the included MKL), memory consumption, as seen in 'top', kept rising until it drained all physical memory and I killed the process. I tried it on a four-socket Opteron 8350 and a dual-socket Xeon 5530. Memory usage blew up on both machines. Any cure for this?
One comment on coding. The way you call the routine forces the compiler to create 3 temporary arrays (it has to make a copy of these arguments before passing them), an obvious performance degradation. To eliminate them, don't pass non-contiguous arrays to routines that don't accept arrays by descriptor.
A.
Quoting - ArturGuzik
Hi,
One comment on coding. The way you call the routine forces the compiler to create 3 temporary arrays (it has to make a copy of these arguments before passing them), an obvious performance degradation. To eliminate them, don't pass non-contiguous arrays to routines that don't accept arrays by descriptor.
A.
You see, this is just a test program, and for a test program performance is nonessential. The real problem is that within a few hundred iterations the program consumes more than 10 GB of memory, a nightmare when I use the routine in a Krylov subspace solver, because I only have 12 GB of memory on my machine.
Quoting - styc
You see, this is just a test program, and for a test program performance is nonessential. The real problem is that within a few hundred iterations the program consumes more than 10 GB of memory, a nightmare when I use the routine in a Krylov subspace solver, because I only have 12 GB of memory on my machine.
I know. That was just a comment.
I guess that you're on Linux. I have no time to test it there, but on my Windows x64 machine it uses at most 90 MB (I waited until 2,500 iterations had passed) and I don't see any leak.
Did you try replacing those SPREAD calls?
A.
Quoting - ArturGuzik
I know. That was just a comment.
I guess that you're on Linux. I have no time to test it there, but on my Windows x64 machine it uses at most 90 MB (I waited until 2,500 iterations had passed) and I don't see any leak.
Did you try replacing those SPREAD calls?
A.
Actually I discovered the problem when calling MKL_DCSRSYMV from C. I also tried local and allocatable arrays. The same massive leaks.
Quoting - styc
Actually I discovered the problem when calling MKL_DCSRSYMV from C. I also tried local and allocatable arrays. The same massive leaks.
Hi Styc,
If you use the sequential library, what is the result?
The command line is like
ifort test.f -o test -lmkl_intel_lp64 -Wl,--start-group -lmkl_sequential -lmkl_core -Wl,--end-group -lpthread
or
ifort test.f -o test -lmkl_intel_lp64 -Wl,--start-group -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -lpthread
Or would you like to upgrade the compiler to the latest version, which uses the latest MKL, 10.2.1?
I just tried Intel Compiler 11.1.046 on a 4-core Xeon machine. The test program seems to run fine. No memory leak.
<http://software.intel.com/en-us/articles/which-version-of-ipp--mkl--tbb-is-installed-with-intel-compiler-professional-edition/>
Best Regards,
Ying
Quoting - Ying Hu (Intel)
Hi Styc,
If you use the sequential library, what is the result?
The command line is like
ifort test.f -o test -lmkl_intel_lp64 -Wl,--start-group -lmkl_sequential -lmkl_core -Wl,--end-group -lpthread
or
ifort test.f -o test -lmkl_intel_lp64 -Wl,--start-group -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -lpthread
Or would you like to upgrade the compiler to the latest version, which uses the latest MKL, 10.2.1?
I just tried Intel Compiler 11.1.046 on a 4-core Xeon machine. The test program seems to run fine. No memory leak.
<http://software.intel.com/en-us/articles/which-version-of-ipp--mkl--tbb-is-installed-with-intel-compiler-professional-edition/>
Best Regards,
Ying
Well, I'm pretty reluctant to do the upgrade now because 1) it's tricky and 2) I don't have the time.
I tried several possible solutions I could think of and found four ways to make the test program work normally:
1) linking with -mkl=sequential
2) OMP_NUM_THREADS=1/2/3/4/5 (see, you need more than four threads to see it break)
3) MKL_DISABLE_FAST_MM=1
4) setting N <= 5242880 / OMP_NUM_THREADS
It seems that 1), 2) and 4) actually address the same problem: limiting the amount of memory MKL_DCSRSYMV requires, i.e. sizeof(double) * N * OMP_NUM_THREADS, to no more than 40 MB. Apparently MKL_DCSRSYMV will repeatedly allocate workspaces larger than 40 MB but won't bother to deallocate them unless explicitly instructed to by something like 3).
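The arithmetic behind that 40 MB observation can be sketched in C as follows. Both the workspace formula and the limit are inferences from the experiments in this post, not documented MKL behavior, and the names `ASSUMED_LIMIT_BYTES`, `workspace_bytes` and `exceeds_limit` are hypothetical.

```c
#include <stddef.h>

/* Assumption taken from the post: workspaces larger than 40 MB appear
 * not to be released. 40 MB = 40 * 1024 * 1024 = 41943040 bytes,
 * which equals sizeof(double) * 5242880. */
#define ASSUMED_LIMIT_BYTES (40ULL * 1024ULL * 1024ULL)

/* Workspace estimate from the post: sizeof(double) * N * OMP_NUM_THREADS. */
static unsigned long long workspace_bytes(unsigned long long n,
                                          unsigned threads)
{
    return (unsigned long long)sizeof(double) * n * threads;
}

/* Returns 1 if the estimated workspace is above the assumed limit. */
static int exceeds_limit(unsigned long long n, unsigned threads)
{
    return workspace_bytes(n, threads) > ASSUMED_LIMIT_BYTES;
}
```

With N = 1000000, five threads give 40,000,000 bytes, just under the limit, while six threads give 48,000,000 bytes, which is over it. That matches thread counts 1 through 5 working in item 2).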
Quoting - styc
Well, I'm pretty reluctant to do the upgrade now because 1) it's tricky and 2) I don't have the time.
I tried several possible solutions I could think of and found four ways to make the test program work normally:
1) linking with -mkl=sequential
2) OMP_NUM_THREADS=1/2/3/4/5 (see, you need more than four threads to see it break)
3) MKL_DISABLE_FAST_MM=1
4) setting N <= 5242880 / OMP_NUM_THREADS
It seems that 1), 2) and 4) actually address the same problem: limiting the amount of memory MKL_DCSRSYMV requires, i.e. sizeof(double) * N * OMP_NUM_THREADS, to no more than 40 MB. Apparently MKL_DCSRSYMV will repeatedly allocate workspaces larger than 40 MB but won't bother to deallocate them unless explicitly instructed to by something like 3).
Hi Styc,
Good news, I'm able to reproduce the problem with MKL 10.2 and MKL 10.2.1. The problem happens only when the allocated arrays are huge (if the problem size is small, for example N=1000, there is no such problem, right?).
The root cause is a defect in the MKL memory manager. I have escalated it to the MKL engineering team for a fix.
At present, the best way to avoid this problem is to set MKL_DISABLE_FAST_MM=1, as you described.
What is your general problem size, N = 1000000?
Best Regards,
Ying
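For a program that calls MKL from C, one way to apply this workaround without relying on the launching shell is to set the variable from inside the process. A minimal sketch, assuming a POSIX system, and assuming (not confirmed anywhere in this thread) that MKL reads MKL_DISABLE_FAST_MM no earlier than its first call; the helper name `disable_mkl_fast_mm` is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L  /* exposes setenv() in <stdlib.h> */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sets MKL_DISABLE_FAST_MM=1 in this process's
 * environment, overwriting any existing value. Returns 0 on success.
 * Assumption: it must run before the first MKL routine is called. */
static int disable_mkl_fast_mm(void)
{
    return setenv("MKL_DISABLE_FAST_MM", "1", 1);
}
```

Exporting MKL_DISABLE_FAST_MM=1 in the shell before starting the program is the form of the workaround actually tested in this thread, and remains the safer choice.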
Quoting - Ying Hu (Intel)
Hi Styc,
Good news, I'm able to reproduce the problem with MKL 10.2 and MKL 10.2.1. The problem happens only when the allocated arrays are huge (if the problem size is small, for example N=1000, there is no such problem, right?).
The root cause is a defect in the MKL memory manager. I have escalated it to the MKL engineering team for a fix.
At present, the best way to avoid this problem is to set MKL_DISABLE_FAST_MM=1, as you described.
What is your general problem size, N = 1000000?
Best Regards,
Ying
Yes, typically around one million.
Quoting - styc
Yes, typically around one million.
Hi Styc,
Thanks for letting me know. The reference number is DPD200084696. We will notify you when the fixed version is released (maybe around Oct.).
Thanks
Ying