
Automatic Offload not working for dgetrf, dgetri

Vishal1
Beginner

I'm having some trouble getting Automatic Offload to work with the MKL dgetrf & dgetri routines on our server with two Phi cards. The dgemm routines in this code work just fine. Here's the build command -

icpc -c -fpic -shared  -std=c++11 -O3 -xHost -ip -ipo3 -parallel -funroll-loops -fno-alias -fno-fnalias -fargument-noalias -mkl -I include/ -I ~/Documents/Boost/boost_1_53_0/ src/PRH.cpp -o src/obj/PRH.o

Here's the OFFLOAD_REPORT generated when the code runs -

Reading in data...
Data read in.
Beginning PRH computation...
[MKL] [MIC --] [AO Function]    DGEMM
[MKL] [MIC --] [AO DGEMM Workdivision]    0.12 0.44 0.44
[MKL] [MIC 00] [AO DGEMM CPU Time]    23.427545 seconds
[MKL] [MIC 00] [AO DGEMM MIC Time]    19.126605 seconds
[MKL] [MIC 00] [AO DGEMM CPU->MIC Data]    7158788000 bytes
[MKL] [MIC 00] [AO DGEMM MIC->CPU Data]    22186080000 bytes
[MKL] [MIC 01] [AO DGEMM CPU Time]    23.427545 seconds
[MKL] [MIC 01] [AO DGEMM MIC Time]    19.060497 seconds
[MKL] [MIC 01] [AO DGEMM CPU->MIC Data]    7158788000 bytes
[MKL] [MIC 01] [AO DGEMM MIC->CPU Data]    22186080000 bytes
LnProb = -469708
Current Runtime (s): 2305.02
PRH computation finished.
Average Runtime (s): 2305.02

real    2m58.020s
user    38m16.392s
sys    0m12.115s

Why aren't the dgetrf and dgetri calls being offloaded?
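
For context, here's roughly the shape of the call sequence in question (a minimal sketch, not the actual PRH.cpp code; the size and dummy matrix are placeholders, and AO is requested with mkl_mic_enable(), equivalent to setting MKL_MIC_ENABLE=1 in the environment):

#include <mkl.h>
#include <vector>
#include <iostream>

int main() {
    const MKL_INT n = 24000;                              // placeholder problem size
    std::vector<double> A(static_cast<size_t>(n) * n, 0.0);
    for (MKL_INT i = 0; i < n; ++i) A[i * n + i] = 1.0;   // dummy identity matrix
    std::vector<MKL_INT> ipiv(n);

    mkl_mic_enable();                                     // request Automatic Offload

    // LU factorization, then inversion from the LU factors.
    MKL_INT info = LAPACKE_dgetrf(LAPACK_COL_MAJOR, n, n, A.data(), n, ipiv.data());
    if (info == 0)
        info = LAPACKE_dgetri(LAPACK_COL_MAJOR, n, A.data(), n, ipiv.data());

    std::cout << "info = " << info << std::endl;
    return 0;
}

It's built with the same flags as above and run with OFFLOAD_REPORT=2 set.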

7 Replies
TimP
Honored Contributor III

?getr? routines are documented in http://software.intel.com/sites/default/files/11MIC42_How_to_Use_MKL_Automatic_Offload_0.pdf as not being subject to automatic offload. dtrsm looks like the most likely of the functions called by dgetrf to gain performance by running on MIC. Did you test with explicit offload and find a gain? MIC is supported primarily on 16- and 24-core Xeon servers, which have pretty good MKL performance over a wider range of problems.
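
If you want to try that, a compiler-assisted explicit-offload experiment could look something like this (a sketch only; the function name and sizes are placeholders and I haven't run it on your configuration):

#include <mkl.h>

// Sketch: factor an n-by-n column-major matrix explicitly on coprocessor 0.
// A and ipiv live on the host; the pragma copies them to the card and back.
void factor_on_mic0(double *A, MKL_INT *ipiv, MKL_INT n) {
    MKL_INT info = 0;
    #pragma offload target(mic:0) inout(A : length(n * n)) \
                                  out(ipiv : length(n)) out(info)
    {
        dgetrf(&n, &n, A, &n, ipiv, &info);   // runs on the coprocessor-side MKL
    }
    // info != 0 would indicate a failed factorization.
    (void)info;
}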

Vishal1
Beginner

Hi Tim,

The top of page 2 in the link you sent me says

"In the current MKL release (11.0), the following Level-3 BLAS functions and LAPACK functions are AO-enabled: ?GEMM, ?SYMM, ?TRMM, and ?TRSM & LU, QR, Cholesky factorizations"

I'm using dgetrf - that's just LU decomposition, isn't it (sorry, new to LAPACK)? Doesn't the document suggest that it's one of the functions that should be automatically offloaded?

Vishal1
Beginner

I seem to have run into another problem as well.

The same code written using dsymm seg-faults, while the version with dgemm runs fine. Is this a known issue? The matrix size is 24,000 x 24,000 and I'm offloading to 2 Phis. Disabling the MICs (using mkl_mic_disable()) just before the offending dsymm call lets the code run fine.
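
In code, the workaround looks roughly like this (a sketch of the description above, with placeholder matrices and function name):

#include <mkl.h>

// Sketch: disable AO only around the problematic DSYMM so that the
// surrounding DGEMM calls can still offload. A, B, C are n-by-n
// column-major matrices (placeholders for the real ones).
void symm_host_only(const double *A, const double *B, double *C, MKL_INT n) {
    mkl_mic_disable();                  // the next MKL calls run host-only
    cblas_dsymm(CblasColMajor, CblasLeft, CblasUpper,
                n, n, 1.0, A, n, B, n, 0.0, C, n);
    mkl_mic_enable();                   // restore Automatic Offload afterwards
}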

Vishal1
Beginner

Any idea why dsymm gives the segmentation fault on large-ish matrices when dgemm works fine for the same matrices? When I looked at the local terminal for the machine, it had many lines of report that looked like...

micscif_rma_tc_can_cache 1540 total = 77319, current = 79624 reached max

Is there an offload bug with dsymm?
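
Here's a self-contained sketch of the kind of standalone repro I could put together (my own construction, not the actual PRH.cpp code; run with MKL_MIC_ENABLE=1 and OFFLOAD_REPORT=2):

#include <mkl.h>
#include <vector>

int main() {
    const MKL_INT n = 24000;                              // the size that fails for me
    const size_t elems = static_cast<size_t>(n) * n;
    std::vector<double> A(elems, 1.0), B(elems, 1.0), C(elems, 0.0);  // A is trivially symmetric

    mkl_mic_enable();                                     // Automatic Offload on
    cblas_dsymm(CblasColMajor, CblasLeft, CblasUpper,
                n, n, 1.0, A.data(), n, B.data(), n, 0.0, C.data(), n);
    return 0;
}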

TimP
Honored Contributor III

You may have to watch memory, total stack, and thread stack consumption.  Your problem size seems large, particularly for the 8GB RAM coprocessors.  I guess automatic offload should split up the matrix according to your specification, so you may be able to find a split which doesn't overrun the coprocessors.  I've never run on a dual-coprocessor system, so I'd say this is beyond my experience.
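
For scale, a single 24000 x 24000 double-precision matrix is about 4.6 GB, so even two full operands push the limit of an 8 GB card. One thing you could try (a sketch of the AO work-division API; the fractions are placeholders to experiment with, and I haven't verified this on a dual-card system) is to cap each card's share explicitly:

#include <mkl.h>

// Sketch: keep most of the work on the host and give each coprocessor a
// smaller slice so its share of the matrices fits in card memory.
void limit_mic_share() {
    mkl_mic_enable();
    mkl_mic_set_workdivision(MKL_TARGET_HOST, 0, 0.70);  // host keeps ~70%
    mkl_mic_set_workdivision(MKL_TARGET_MIC,  0, 0.15);  // card 0 gets ~15%
    mkl_mic_set_workdivision(MKL_TARGET_MIC,  1, 0.15);  // card 1 gets ~15%
}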

Vishal1
Beginner

At the moment, I'm letting automatic offload (AO) handle the matrix splitting. It seems to do that just fine when doing DGEMM. It's the DSYMM call that is causing problems with the exact same matrix. Shouldn't DSYMM be offloading less data (roughly half as much) as DGEMM for this matrix? Does this sound like a bug? Is there a bug reporting service that I should submit this to?

TimP
Honored Contributor III

MKL issues may be submitted under the premier.intel.com account, which is created automatically when you register the compiler.  If you didn't register the compiler, you can do so at https://registrationcenter.intel.com.
