The code below behaves differently when built with /Qmkl:parallel versus /Qmkl:cluster. In both cases it is built for Windows 7 64-bit with the latest Intel compiler and libraries, and launched as an MPI job with mpiexec.exe -n 2 (that is, using only two ranks) on a dual 6-core workstation.
When /Qmkl:parallel is used, the MKL calls on rank 0 take advantage of the 6 OpenMP threads available there.
When /Qmkl:cluster is used, only one thread is used on rank 0, and the run is consequently about six times slower.
Any idea how to get threaded behavior with /Qmkl:cluster?
Also, why is the optimal LWORK so much larger (742977 vs. 288096) in the /Qmkl:parallel case?
PROGRAM MAIN
USE OMP_LIB
USE MPI
IMPLICIT NONE
INTEGER(KIND=4) :: N,ALLOC_ERROR,INFO,LWORK,SEED_SIZE,I,IERR
INTEGER(KIND=8) :: CLOCK_START,CLOCK_STOP,CLOCK_RATE,CLOCK_MAX
INTEGER(KIND=4),ALLOCATABLE :: SEEDS(:)
LOGICAL :: MPI_IS_INITIALIZED
REAL(KIND=8) :: W(1)
REAL(KIND=8),ALLOCATABLE :: A(:,:),TAU(:),WORK(:)

! Initialize MPI if the runtime has not done so already
CALL MPI_INITIALIZED(MPI_IS_INITIALIZED,IERR)
IF (.NOT.MPI_IS_INITIALIZED) THEN
  CALL MPI_INIT(IERR)
END IF

WRITE(*,*) 'I am image ',THIS_IMAGE(),' and I can span ',OMP_GET_MAX_THREADS(),' OpenMP threads.'

! Run the benchmark on image 1 (rank 0) only
IF (THIS_IMAGE()==1) THEN
  N = 3000
  WRITE(*,*) 'N = ',N
  ALLOCATE(A(N,N),STAT=ALLOC_ERROR)
  IF (ALLOC_ERROR/=0) THEN
    ERROR STOP
  END IF
  ! Deterministic seed so runs are comparable
  CALL RANDOM_SEED(SIZE=SEED_SIZE)
  ALLOCATE(SEEDS(SEED_SIZE))
  SEEDS = 123456
  CALL RANDOM_SEED(PUT=SEEDS)
  CALL RANDOM_NUMBER(A)
  ALLOCATE(TAU(N),STAT=ALLOC_ERROR)
  ! Workspace query: with LWORK = -1, DGEQRF returns the optimal LWORK in W(1)
  LWORK = -1
  CALL DGEQRF(N,N,A,N,TAU,W,LWORK,INFO)
  WRITE(*,*) 'LWORK = ',W(1)
  LWORK = INT(W(1))
  ALLOCATE(WORK(LWORK),STAT=ALLOC_ERROR)
  ! Time the QR factorization and the explicit formation of Q
  CALL SYSTEM_CLOCK(CLOCK_START,CLOCK_RATE,CLOCK_MAX)
  CALL DGEQRF(N,N,A,N,TAU,WORK,LWORK,INFO)
  CALL DORGQR(N,N,N,A,N,TAU,WORK,LWORK,INFO)
  CALL SYSTEM_CLOCK(CLOCK_STOP,CLOCK_RATE,CLOCK_MAX)
  WRITE(*,*) 'INFO = ',INFO
  WRITE(*,*) 'TIME = ',REAL(CLOCK_STOP-CLOCK_START,KIND=8)/REAL(CLOCK_RATE,KIND=8)
  WRITE(*,*) 'A(N,N) = ',A(N,N)
END IF

! Shut down MPI cleanly on all ranks
CALL MPI_FINALIZE(IERR)
END PROGRAM MAIN
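For reference, the executable (QR_PERFORMANCE.exe, per the build log below) is launched with two ranks as:

mpiexec.exe -n 2 QR_PERFORMANCE.exe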
Here is the output when using /Qmkl:cluster
I am image 2 and I can span 6 OpenMP threads.
I am image 1 and I can span 6 OpenMP threads.
N = 3000
LWORK = 288096
INFO = 0
TIME = 6.38000000000000
A(N,N) = -2.110006751937421E-002
Here is the output when using /Qmkl:parallel
I am image 2 and I can span 6 OpenMP threads.
I am image 1 and I can span 6 OpenMP threads.
N = 3000
LWORK = 742977
INFO = 0
TIME = 0.920000000000000
A(N,N) = -2.110006751937324E-002
Here is the build log (when using /Qmkl:cluster)
Compiling with Intel(R) Visual Fortran Compiler 17.0 [Intel(R) 64]...
ifort /nologo /O2 /I"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.0.048\windows\mpi\intel64\include" /Qopenmp /standard-semantics /module:"x64\Release\\" /object:"x64\Release\\" /Fd"x64\Release\vc120.pdb" /libs:dll /threads /Qmkl:cluster /c /Qcoarray:single /Qlocation,link,"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\\bin\amd64" /Qm64 "D:\TEMP\QR_PERFORMANCE\MAIN.F90"
Linking...
Link /OUT:"x64\Release\QR_PERFORMANCE.exe" /INCREMENTAL:NO /NOLOGO /LIBPATH:"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.0.048\windows\mpi\intel64\lib\release_mt" /MANIFEST /MANIFESTFILE:"x64\Release\QR_PERFORMANCE.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /SUBSYSTEM:CONSOLE /IMPLIB:"D:\TEMP\QR_PERFORMANCE\x64\Release\QR_PERFORMANCE.lib" impi.lib -qm64 /qoffload-ldopts="-mkl=cluster" "x64\Release\MAIN.obj"
Embedding manifest...
mt.exe /nologo /outputresource:"D:\TEMP\QR_PERFORMANCE\x64\Release\QR_PERFORMANCE.exe;#1" /manifest "x64\Release\QR_PERFORMANCE.exe.intermediate.manifest"
QR_PERFORMANCE - 0 error(s), 0 warning(s)
Edit: I found the proper set of threading libraries to solve the problem. Linking explicitly against the following, instead of relying on /Qmkl:cluster, gives the threaded behavior:
mkl_scalapack_lp64_dll.lib mkl_intel_lp64_dll.lib mkl_core_dll.lib mkl_intel_thread_dll.lib mkl_blacs_lp64_dll.lib impi.lib
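As far as I understand, /Qmkl:cluster pulls in the sequential MKL threading layer by default, while the list above substitutes the OpenMP-threaded layer (mkl_intel_thread_dll.lib). A minimal sketch of the equivalent command line, assuming the Intel MPI and MKL intel64 include/lib directories are already on the search paths (e.g. via mpivars.bat and the MKL environment scripts):

ifort /Qopenmp /standard-semantics /Qcoarray:single MAIN.F90 mkl_scalapack_lp64_dll.lib mkl_intel_lp64_dll.lib mkl_core_dll.lib mkl_intel_thread_dll.lib mkl_blacs_lp64_dll.lib impi.lib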