- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi
I have two questions about MKL-LAPACK
(1) What is the best (multithreaded) way to invert an arbitrary or a real symmetric matrix
I have tried using dgetrf and dgetri but they are considerably slower than Matlab. I am using dgetrf and dgetri as I have to use the factorization produced by dgetrf multiple times to solve linear systems of equations by providing a different right hand side in each iteration in an inverse modeling problem. I can also use dgetrs after dgetrf for solving my problem but speed wise it does not improve things.
(2) I know MATLAB internally uses MKL so which routine does it use to get much better (I know it is very difficult to answer this) performance than directly using MKL. In my application I have to invert a matrix with dimension 45000X45000 which is very slow and only uses a single thread in MKL in comparison to MATLAB.
A SAMPLE FORTRAN CODE THAT SHOWS MY IMPLEMENTATION OF DGETRF AND DGETRI IS ATTACHED. IN THE ATTACHED CODE I CREATE A POSITIVE DEFINITE MATRIX FOR INVERSION .HOWEVER IN THE REAL APPLICATION THE MATRIX IS NOT POSITIVE DEFINITE
Other details: It is a 64 bit application
Platform: Intel(R) Core(TM) i7-2600 CPU 3.40GHz for the test code/16GB of RAM
************************Application Specific Platform***************************************
vendor_id : GenuineIntel
RAM: 96 GB, cpu family : 6, model : 44, model name : Intel(R) Xeon(R) CPU X5660 @ 2.80GHz
stepping : 2, cpu MHz : 2801.000, cache size : 12288 KB, cpu cores : 6, apicid : 0, fpu : yes
fpu_exception : yes, cpuid level : 11, wp : yes
bogomips : 5600.36, clflush size : 64, cache_alignment : 64, address sizes : 40 bits physical, 48 bits virtual
********************************************************************************************************
MKL: Parallel version
MKL libraries included: mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib mkl_lapack95_lp64.lib
Many thanks
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
getrf in MKL is threaded, but getri is not. For a full list of threaded LAPACK routiens in MKL, see here.
In addition to multithreading, data alignment is also very important to get good performance. You didn't do this in your sample code. On i7-2600 CPU, you should align your input data to 32 byte boundaries to get good performance.
These are the MKL libraries you need to link with: mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib. You need to explicitly link libiomp5md.lib only if you are not using Intel Fortran compiler. If you use Intel Fortran compiler, please pass /Qopenmp
Can you please provide more information?
- What Fortran compiler do you use?
- What is your MKL version?
- Did you set OMP_NUM_THREADS or MKL_NUM_THREADS?
- How big is the difference between MKL and Matlab run times?
Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Matlab code is simply: a=rand(5000); tic; b=a\speye(length(a)); toc which takes ~ 12 seconds whereas dgetrf+dgetri takes = 36.88
I am using Intel Parallel Studio 2013 on a Windows Machine (Intel(R) Core(TM) i7-2600 CPU) with Visual Studio as a development environment and a cluster with intel xeon processors (as mentioned above) with composer_xe_2011 for running codes. I did not set OMP_NUM_THREADS as dgemm in the same code uses all the cores but if it is required than I can set it.
On another note I can use dsytrf and dsytrs for my work (real symmetric matrices) and as per documentation both are threaded routines. However the error by using these routines in comparison to dgetrt/dgetri is extremely large that I avoid using it. I will upload the code to demonstrate these errors, in this case it might be possible that I am not specifying some parameters correctly but the answer from info shows everything as correct
Many Thanks
Vineet
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Vineet,
A big problem in your code was that the work array was not allocated with proper size. After workspace query, you need to allocate the work array with the size returned by the query. After fixing this problem, I saw the performance improves almost 5x on an i7-2600. Please see it attached. I compiled it with: ifort /align:array32byte /Qopenmp /Qmkl source1.f90
Zhang
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks. I have assigned appropriate workspace and then its performance is better or equivalent to Matlab. The changes that I made in the source file are:
call dgetrf(a_rows,a_rows,a,a_rows,ipiv,info)
! workspace query
call dgetri(a_rows,a,a_rows,ipiv,work,-1,info)
nmax=work(1)
deallocate(work)
allocate(work(nmax))
! invert matrix a
call dgetri(a_rows,a,a_rows,ipiv,work,nmax,info)
Many thanks for you help
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Vineet,
Another thing that can improve performance is proper data alignment. You'd align the dynamically allocated arrays to the 32-byte boundaries. There are multiple ways to do it. But the simplest one is probably passing "/align:array32byte" on your ifort command line.
Note Matlad uses many MKL functions internally. It's no surprise that the performance is not very different from one to another for some problems.
Zhang
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi
Zhang
Does
statement like: !dec$ attributes align: 32 :: a , b where a and b are arrays help in aligning the data . I have seen some performance gain by using this statement with composer xe 2013 on Intel(R) Core(TM) i7-2600 CPU 3.40GHz for the test code/16GB of RAM machine
Can you let me know what kind of compiler switches can be used to align data with composer xe 2011
Vineet
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The !dec$ attribute align:32 directive helps to align static and COMMON data. I'm not sure if it also applies to dynamically allocated data. The /align:array32byte compiler switch seems to work for all cases. This switch is available in Intel Composer XE 2011.
You can take a look at this article for more about data alignment in FORTRAN: http://software.intel.com/en-us/articles/data-alignment-to-assist-vectorization

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page