
Intel Math Kernel Library (Intel MKL) is a highly optimized, extensively threaded, and thread-safe library of mathematical functions for engineering, scientific, and financial applications that require maximum performance. The Intel MKL 10.3 Update 4 packages are now ready for download. Intel MKL is available as a stand-alone product and as a part of the Intel Parallel Studio XE 2011, Intel C++ Studio XE 2011, Intel Composer XE 2011, Intel Fortran Composer XE 2011, Intel C++ Composer XE 2011, and Intel Cluster Studio 2011. Please visit the Intel Software Evaluation Center to evaluate this product.

*Check out the latest **Intel MKL benchmarks** using the new Intel AVX instruction set!*

**The new Intel MKL 10.3 Update 4 release provides the following:**

- BLAS: Improved DTRMM performance on Intel Xeon processors 5400 and later
- BLAS: Improved DTRSM performance on all 64-bit enabled processors, especially processors with Intel Advanced Vector Extensions (Intel AVX)
- LAPACK: Incorporated bug fixes from the LAPACK 3.3.1 release
- OOC PARDISO: Improved the estimate of the amount of memory needed in out-of-core operation
- FFT: Improved 1D real FFT scaling through improved threading
- FFT: Updated C and Fortran FFT examples to use the new single dynamic library linking model
- VML: Improved performance of the single precision Enhanced Performance version of the real Hypot and complex Abs functions and of the complex Arg, Div, Mul, MulByConj functions for all accuracy modes on Intel Xeon processors 5600 and 7500 series, and the Intel Core i7-2600 processor
- Service functions: Improvements and additions to the Intel MKL service functions (see the online release notes for more information)
- Bug fixes

- Read the release notes online for more information.


We have MKL version 10.0.1.014. Can we switch to MKL 10.3 without any change to our system architecture and compiler?

Thanks,

Mital Mistry


There have been a few changes in the directory structure in 10.3 and, as noted in the release notes, you will need to use an update of 10.2 if you are on Itanium.

So the answer depends on your architecture and which compiler you are using. In any case you will need to relink your application.

Regards,

Todd


Our architecture is IA-64 and we have both the Intel and GNU compilers.

I was asked to update to MKL 10.3 by an Intel person on another forum because of some issues I am having with OOC.

Our system is an SGI with 8 processors and 32 GB of memory, and I guess we have Itanium.

What do you mean by an update of 10.2? Is that for MKL or for the compiler?

Regards,

Mital


Intel Compilers (or now, Intel Composer XE 2011) include Intel MKL, but the version numbers don't correspond (take a look at this table). I was referring above to Intel MKL version numbers. Intel MKL 10.2 update 7 is the very latest version of Intel MKL that supports Itanium processors, and it can be found in Intel Compiler 11.1 update 8.

For the issue on OOC that you refer to, you will either need a workaround or a fixed version of 10.2.

Todd



Here is the output of the "uname -a" command:

"Linux keisgi 2.6.16.54-0.2.12-default #1 SMP Fri Oct 24 02:16:38 UTC 2008 ia64 ia64 ia64 GNU/Linux"
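The machine field in that line answers the architecture question: `ia64` is what Linux reports on Itanium. As a minimal sketch (the helper name `machine_from_uname` is made up for illustration, and it assumes the usual GNU/Linux field order of `uname -a`), the field can be pulled out like this:

```python
def machine_from_uname(uname_line: str) -> str:
    """Return the machine hardware field from a GNU/Linux `uname -a` line."""
    fields = uname_line.split()
    # Assumed layout: the last four fields are
    # machine, processor, hardware platform, operating system.
    return fields[-4]


line = ("Linux keisgi 2.6.16.54-0.2.12-default #1 SMP "
        "Fri Oct 24 02:16:38 UTC 2008 ia64 ia64 ia64 GNU/Linux")
machine = machine_from_uname(line)
is_itanium = machine == "ia64"  # ia64 = Itanium; x86_64 would be Intel 64
print(machine, is_itanium)
```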

Also, with our current version of MKL (10.0.1.014), my code has been running for the last 10 hours but is still at the following stage:

**************************************************************************************

ooc_max_core_size got by Env = 16000

The file ./pardiso_ooc.cfg was not opened

**************************************************************************************

The matrix is 150,000,000 x 150,000,000 with 60,000,000 non-zeros. If I run it in-core, it reports that LU has approximately 950,000,000 non-zeros and exits with error -2. If I don't set "MKL_PARDISO_OOC_MAX_CORE_SIZE=16000" and run it out-of-core, it exits with error -9.
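For readers hitting the same return values, here is a small lookup of the memory-related and OOC-related PARDISO error codes. The wording is paraphrased from the Intel MKL PARDISO documentation as best I recall it, so treat the table as an assumption and check the reference manual for your MKL version; the names `PARDISO_ERRORS` and `describe_pardiso_error` are made up for this sketch.

```python
# Paraphrased meanings of selected PARDISO error codes (assumption: verify
# against the reference manual that ships with your MKL version).
PARDISO_ERRORS = {
    0: "no error",
    -2: "not enough memory",
    -8: "32-bit integer overflow problem",
    -9: "not enough memory for OOC",
    -10: "error opening OOC files",
    -11: "read/write error with OOC files",
}


def describe_pardiso_error(code: int) -> str:
    """Return a short description for a PARDISO error code."""
    return PARDISO_ERRORS.get(code, "code not listed in this sketch")


for code in (-2, -9):
    print(code, describe_pardiso_error(code))
```

Read this way, -2 from the in-core run and -9 from the out-of-core run both point at memory limits rather than at a problem with the matrix itself.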

Regards,

Mital


I installed 10.2.7.041 and it works perfectly with OOC. I am posting the output from my case so that it might be helpful to other users. Thanks for your help.

******************************************************************************************

ooc_max_core_size got by Env = 30000

The file ./pardiso_ooc.cfg was not opened

=== PARDISO is running in Out-Of-Core mode, because iparam(60)=2 ===

================ PARDISO: solving a symmetric indef. system ==========

The local (internal) PARDISO version is : 1020001

PARDISO double precision computation is turned ON

METIS algorithm at reorder step is turned ON

Task fits RAM, OOC NLL factorization algorithm is turned ON

Scaling is turned ON

Summary PARDISO: ( reorder to reorder )

================

Times:

======

Time spent in calculations of symmetric matrix portrait(fulladj): 1.21927

Time spent in reordering of the initial matrix(reorder) : 24.2554

Time spent in symbolic factorization(symbfct) : 22.7608

Time spent in data preparations for factorization(parlist) : 3.19630

Time spent in in allocation of internal data structures(malloc) : 1.17290

Time spent in additional calculations : 17.7799

Total time spent : 70.3847

Statistics:

===========

< Parallel Direct Factorization with #processors: > 8

< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>

#equations: 1460346

#non-zeros in A: 58022798

non-zeros in A (%): 0.002721

#right-hand sides: 101

< Factors L and U >

#columns for each panel: 192

#independent subgraphs: 0

< Preprocessing with state of the art partitioning metis>

#supernodes: 135919

size of largest supernode: 20746

number of nonzeros in L 3449087253

number of nonzeros in U 1

number of nonzeros in L+U 3449087254

Reordering completed ...

Number of nonzeros in factors = -845880042

Number of factorization MFLOPS = 32965576

Percentage of computed non-zeros for LL^T factorization

0 % 1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 % 13 % 20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 % 30 % 31 % 38 % 39 % 40 % 41 % 42 % 43 % 44 % 45 % 46 % 47 % 48 % 49 % 56 % 57 % 58 % 59 % 60 % 61 % 62 % 63 % 64 % 65 % 66 % 67 % 64 % 75 % 76 % 77 % 78 % 79 % 80 % 81 % 82 % 83 % 84 % 85 % 86 % 93 % 94 % 95 % 96 % 97 % 98 % 99 % 100 %

================ PARDISO: solving a symmetric indef. system ================

Summary PARDISO: ( factorize to solve )

================

Times:

======

Time spent in copying matrix to internal data structure(A to LU): 0.000001 s

Factorization: Time for writing to files : 0.000000

Factorization: Time for reading from files : 0.000000

Time spent in factorization step(numfct) : -1980.722245 s

Solution: Time for reading from files : 0.000000

Time spent in direct solver at solve step (solve) : 766.926043 s

Time spent in in allocation of internal data structures(malloc) : 17663.688013 s

Time spent in additional calculations : 4016.110042 s

Total time spent : 20466.001854 s

==============================================================

----------- Out of core time (in percent (%)) --------------

Factorization step (100 (%)):

write to files : 0

read from files: 0

factorization - write&read : 100

Solution step (100 (%)):

read from files: 0

solve - write&read: 100

Total time (100 (%)):

read from files: 0

total - write&read: 100

----------- Out of core Mb --------------

Factorization step:

write to files : 0.000 Mb

read from files: 0.000 Mb

Solution step:

read from files: 0.000 Mb

Total size of data transferred :

write&read : 0.000 Mb

==============================================================

Statistics:

===========

< Parallel Direct Factorization with #processors: > 8

< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>

#equations: 1460346

#non-zeros in A: 58022798

non-zeros in A (%): 0.002721

#right-hand sides: 101

< Factors L and U >

#columns for each panel: 192

#independent subgraphs: 0

< Preprocessing with state of the art partitioning metis>

#supernodes: 135919

size of largest supernode: 20746

number of nonzeros in L 3449087253

number of nonzeros in U 1

number of nonzeros in L+U 3449087254

gflop for the numerical factorization: 32965.576421

Solve completed ...

**********************************************************************************************

Thank you very much.

Regards,

Mital
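A note for anyone reading the output above: the line "Number of nonzeros in factors = -845880042" is exactly what 3449087254 (the reported non-zeros in L+U) becomes when stored in a signed 32-bit integer, so the negative value looks like a counter wraparound in the printout rather than a wrong factorization. That is an inference from the numbers, not something the log states. The sketch below checks the arithmetic and also estimates the factor size, assuming 8 bytes per non-zero (double precision, index arrays ignored) and assuming ooc_max_core_size is given in megabytes.

```python
def as_int32(value: int) -> int:
    """Interpret an integer as a wrapped signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


nnz_l = 3449087253           # "number of nonzeros in L" from the log
nnz_l_plus_u = 3449087254    # "number of nonzeros in L+U" from the log

wrapped = as_int32(nnz_l_plus_u)
print(wrapped)               # matches the negative count in the log

# Rough factor storage: 8 bytes per double-precision entry (assumption).
factor_bytes = nnz_l * 8
factor_gib = factor_bytes / 2**30
core_budget_bytes = 30000 * 10**6   # ooc_max_core_size = 30000, taken as MB
print(f"{factor_gib:.1f} GiB", factor_bytes < core_budget_bytes)

# Density of A as a percentage, from #equations and #non-zeros in A.
n, nnz_a = 1460346, 58022798
density_pct = 100.0 * nnz_a / (n * n)
print(f"{density_pct:.6f}")
```

Under these assumptions the factors come to roughly 26 GiB, which is below the 30000 MB core budget. That would be consistent with the "Task fits RAM" message and with the 0.000 Mb written to and read from files.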
