New in Intel MKL 10.2 Update 6:
New Features
o Integrated Netlib LAPACK 3.2.2 including one new computational routine (?GEQRFP) and two new auxiliary routines (?GEQR2P and ?LARFGP)
Performance improvements
o Improved DZGEMM performance on Intel Xeon processor 5300 and 5400 series with 64-bit operating systems
o Improved DSYRK performance on Intel Xeon processor 5300 series with 32-bit operating systems, with the most significant improvements for small oblong matrices on 8 or more threads
o Improved the scalability of (C/Z)GGEV by parallelizing the reduction to generalized Hessenberg form ((C/Z)GGHRD)
o Improved performance for ?(SY/HE)EV and ?(SP/HP)TRS on very small matrices (< 20)
o Improved performance of FFTW2 wrappers for those cases where the descriptor remains constant from call to call
o Improved scalability of threaded applications that use non-threaded FFTs on multi-socket systems
o Significantly improved performance of cluster FFTs through better load balancing when the input data cannot be evenly distributed between MPI processes
o Improved scalability of cluster FFTs on systems with a non-power-of-2 number of cores/processors
o Improved performance of the factorization step in out-of-core PARDISO for huge matrices by reducing the number of disk I/O operations
o Parallelized the solve step in PARDISO
Usability/Interface improvements
o Improved support for F77 in FFTW2 and MPI FFTW2 interfaces
o Implemented rfftwnd_create_plan_specific and its 2d and 3d variants
o Added 2D Convolution/Correlation examples
As far as I know, the feature 'Parallelized solve step in PARDISO' is a new feature in PARDISO 4 from the University of Basel. I'm very interested in it, but also in another important feature of that version: reproducibility of exact numerical results on multi-core architectures. Is this also included in MKL 10.2 Update 6? If not, are there any plans to integrate this feature?
Do you have any news regarding the "reproducibility feature"? I have seen in other threads that some MKL users also report non-deterministic behaviour of PARDISO when it is used in parallel, so I think this feature would be much appreciated.
Kind regards,
Rene
Thank you for the information. I will try it, although I'm not very optimistic that it will help in my case.
I want to point out again that I'm very interested in a feature that eliminates non-determinism (within a certain range) in PARDISO. Regarding your concerns, I don't believe that performance will suffer too much. I got this comparison from the PARDISO website:
The solver is now able to compute the exact bit-identical solution independent of the number of cores without affecting the scalability.
Here are some results for a nonlinear FE model with 500'000 elements.
Intel MKL PARDISO 10.2:
1 core - factor: 17.980 sec., solve: 1.13 sec.
2 cores - factor: 9.790 sec., solve: 1.13 sec.
4 cores - factor: 6.120 sec., solve: 1.05 sec.
8 cores - factor: 3.830 sec., solve: 1.05 sec.
U Basel PARDISO 4.0.0:
1 core - factor: 16.820 sec., solve: 1.09 sec.
2 cores - factor: 9.021 sec., solve: 0.67 sec.
4 cores - factor: 5.186 sec., solve: 0.53 sec.
8 cores - factor: 3.170 sec., solve: 0.43 sec.
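One way to read the quoted timings is as speedup relative to one core. The Python sketch below only redoes that arithmetic on the numbers exactly as quoted; the `TIMINGS` dict and the `speedups` helper are names introduced here for illustration, and nothing is re-measured.

```python
"""Parallel speedup computed from the timings quoted above (in seconds).

speedup(N) = time on 1 core / time on N cores. The values are copied from
the quotation; nothing is re-measured here.
"""

# cores -> (factorization time, solve time), as quoted
TIMINGS = {
    "Intel MKL PARDISO 10.2": {
        1: (17.980, 1.13), 2: (9.790, 1.13), 4: (6.120, 1.05), 8: (3.830, 1.05),
    },
    "U Basel PARDISO 4.0.0": {
        1: (16.820, 1.09), 2: (9.021, 0.67), 4: (5.186, 0.53), 8: (3.170, 0.43),
    },
}


def speedups(table):
    """Return {cores: (factor_speedup, solve_speedup)} relative to 1 core."""
    base_factor, base_solve = table[1]
    return {n: (base_factor / f, base_solve / s) for n, (f, s) in table.items()}


if __name__ == "__main__":
    for solver, table in TIMINGS.items():
        print(solver)
        for cores, (fs, ss) in sorted(speedups(table).items()):
            print(f"  {cores} cores: factor x{fs:.2f}, solve x{ss:.2f}")
```

On the numbers as quoted, the 8-core factorization speedup is about 4.7x for MKL PARDISO 10.2 versus about 5.3x for PARDISO 4.0.0, and the 8-core solve speedup is about 1.1x versus about 2.5x.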
Rene
Rene,
PARDISO 4.0 from the PARDISO website supports a bit-for-bit correspondence only for symmetric indefinite matrices. Migrating to a machine with a different instruction set breaks this bit-for-bit compatibility, so the compatibility can be observed only for a prescribed set of machines with an identical instruction set and the same number of CPUs.
Moreover, sparse direct solvers are quite sensitive to matrix structure, so performance should suffer in cases where dynamic parallelization gives a substantial advantage over static parallelization with a prescribed list of jobs for each thread. Theoretically, performance has to suffer in most cases.
We have been unable to verify the performance information you quote because we lack the information needed to reproduce it.
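The trade-off between static and dynamic parallelization and bit-for-bit results comes down to the fact that floating-point addition is not associative, so the result of a parallel reduction depends on the order in which partial results are combined. The Python sketch below is a hypothetical toy, not PARDISO or MKL code: "threads" are simulated sequentially, `chunk_sums`, `static_reduce`, and `dynamic_reduce` are names invented here, and a random shuffle stands in for the unpredictable completion order of a dynamic schedule.

```python
"""Toy model of why parallel floating-point reductions may not be bit-reproducible.

Hypothetical illustration only; this is not PARDISO or MKL code. Threads are
simulated sequentially: each "thread" sums one contiguous chunk, and the
partial sums are then combined in a fixed order (static schedule) or in a
shuffled "completion" order (stand-in for a dynamic schedule).
"""
import random


def serial_sum(xs):
    """Plain left-to-right floating-point summation."""
    total = 0.0
    for x in xs:
        total += x
    return total


def chunk_sums(values, n_threads):
    """Split values into at most n_threads contiguous chunks and sum each one."""
    size = (len(values) + n_threads - 1) // n_threads
    return [serial_sum(values[i:i + size]) for i in range(0, len(values), size)]


def static_reduce(values, n_threads):
    """Combine partial sums in a prescribed order: chunk 0, 1, 2, ..."""
    return serial_sum(chunk_sums(values, n_threads))


def dynamic_reduce(values, n_threads, rng):
    """Combine partial sums in a random order, as if threads finished unpredictably."""
    partials = chunk_sums(values, n_threads)
    rng.shuffle(partials)
    return serial_sum(partials)


if __name__ == "__main__":
    # Floating-point addition is not associative.
    print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))

    data_rng = random.Random(0)
    # Mixed magnitudes make rounding differences between orders more likely.
    values = [data_rng.uniform(-1.0, 1.0) * 10.0 ** data_rng.randint(-8, 8)
              for _ in range(10_000)]

    # Static schedule, fixed thread count: every run gives identical bits.
    static_runs = {static_reduce(values, 8) for _ in range(5)}
    print("static, 8 threads, distinct results:", len(static_runs))

    # Static schedule, different thread count: chunking changes, bits may change.
    print("static 4 vs 8 threads identical:",
          static_reduce(values, 4) == static_reduce(values, 8))

    # Dynamic schedule: the combination order varies from run to run.
    sched_rng = random.Random(1)
    dynamic_runs = {dynamic_reduce(values, 8, sched_rng) for _ in range(20)}
    print("dynamic, 8 threads, distinct results:", len(dynamic_runs))
```

With a static schedule the combination order is fixed, so repeated runs with the same thread count agree bit for bit; changing the thread count changes the chunking and may change the last bits, which matches the restriction to the same number of CPUs described above.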
All the best
Sergey
I can imagine that performance may suffer in cases where dynamic parallelization has advantages over static parallelization. However, I intended my request as an option that you would offer to the user, so a user who wants to squeeze out the last drop of performance would be free not to use it.
I provided some data to a colleague of yours (Sergey Gololobov). This data is, of course, different from the data in my quotation, but it also comes from a nonlinear FE model. If you like, you can use it instead.
Kind regards,
Rene