The "12.0 update 4" compilers implemented an arch-consistency option for the SVML, which gives up some performance in return for running the same code across a variety of platforms, yielding more consistently accurate results. There's still a significant gain from the vectorization.

For real-to-real power, the Intel compilers still auto-vectorize as long as you take care to avoid gratuitous data type mixtures; e.g., use powf(float, float) or pow(double, double) in order to get SVML vectorization.

For small vectors, you are better off letting the compiler auto-vectorize, rather than calling an MKL function. If small vector means size < 50, it will be difficult to find any choice which performs consistently well.

Don't switch from Fortran to C or C++ and expect performance of vector operations unless you are willing to decorate your code with appropriate *restrict definitions and even check correctness of results with #pragma simd reduction() and the like. The MKL functions might be more attractive to someone who is saddled with rules against vectorizable source syntax (e.g. using Microsoft compilers).
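As a rough illustration of that decoration, the sketch below uses restrict-qualified pointers and a reduction pragma. The function names are hypothetical. The pragma shown is the OpenMP 4.0 `#pragma omp simd reduction(+:sum)`, swapped in as a more portable stand-in for the Intel-specific `#pragma simd reduction()` mentioned above; a compiler without OpenMP SIMD support simply ignores it.

```c
#include <stddef.h>

/* restrict promises the compiler that x and y do not overlap, which is
   what Fortran assumes for dummy array arguments by default. */
double dot_product(size_t n, const double *restrict x,
                   const double *restrict y)
{
    double sum = 0.0;
    /* Assumption: OpenMP SIMD pragma used in place of the Intel-specific
       "#pragma simd reduction(+:sum)". Vectorizing a reduction reorders
       the additions, so results can differ slightly from a scalar loop. */
#pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

/* y = a*x + y; without restrict the compiler must allow for x and y
   overlapping and may refuse to vectorize or add a run-time check. */
void axpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

Because the reduction changes the order of the floating-point additions, comparing against a scalar reference on representative data is the correctness check referred to above.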

MKL level 3 BLAS, e.g. matrix multiplication, might be expected to be well optimized for any case where the smallest dimension is 16 or more (recognizing that only larger problems will benefit from threading at MKL level).

I agree, if you are using MKL matrix multiply in a block solver with such small blocks, you will need to exploit parallelism at a higher level.

The threading libraries of TBB and MKL (OpenMP) aren't compatible. Under the current setup, with TBB you would be limited to the MKL sequential library, with threading under the control of TBB.

The "work-stealing" capabilities of TBB, Cilk+, ArBB threading libraries (allowing multiple applications to coexist effectively) may come at a price. With OpenMP, for full performance, you designate a group of cores dedicated to your task and the OpenMP library expects to hog those cores.

By following the ancient adage "a Fortran programmer can write Fortran in any language," it is usually possible to achieve similar performance with Intel C++ and Fortran. Both Intel compilers share the compiler auto-vectorization, taking advantage of a single "short vector math library" for math functions.

Many OOP idioms don't leave room for "Fortran in any language" style, and so you may have better performance with the recently added OOP capabilities of Fortran than C++.

*dgetrf and dgetrs from BLAS95 (MKL library) are called million times during a simulation with matrices ranging 10-30 elements*

There is something inconsistent in that statement.

First of all, the factorization and solve routines are from LAPACK, not BLAS. However, if you are calling MKL, which contains BLAS, LAPACK and other routines, the distinction gets blurred.

Secondly, if you were calling through the Fortran 95 interfaces, you would have been calling the generic names GETRF and GETRS. In that case, the overhead of allocating, populating and deallocating work arrays millions of times could and should be avoided, by putting in the programming effort to call the Fortran-77 routines.

This is something that you can check by examining the MKL calls in your source code.

The usual advice applies: factorize a matrix only once if you are solving a sequence of problems with the same matrix but different right-hand sides.
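The factor-once, solve-many pattern can be sketched in plain C. This is a hand-written stand-in, not MKL code: `lu_factor` and `lu_solve` are hypothetical names that play the roles of dgetrf and dgetrs for a small row-major matrix, using LU with partial pivoting. It is meant only to show where the expensive step sits and how the factors are reused.

```c
#include <math.h>
#include <stddef.h>

/* Factor the n-by-n row-major matrix a in place as P*A = L*U with partial
   pivoting. piv[k] records the row that was swapped with row k.
   Returns 0 on success, -1 if a zero pivot is found (singular matrix). */
int lu_factor(size_t n, double *a, size_t *piv)
{
    for (size_t k = 0; k < n; ++k) {
        size_t p = k;
        for (size_t i = k + 1; i < n; ++i)   /* pick the largest pivot */
            if (fabs(a[i * n + k]) > fabs(a[p * n + k]))
                p = i;
        if (a[p * n + k] == 0.0)
            return -1;
        piv[k] = p;
        if (p != k)
            for (size_t j = 0; j < n; ++j) { /* swap full rows k and p */
                double t = a[k * n + j];
                a[k * n + j] = a[p * n + j];
                a[p * n + j] = t;
            }
        for (size_t i = k + 1; i < n; ++i) { /* eliminate below the pivot */
            a[i * n + k] /= a[k * n + k];
            for (size_t j = k + 1; j < n; ++j)
                a[i * n + j] -= a[i * n + k] * a[k * n + j];
        }
    }
    return 0;
}

/* Solve A*x = b using the factors from lu_factor; b is overwritten by x. */
void lu_solve(size_t n, const double *lu, const size_t *piv, double *b)
{
    for (size_t k = 0; k < n; ++k) {         /* apply the row swaps to b */
        double t = b[k];
        b[k] = b[piv[k]];
        b[piv[k]] = t;
    }
    for (size_t i = 1; i < n; ++i)           /* forward substitution, unit L */
        for (size_t j = 0; j < i; ++j)
            b[i] -= lu[i * n + j] * b[j];
    for (size_t i = n; i-- > 0; ) {          /* back substitution with U */
        for (size_t j = i + 1; j < n; ++j)
            b[i] -= lu[i * n + j] * b[j];
        b[i] /= lu[i * n + i];
    }
}
```

The intended usage is one call to `lu_factor` per matrix followed by one `lu_solve` per right-hand side; with MKL the corresponding pattern would be a single dgetrf call followed by repeated dgetrs calls on the same factors.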

*>When you say call with fortran-77 routines what do you mean?*

See the MKL documentation.

I read this post by chance. I suppose that, by now, the C++ library is ready, but I have to say that I faced exactly the same problem. I had a code for Multibody Dynamics in Fortran 77, not general at all, and suddenly I needed to convert it into a general code. I moved to Fortran 90, but as the functionality grew, more and more complexity was needed.

Nowadays, I have a Fortran 2008 library using Fortran object-oriented features (inheritance, procedure and data polymorphism, information hiding, etc.) and I don't find any good reason for rewriting my code in C++ (I would rather do the opposite). Moreover, for some features like 3D graphics rendering or communication with devices, I use interoperability with C to call the Fortran library from my C++ main program and to call C or C++ functions from my Fortran library or program.

Of course it is always a matter of preference, but I programmed in C++ for years, so I don't have any prejudice against either of these two languages.
