

Petros

Beginner


06-23-2011
09:47 AM


C++ and Intel MKL for Scientific programming

I need your expertise...

I am about to start writing a scientific program (a numerical simulator for a differential-algebraic equation system). The default language most people in power engineering use is Fortran, and we currently have a very fast piece of software developed in Fortran.

The software, though, has grown so large, with so many features, that it is very hard to maintain in its current form (a lot of code is duplicated with minor changes all over the program). Also, many of the features planned for the future involve threading, running the same algorithms on different data sets concurrently, running on a cluster, etc.

I want to rewrite the software in an object-oriented form in order to make it more maintainable, expandable and reusable. I am also interested in trying Intel TBB for the threading needs, and a GUI library for setting up a basic interface for the program.

My concern is speed: can C++ compete with Fortran? And which mathematical libraries should I use?

The current Fortran implementation uses:

- elemental mathematical expressions (pow, exp, multiplication, etc.)

- DGETRF and DGETRS from BLAS95 (MKL), called millions of times during a simulation on small matrices (10 to 30 elements)

- a sparse solver (MA37, from the HSL mathematical software library), called several times on a big matrix (30000+ elements)

I currently use the newest version of Intel Parallel Studio 2011 with MKL, and my university owns the Intel Cluster Toolkit.

Many people advise against C++ for scientific computing on performance grounds (use of temporary copies, etc.). I have also found several libraries (Blitz++, GotoBLAS, A++, ATLAS, Eigen, ...) promising near-Fortran performance.

Does anyone have experience with this kind of numerically intensive implementation in C++? If I use the tools offered by MKL (pow, exp, matrix multiplication, BLAS functions, sparse solvers, etc.), will I get good performance?

I am not looking for a definitive answer, since I know it's a difficult question. Any kind of help or insight is welcome.

Petros

1 Solution

TimP

Black Belt


06-23-2011
01:47 PM


The "12.0 update 4" compilers implemented an arch-consistency option for the SVML (short vector math library), which gives up some performance in return for running the same code across a variety of platforms and yielding more consistently accurate results. There is still a significant gain from the vectorization.

For real-to-real power, as long as you take care to avoid gratuitous data-type mixtures, the Intel compilers still auto-vectorize; e.g. use powf(float, float) or pow(double, double) in order to get SVML vectorization.

For small vectors, you are better off letting the compiler auto-vectorize rather than calling an MKL function. If "small vector" means size < 50, it will be difficult to find any choice that performs consistently well.

Don't switch from Fortran to C or C++ and expect the same performance for vector operations unless you are willing to decorate your code with appropriate *restrict qualifiers and even check correctness of results with #pragma simd reduction() and the like. The MKL functions might be more attractive to someone who is saddled with rules against vectorizable source syntax (e.g. using Microsoft compilers).

MKL level 3 BLAS, e.g. matrix multiplication, might be expected to be well optimized for any case where the smallest dimension is 16 or more (recognizing that only larger problems will benefit from threading at MKL level).

I agree: if you are using MKL matrix multiply in a block solver with such small blocks, you will need to exploit parallelism at a higher level.


7 Replies

TimP

Black Belt


06-23-2011
10:39 AM


The threading libraries of TBB and MKL (OpenMP) aren't compatible. Under the current setup, with TBB you would be limited to the MKL sequential library, with threading under the control of TBB.

The "work-stealing" capabilities of the TBB, Cilk+ and ArBB threading libraries (which allow multiple applications to coexist effectively) may come at a price. With OpenMP, for full performance, you designate a group of cores dedicated to your task, and the OpenMP library expects to hog those cores.

By following the ancient adage "a Fortran programmer can write Fortran in any language," it is usually possible to achieve similar performance with Intel C++ and Fortran. Both Intel compilers share the same auto-vectorizer, taking advantage of a single short vector math library for math functions.

Many OOP idioms don't leave room for the "Fortran in any language" style, so you may get better performance from the recently added OOP capabilities of Fortran than from C++.

Petros

Beginner


06-23-2011
10:56 AM


I'll try to answer a few remarks:

-I already use the sequential MKL library under Fortran, since the matrices I solve have 10 to 30 elements, so the threaded library doesn't gain me anything. The sparse solver I use is so fast that I never considered a threaded one.

-The parallelization I need is not at the level of the numerical methods but at a higher level: for example, not solving a single 30x30 matrix with the threaded MKL library, but solving 6000 (30x30) blocks in parallel (each affecting a different data set), using the sequential library for each.

-I programmed in C for 4 years and Java for 2 years before moving to Fortran 95 in the last 6 months (the program predates my arrival at this job).

I understand about the performance of the MKL libraries, but what about the performance of elementary mathematical functions, for example raising to real powers, or multiplying small/big vectors?

Again thanks for the help.

mecej4

Black Belt


06-23-2011
11:01 AM


First of all, the factorization and solution routines are from LAPACK, not BLAS. However, if you are calling MKL, which contains BLAS, LAPACK and other routines, the distinction gets blurred.

Secondly, if you were calling through the Fortran 95 interfaces, you would have been calling the generic names GETRF and GETRS. In that case, the overhead of allocating, populating and deallocating work arrays millions of times could and should be avoided by putting in the programming effort to call the Fortran-77 routines directly.

This is something that you can check by examining the MKL calls in your source code.

The usual alerts apply concerning factorizing a matrix only once when solving a sequence of problems with the same matrix but different right-hand sides.

Petros

Beginner


06-23-2011
01:26 PM


Hi, sorry for that, my mistake. It's the LAPACK95 library. I use DGETRF via CALL DGETRF(M, N, A, LDA, IPIV, INFO) to factorize once, and then DGETRS via CALL DGETRS(TRANS, N, NRHS, A, LDA, IPIV, B, LDB, INFO) to solve several times with different RHS.

I link with -lmkl_intel_lp64 -lmkl_lapack95_lp64 -lmkl_sequential -lmkl_core -lpthread -lma37 -liomp5.

When you say to call the Fortran-77 routines, what do you mean?

Thanks


mecej4

Black Belt


06-23-2011
02:40 PM


Daniel_Dopico

Novice


03-31-2019
04:01 AM


I came across this post by chance. I suppose that by now the C++ library is ready, but I have to say that I faced the exact same problem. I had a Multibody Dynamics code in Fortran 77, not general at all, and suddenly I needed to convert it into a general code. I moved to Fortran 90, but as the functionality grew, more and more complexity was needed.

Nowadays, I have a Fortran 2008 library using Fortran's object-oriented features (inheritance, procedure and data polymorphism, information hiding, etc.) and I don't find any good reason to rewrite my code in C++ (I would rather do the opposite). Moreover, for some features like 3D graphics rendering or communication with devices, I use C interoperability to call the Fortran library from my C++ main program and to call C or C++ functions from my Fortran library or program.

Of course it is always a matter of preference, but I programmed in C++ for years, so I have no prejudice against either of these two languages.

