Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Parallel matrix libraries supported by ICC

dehvidc1
Beginner
436 Views

I'm working with a genomics institute to examine the benefits of using the Intel compiler and toolset with their production code. One aspect of their production systems that has come under review is the use of Blitz. The Institute started using Blitz 10 years ago and have used it extensively since.

From the Blitz WWW site, "Blitz++ is a C++ class library for scientific computing which provides performance on par with Fortran 77/90. It uses template techniques to achieve high performance. The current versions provide dense arrays and vectors, random number generators, and small vectors and matrices".

The Institute is keen on a number of Blitz features including very readable syntax for code using matrices and simple to use array resizing.


A number of issues have come up during the work I've done over the last few months.

a/ Blitz requires pthreads to build thread safe. This raises problems on Windows as Windows doesn't have native POSIX thread support.

b/ On their WWW site the Blitz team suggests looking at other libraries for threaded applications.

c/ The Institute is currently using gcc 4.1. They would like to examine the use of OMP in their code and so are looking to build their code under gcc 4.4.4. Blitz is causing build problems.

d/ Some substantial performance improvements were achieved by replacing Blitz arrays with pointer arithmetic based implementations under ICC and MKL calls.

e/ Blitz development appears to have largely stopped around 5 or so years ago.
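
As an illustration of items c/ and d/, here is a minimal sketch of what a pointer-based replacement for an array addition might look like once OpenMP is available. The function name `add_matrices` and the flat row-major layout are assumptions made for this example; they are not taken from the Institute's code. The pragma only has an effect when OpenMP is enabled at compile time (for example `-fopenmp` with GCC); otherwise the loop runs serially.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the Institute's code): element-wise sum of
// two row-major matrices stored as flat arrays of rows * cols doubles.
void add_matrices(const double* a, const double* b, double* out,
                  std::size_t rows, std::size_t cols)
{
    assert(a != 0 && b != 0 && out != 0);

    // A signed loop index keeps the loop acceptable to older OpenMP versions.
    const long n = static_cast<long>(rows * cols);

    // Each iteration is independent, so the loop can be split across threads.
    // Without OpenMP enabled the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The trade-off is the one described above: the loop is easy for the compiler to optimise, but the call site no longer reads like matrix algebra.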


The Institute would like to look at other potential solutions.


A partial requirement list would be:

1/ High-performance thread capable matrix library.

2/ Simple to use matrix operation syntax. For example, adding two arrays with an expression of the form A = B + C.

3/ Active development community or vendor support.

4/ Supports ICC and GCC.

5/ Can be built and executed on Linux and Windows.


I think from a performance perspective ICC with IPP, MKL and TBB would be satisfactory. But as well as performance the Institute is very keen on the simple to use matrix operation syntax aspect.

If anyone has any suggestions on possible alternatives that would be greatly appreciated.

Regards

David

0 Kudos
8 Replies
Om_S_Intel
Employee

The Intel software team strives to provide ease of use, performance and support for its tools and libraries. These work on Linux and Windows. We would be happy to answer specific questions.
0 Kudos
jimdempseyatthecove
Honored Contributor III
David,

If you are looking for a portable (Windows/Linux), thread-safe and high-performance library that has all the features of Blitz then why not hire a capable programmer to adapt the Blitz library to your purposes? There are several capable programmers on this forum, including myself, who would be willing to take on this project.

Fixing up Blitz after 5 years of neglect is only one aspect of the work you should be seeking. I would have to make an educated guess that your institute has some applications that are workhorse applications. These applications should be examined for parallelization opportunities as well. As a rule of thumb, the further out you begin parallelization, the better the performance gained.

Jim Dempsey
0 Kudos
dehvidc1
Beginner
Thanks for the offer, Jim. I guess one issue would be that the Institute would then be tied to a version out of the Blitz library development mainstream (such as it is :). Another issue is that if there's something already out there offering this feature mix the Institute might be better off taking the hit to move to a new library rather than going it alone with updating a superseded solution.

I think there's also a wider issue in that, from what the Blitz WWW site says, Blitz stemmed from the thoughts behind a number of papers in the mid-90's around the issue of how to get Fortran array like performance from C++ arrays. Compile time generation of code from templates was seen as one of the answers. Given the performance benefits I obtained by replacing Blitz arrays with MKL calls and traditional array loops I'm not sure whether the original aim was achieved.
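
For readers unfamiliar with the technique, "compile time generation of code from templates" usually refers to expression templates. The following is a minimal sketch of the idea, not Blitz's actual implementation: the names `Vec` and `Sum` are invented for this example. `operator+` builds a lightweight description of the sum, and the assignment then evaluates it in a single loop with no temporary arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Node representing "lhs + rhs"; nothing is computed until it is indexed.
template <typename L, typename R>
struct Sum {
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

// Hypothetical minimal vector type, invented for this sketch.
class Vec {
public:
    explicit Vec(std::size_t n, double v = 0.0) : data_(n, v) {}
    double  operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i)       { return data_[i]; }
    std::size_t size() const { return data_.size(); }

    // Assigning from an expression runs one fused loop over all operands.
    template <typename L, typename R>
    Vec& operator=(const Sum<L, R>& e) {
        assert(e.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data_[i] = e[i];
        return *this;
    }

private:
    std::vector<double> data_;
};

// Vec + Vec yields an expression node rather than a new Vec.
inline Sum<Vec, Vec> operator+(const Vec& a, const Vec& b) {
    return {a, b};
}

// (expression) + Vec nests the existing node inside a new one.
template <typename L, typename R>
inline Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& a, const Vec& b) {
    return {a, b};
}
```

With these definitions, `r = a + b + c;` compiles to roughly the same loop one would write by hand, provided the compiler inlines the nested `operator[]` calls. Whether it does so reliably is exactly the open question raised above.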

Compiler technology has moved forwards a lot since the mid-90's. The best approach now will be different to 15 years ago.

And finally with respect to performance there's a dynamic tension between using templates, overloaded operators, clever iterators etc and simple, bog-standard arrays. Most of my performance work has been done with C, assembler and Ada. The simpler the arrays in Ada the easier it was for the compiler to optimise. (I spent a lot of time working with a prominent compiler vendor to improve their code generation from Ada so this is an area in which I have some insight.) C arrays are very simple for an optimising compiler to work with. But you give up simple to use matrix operation syntax and other nice features. Much, probably most, of the C++ code base isn't all that concerned with performance and is more focused on rapid development using useful language features and well built libraries.

The WWW seems littered with partly implemented attempts to resolve this dichotomy. I wonder if Boost, which appears to be undergoing very active development, has performance as one of its primary goals?

Regards

David
0 Kudos
JenniferJ
Moderator
Hi David,
Let me introduce a new feature to you. It might be something you're looking for.

In the next release of our compiler, a new language extension feature "array notation" is added.
The C/C++ extensions for array notations feature provides data parallel array notations with the following major benefits:
Allows you to use array notation to program parallel operations in a familiar language
Achieves predictable performance based on mapping parallel constructs to the underlying multi-threaded and SIMD hardware
Enables compiler parallelization and vectorization with less reliance on alias and dependence analysis
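
To give a rough idea of what this looks like in practice, here is a sketch in standard C++ of the loop that an array-notation statement would replace. The section syntax shown in the comment is an assumption about the form of the extension and is not taken from this post; the compiler documentation is the authority on the exact syntax. The function name `add_arrays` is invented for the example.

```cpp
#include <cassert>
#include <cstddef>

// Portable equivalent of an element-wise array-notation statement.
// With the extension enabled, the loop below would (as an assumption about
// the syntax) collapse to a single statement of the form:
//     c[0:n] = a[0:n] + b[0:n];
void add_arrays(const float* a, const float* b, float* c, std::size_t n)
{
    assert(a != 0 && b != 0 && c != 0);
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```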

The product is not out yet. But a similar version is contained in the Intel Parallel Composer 2011 that is available for eval at the evaluation center. Refer to the documentation for detailed syntax etc.

Thanks,
Jennifer
0 Kudos
jimdempseyatthecove
Honored Contributor III
Jennifer,

The CEAN features simplify use of SSE (they eliminate most of the usage of the SSE intrinsic functions) but are not a full-fledged substitute for array notation. In C++ I believe templates are the way to go.

Good (efficient) template writing is not easy. Inefficient template writing is not too hard. The question becomes, how much effort do you place into your templates?

A good matrix template library would (conditionally) make use of the most effective techniques including but not limited to:

SSE / AVX intrinsic functions
IPL and / or MKL and / or BLAS, ...
hand tuned C/C++ code
additional templates

Note, the templates become a meta language. You can alter (improve) the internal workings without affecting the "language". Meaning any future changes apply to the template library and not to the source files that use them.
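
A minimal sketch of that idea follows, assuming an invented `Array` class and `AddKernel` helper (neither is from an existing library). User code only ever writes `x + y`; which kernel runs is decided inside the library by template specialisation, so a tuned backend can be swapped in later without touching the calling code. The `double` specialisation here is only a hand-unrolled stand-in for where intrinsics or a vendor library call would go.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Generic fallback kernel: a plain loop the compiler is free to vectorise.
template <typename T>
struct AddKernel {
    static void run(const T* a, const T* b, T* out, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
    }
};

// Specialisation for double. This is the hook where a tuned implementation
// could be plugged in; the unrolled loop is only a placeholder.
template <>
struct AddKernel<double> {
    static void run(const double* a, const double* b, double* out,
                    std::size_t n) {
        std::size_t i = 0;
        for (; i + 2 <= n; i += 2) {
            out[i]     = a[i]     + b[i];
            out[i + 1] = a[i + 1] + b[i + 1];
        }
        for (; i < n; ++i) out[i] = a[i] + b[i];  // odd-length remainder
    }
};

// Hypothetical array type: the "language" seen by user code.
template <typename T>
class Array {
public:
    explicit Array(std::size_t n, T v = T()) : data_(n, v) {}
    std::size_t size() const { return data_.size(); }
    T operator[](std::size_t i) const { return data_[i]; }

    // The operator stays the same; only the kernel behind it changes.
    friend Array operator+(const Array& a, const Array& b) {
        assert(a.size() == b.size());
        Array out(a.size());
        AddKernel<T>::run(a.data_.data(), b.data_.data(),
                          out.data_.data(), a.size());
        return out;
    }

private:
    std::vector<T> data_;
};
```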

Jim Dempsey
0 Kudos
JenniferJ
Moderator
Talking about templates reminded me of the Intel valarray.

I wouldn't write one myself; it's a lot of work and may not be as good in the end.

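[cpp]#include <valarray>

const int N = 1024;  // N was not defined in the original post; this value is assumed

void test( )
{
    // The element type was lost from the original post; double is assumed here.
    std::valarray<double> vi(N), va(N);
    vi = vi + va;  // element-wise array addition
}
[/cpp]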

icl /Quse-intel-optimized-headers /c t.cpp


Jennifer
0 Kudos
Dale_S_Intel
Employee
Might I suggest something new from Intel, Array Building Blocks? It sounds like it might be just the sort of thing you're looking for. You can read more about it and sign up for the beta here:
It's a template based array library with a sophisticated run-time, somewhat similar to TBB but with some other nice features. It should work on Windows and Linux. I don't know if it works with gcc, but I don't think there's anything fundamentally preventing that.
There's lots of documentation at the link above, but there's also a forum here:
where you can get answers to any questions you have about it.
If it works out for you, I'd be interested in seeing what you find.
Thanks!
Dale
0 Kudos
dehvidc1
Beginner
Thanks, Jennifer. I had noticed that Intel said they had an efficient valarray implementation. But using valarrays to implement 2D and 3D arrays takes some backflips and handstands.
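
To make the point concrete, here is a minimal sketch of the kind of wrapper needed to get 2D behaviour out of a flat `std::valarray`. The class name `Matrix2D` and its interface are invented for this example. Rows and columns have to be expressed as `std::slice(start, size, stride)` over row-major storage, and a 3D array would need `std::gslice` or further index arithmetic on top of this.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Hypothetical thin wrapper: row-major 2D view over a flat std::valarray.
class Matrix2D {
public:
    // Note the valarray constructor order: (fill value, element count).
    Matrix2D(std::size_t rows, std::size_t cols, double v = 0.0)
        : rows_(rows), cols_(cols), data_(v, rows * cols) {}

    double& operator()(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

    // Row r: cols_ contiguous elements starting at offset r * cols_.
    std::valarray<double> row(std::size_t r) const {
        assert(r < rows_);
        return data_[std::slice(r * cols_, cols_, 1)];
    }

    // Column c: rows_ elements starting at offset c, with stride cols_.
    std::valarray<double> col(std::size_t c) const {
        assert(c < cols_);
        return data_[std::slice(c, rows_, cols_)];
    }

    // Whole-matrix arithmetic can reuse valarray's element-wise operators.
    Matrix2D& operator+=(const Matrix2D& other) {
        assert(rows_ == other.rows_ && cols_ == other.cols_);
        data_ += other.data_;
        return *this;
    }

private:
    std::size_t rows_, cols_;
    std::valarray<double> data_;
};
```

Whole-array operations stay readable, but anything shaped like a sub-matrix has to go through slice bookkeeping of this kind, which is the awkwardness referred to above.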
0 Kudos