C++ MKL BLAS wrappers vs expression templates in Intel® oneAPI Math Kernel Library

Anwar_Ludin — Tue, 23 Dec 2014 15:41:13 GMT

This is a conceptual question:

Expression templates are a popular C++ technique for implementing matrix and array operations efficiently: they avoid unnecessary temporaries and fuse the loops of a compound expression into a single loop, which the compiler can then unroll and vectorize. In other words, with expression templates an expression such as D = A+B+C, where D, A, B and C are matrices, does not incur the temporaries that a naive C++ implementation usually creates. How does this compare, in performance terms, with using C++ wrappers around the MKL BLAS routines? Put differently, will a naive implementation of a Matrix/Array class wrapping the optimized BLAS routines perform at least as well as an implementation using expression templates?

I realise this question is quite general, but I would be very grateful if someone could give me some hints.

Thanks!

Zhang_Z_Intel — Wed, 24 Dec 2014 23:06:55 GMT

Have you looked at Eigen? http://eigen.tuxfamily.org/index.php?title=Main_Page

Anwar_Ludin — Thu, 25 Dec 2014 00:07:41 GMT

Yes, I am aware of Armadillo, Blitz++, Eigen and, more recently, Blaze, which all use expression templates in one form or another to fuse loops and avoid temporaries. Eigen publishes a benchmark that is, in my opinion, quite misleading: it asserts that Eigen's expression templates give performance similar to Intel MKL (http://eigen.tuxfamily.org/index.php?title=Benchmark). After some digging I realized that the benchmark used only a single thread.

My original question is: if I write naive OOP wrappers around the MKL routines, will I get performance on par with libraries that use expression templates but *not* the MKL routines? Most of the aforementioned libraries are rather opaque, with very little documentation about their internals, which makes them difficult to extend.