This is a conceptual question:
Expression templates are a popular C++ technique for implementing matrix and array operations: they avoid unnecessary temporaries and enable optimizations such as loop unrolling. In other words, with expression templates an expression such as D = A+B+C, where D, A, B and C are matrices, does not incur the temporaries that a naive C++ implementation usually creates. How does this compare, in performance terms, with C++ wrappers around the MKL BLAS routines? That is, will a naive implementation of a Matrix/Array class wrapping the optimized BLAS routines perform at least as well as an implementation using expression templates?
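To make the D = A+B+C case concrete, here is a minimal sketch of the expression-template idea for vectors. The names `Expr`, `Vec` and `Sum` are made up for illustration and are not taken from Eigen, Blaze or any other library; real implementations are considerably more elaborate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: tags a type as a vector expression (hypothetical name).
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Hypothetical owning container.
class Vec : public Expr<Vec> {
    std::vector<double> data_;
public:
    explicit Vec(std::size_t n, double v = 0.0) : data_(n, v) {}
    std::size_t size() const { return data_.size(); }
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }

    // Assigning from any expression evaluates it element by element
    // in one loop, writing straight into this vector's storage.
    template <typename E>
    Vec& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        assert(expr.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data_[i] = expr[i];
        return *this;
    }
};

// Lightweight node representing lhs + rhs; it stores references only
// and computes nothing until operator[] is called.
template <typename L, typename R>
class Sum : public Expr<Sum<L, R>> {
    const L& lhs_;
    const R& rhs_;
public:
    Sum(const L& l, const R& r) : lhs_(l), rhs_(r) {}
    std::size_t size() const { return lhs_.size(); }
    double operator[](std::size_t i) const { return lhs_[i] + rhs_[i]; }
};

// operator+ builds the expression node instead of a result vector.
template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}
```

With this sketch, `D = A + B + C;` builds a `Sum<Sum<Vec, Vec>, Vec>` and evaluates it in a single pass without allocating an intermediate vector. Because the nodes hold references, the expression has to be consumed within the same statement; storing it (for example with `auto`) would leave dangling references.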
I realise this question is quite general in essence, but I would be grateful if someone could give me some hints on this.
Yes, I am aware of Armadillo, Blitz, Eigen and, more recently, Blaze, which all use expression templates in one form or another to do loop unrolling and avoid temporaries. Eigen has a rather questionable benchmark (in my opinion) in which it claims performance similar to Intel MKL using expression templates (http://eigen.tuxfamily.org/index.php?title=Benchmark). After some digging I realized that their benchmark used only a single thread.
My original question is: if I write naive OOP wrappers around the MKL routines, will I get performance on par with libraries that use expression templates but *not* the MKL routines? Most of the aforementioned libraries are quite opaque, with very little documentation about their internals, which makes them difficult to extend.
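For comparison, this is roughly what I mean by a naive wrapper. It is only a sketch: `NaiveVec` and `axpy_stand_in` are hypothetical names, and the plain loop stands in for a BLAS level-1 call such as `cblas_daxpy` (y := alpha*x + y) so that the snippet compiles without MKL. The counter is there purely to show how many result buffers `operator+` creates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a BLAS axpy call (y := alpha * x + y).
// A real wrapper would call the MKL routine here instead.
inline void axpy_stand_in(std::size_t n, double alpha,
                          const double* x, double* y) {
    for (std::size_t i = 0; i < n; ++i) y[i] += alpha * x[i];
}

class NaiveVec {
    std::vector<double> data_;
public:
    static int allocations;  // counts buffers created by the sized constructor

    explicit NaiveVec(std::size_t n, double v = 0.0) : data_(n, v) {
        ++allocations;
    }
    std::size_t size() const { return data_.size(); }
    double operator[](std::size_t i) const { return data_[i]; }
    const double* data() const { return data_.data(); }
    double* data() { return data_.data(); }
};
int NaiveVec::allocations = 0;

// Every + allocates a fresh result vector and makes two passes over it.
inline NaiveVec operator+(const NaiveVec& a, const NaiveVec& b) {
    assert(a.size() == b.size());
    NaiveVec result(a.size());                              // new temporary, zero-filled
    axpy_stand_in(a.size(), 1.0, a.data(), result.data());  // result += a
    axpy_stand_in(b.size(), 1.0, b.data(), result.data());  // result += b
    return result;
}
```

Here `NaiveVec D = A + B + C;` creates two result buffers (one for `A + B`, one for the final sum) and runs four separate passes over the data, however fast each individual pass is. Whether that overhead matters in practice is exactly what I am asking.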