And what do we have now? As for the orthogonality of the vectors, the implementation in recent Intel releases is encouraging. But dstegr in Intel MKL is not parallelized, and there are speed problems.
For the tridiagonal matrix from paragraph 2.2 with size n = 30001, my result is 56.6 sec (hardware configuration: i7 860 processor (Speed: 2.80 GHz), Motherboard DP55KG, DDR3 1333 MHz (8 GB), OS Windows XP Professional x64 Edition SP2, Intel MKL 10.2 Update 4, EM64T, HT off). dstegr from Intel MKL takes 19 min. 37 sec. (this result is scaled to a frequency of 2.80 GHz to compensate for Turbo Boost, because dstegr in Intel MKL is not parallelized). The difference is more than 20 times!
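For anyone who wants to check the "more than 20 times" figure, here is the arithmetic on the two timings quoted above, as a small Python sketch. The variable names are only illustrative and are not part of either implementation.

```python
# Timings quoted in the post above, converted to seconds.
my_solver_s = 56.6            # author's algorithm, n = 30001
mkl_dstegr_s = 19 * 60 + 37   # dstegr from Intel MKL: 19 min 37 sec

# Ratio of the two run times.
ratio = mkl_dstegr_s / my_solver_s
print(f"dstegr time / author's time = {ratio:.1f}x")
```

The ratio comes out a little under 21, which matches the claim in the post.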
1) Yes, you are right. This function is not threaded at all. Our implementation of this algorithm is the same as the netlib one.
2) How can we verify this? Can you give us the binaries so we can check the problem on our side?
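On the accuracy side, a rough sketch of how the output of either solver could be checked once eigenvalues and vectors are available: measure how far the vectors are from orthonormal, and how large the residual T v - lambda v is. This is only an illustration in plain Python. The function names are hypothetical, and the test matrix is an assumption (the (2, -1) Toeplitz matrix with known closed-form eigenpairs), not the matrix from paragraph 2.2, which is not reproduced in this thread.

```python
import math

def tridiag_matvec(d, e, x):
    """Multiply the symmetric tridiagonal matrix (diagonal d, off-diagonal e) by x."""
    n = len(d)
    y = [d[i] * x[i] for i in range(n)]
    for i in range(n - 1):
        y[i] += e[i] * x[i + 1]
        y[i + 1] += e[i] * x[i]
    return y

def max_orthogonality_error(vectors):
    """Largest entry of |V^T V - I|, with the vectors given as a list of columns."""
    worst = 0.0
    for i, vi in enumerate(vectors):
        for j in range(i, len(vectors)):
            dot = sum(a * b for a, b in zip(vi, vectors[j]))
            target = 1.0 if i == j else 0.0
            worst = max(worst, abs(dot - target))
    return worst

def max_residual(d, e, values, vectors):
    """Largest entry of |T v - lambda v| over all eigenpairs."""
    worst = 0.0
    for lam, v in zip(values, vectors):
        tv = tridiag_matvec(d, e, v)
        worst = max(worst, max(abs(t - lam * x) for t, x in zip(tv, v)))
    return worst

# Assumed test case: diagonal 2, off-diagonal -1, with analytic eigenpairs.
n = 50
d = [2.0] * n
e = [-1.0] * (n - 1)
values = [2.0 - 2.0 * math.cos(k * math.pi / (n + 1)) for k in range(1, n + 1)]
scale = math.sqrt(2.0 / (n + 1))
vectors = [[scale * math.sin(j * k * math.pi / (n + 1)) for j in range(1, n + 1)]
           for k in range(1, n + 1)]

orth_err = max_orthogonality_error(vectors)
resid = max_residual(d, e, values, vectors)
print(f"orthogonality error: {orth_err:.2e}, residual: {resid:.2e}")
```

In a real comparison the `values` and `vectors` lists would be filled from the solver output instead of the analytic formulas. The pairwise loop is quadratic in the number of vectors, so for n = 30001 a sampled subset or a BLAS-based check would be more practical.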
Yes. But the idea behind my algorithms has not changed. In that time Intel has made progress only in addressing the matrix multiplication for IA32 by Prof. Granovsky (for 65 nm processors). Intel has also significantly improved the RRR algorithm. As for the algorithm that I set against RRR, it is much improved too.
can use the source code: this is my gift. I'd like to note that dlarfb has a lot of applications, so the importance of the changes I submitted is quite high.