Given that both functions have a very similar interface and produce the same result, I was hoping to see some performance gain. Instead I got the error "LAPACKE_dormqr failed with info = -1" when trying to compute Q^T*c with dormqr using the output generated by dgeqp3. Is this a known bug in the MKL version I have (composerxe-2011.4.184)? Or does the dgeqp3 variant of QR need a slightly different workflow?
I know, but if you read the documentation of dgeqp3, specifically the "Application Notes" section, you will find:
Application Notes
To solve a set of least squares problems minimizing ||A*x - b||_2 for all columns b of a given matrix B, you can call the following:
- ?geqp3 (this routine) to factorize A*P = Q*R;
- ormqr to compute C = Q^T*B (for real matrices);
- unmqr to compute C = Q^H*B (for complex matrices);
- trsm (a BLAS routine) to solve R*X = C.
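To make those steps concrete, here is a minimal pure-Python sketch of the same workflow for a single right-hand side. It is not MKL code: the names pivoted_qr and solve_least_squares are hypothetical, it assumes A has full column rank, and it builds Q explicitly with modified Gram-Schmidt instead of the Householder reflectors that geqp3 stores. What it is meant to show is that, because the factorization is A*P = Q*R, the triangular solve yields P^T*x, so the pivot order has to be undone at the end.

```python
# Pure-Python sketch of the pivoted-QR least-squares workflow (not MKL code).
import math


def pivoted_qr(A):
    """Factor A*P = Q*R by modified Gram-Schmidt with column pivoting.

    Assumes A (a list of m rows, n columns, m >= n) has full column rank.
    Returns (Q_cols, R, perm): Q_cols holds the orthonormal columns of Q,
    R is n x n upper triangular, and perm[k] is the 0-based index of the
    original column of A that ended up in position k.
    """
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]  # working columns
    perm = list(range(n))
    Q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Pivot: move the remaining column with the largest norm to position k.
        p = max(range(k, n), key=lambda j: sum(v * v for v in cols[j]))
        cols[k], cols[p] = cols[p], cols[k]
        perm[k], perm[p] = perm[p], perm[k]
        for row in R:
            row[k], row[p] = row[p], row[k]
        R[k][k] = math.sqrt(sum(v * v for v in cols[k]))
        q = [v / R[k][k] for v in cols[k]]
        Q_cols.append(q)
        # Remove the component along q from the remaining columns.
        for j in range(k + 1, n):
            R[k][j] = sum(qi * v for qi, v in zip(q, cols[j]))
            cols[j] = [v - R[k][j] * qi for v, qi in zip(cols[j], q)]
    return Q_cols, R, perm


def solve_least_squares(A, b):
    """Minimize ||A*x - b||_2 following the quoted steps plus the unpivoting."""
    Q_cols, R, perm = pivoted_qr(A)          # role of ?geqp3: A*P = Q*R
    n = len(R)
    # Role of ormqr: c = Q^T * b (only the first n entries are needed).
    c = [sum(qi * bi for qi, bi in zip(q, b)) for q in Q_cols]
    # Role of trsm: back substitution for R * y = c.
    y = [0.0] * n
    for i in reversed(range(n)):
        s = c[i] - sum(R[i][j] * y[j] for j in range(i + 1, n))
        y[i] = s / R[i][i]
    # Extra step: y = P^T * x, so scatter y back into the original order.
    x = [0.0] * n
    for k, j in enumerate(perm):
        x[j] = y[k]
    return x


# Fit b = x0 + x1*t at t = 1, 2, 3, 4; the second column has the larger norm,
# so the pivoting swaps the two columns.
A = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
b = [5.0, 8.0, 11.0, 14.0]
x = solve_least_squares(A, b)
print(x)
```

The data lie exactly on b = 2 + 3*t, so x should come out as intercept 2 and slope 3 up to rounding. Without the final scatter step the two values would be returned in pivot order, i.e. swapped. With the real LAPACK routines the same reordering would be driven by the jpvt array returned by dgeqp3, which uses 1-based indices.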
So basically the workflow should be the same as for dgeqrf. However, a friend of mine just benchmarked dgeqrf against dgeqp3 in Python in front of me, and we can see dgeqp3 performing orders of magnitude slower than dgeqrf, so never mind.