Solved: Least squares - estimation of fit error

ZlamalJakub · ‎12-21-2016

I am using routines DGEQPF, DORMQR and DTRSV (according to example in dgeqpfx.f) and it works well.

I also need to calculate estimation of errors of fitted parameters but I did not find any example.

Can someone give me advice how to calculate it?

mecej4 · ‎12-22-2016

The topic you ask about involves a combination of linear algebra, linear least squares and statistics, but it is not difficult to put the pieces together.

Let c₀ be the "regression coefficients" or "fitted parameters" that you have found, and let R be the upper triangular matrix obtained from the Q-R factorization of the regressor matrix. The "confidence interval" for c, the true but unknown value of the regression coefficients, is given by c₀ - δ_c < c < c₀ + δ_c, where

δ_c = t_ν,α_/2 [diag (R^TR)^-1Σ (y - y_f)²/ν]^1/2,

t is the Student-t variable, with tail area α/2 and degrees-of-freedom ν = n - m of the Student-t PDF,

n = number of observations,

m = number of regression coefficients,

y, y_f are the observed and fitted values of the dependent variable.

View solution in original post

mecej4 · ‎12-22-2016

The topic you ask about involves a combination of linear algebra, linear least squares and statistics, but it is not difficult to put the pieces together.

Let c₀ be the "regression coefficients" or "fitted parameters" that you have found, and let R be the upper triangular matrix obtained from the Q-R factorization of the regressor matrix. The "confidence interval" for c, the true but unknown value of the regression coefficients, is given by c₀ - δ_c < c < c₀ + δ_c, where

δ_c = t_ν,α_/2 [diag (R^TR)^-1Σ (y - y_f)²/ν]^1/2,

t is the Student-t variable, with tail area α/2 and degrees-of-freedom ν = n - m of the Student-t PDF,

n = number of observations,

m = number of regression coefficients,

y, y_f are the observed and fitted values of the dependent variable.

ZlamalJakub · ‎12-22-2016

Thank You very much. I did not found this equation for QR factorisation.

I have one another question. Routine DGEQPF permutes columns of orignial matrix. Then I assume that the values \delta_c will be permuted so. Is it true?

mecej4 · ‎12-22-2016

Whether pivoting is used or not is a detail at the programming level, and depends on which Lapack routine is used for performing the QR decomposition.

If you use DGEQRF, pivoting is not used and the question goes away.

Please note that DGEQPF is now deprecated; use DGEQP3 instead. If pivoting is used, the permutation vector does need to be used in reporting δ_c. However, only the diagonal of R^TR needs to be permuted, so doing so is not difficult, although it can appear tricky to do so at first look.

ZlamalJakub · ‎12-23-2016

Thanks for advice, I will try.

ZlamalJakub · ‎01-01-2017

I have finished my routine calculating LSQ fit and also standard deviations of fitted parameters. It is not well memory optimised, but seems to work.

	subroutine DSOLVEQRErrorEstim_MKL(A,B,X,Xerr,m,n,tol,info)
		implicit none
		integer*4,intent(in) :: m,n		! m - number of equations, n - number of unknowns
		real*8,intent(in) :: A(m,n)		! matrix A  (A*X=B)
		real*8,intent(inout) :: B(m,1)	! righthand side
		real*8,intent(out) :: X(n,1)		! fitted parameters
		real*8, intent(out) :: Xerr(n,1)	! standard deviations of fitted parameters
		real*8, intent(in) :: tol			!	Choose TOL to reflect the relative accuracy of the input data
		integer*4, intent(inout) :: info  ! result of MKL routines 0 if all is ok
	
      INTEGER          MMAX, NMAX, LDA, LDB, LDX, NRHMAX, LWORK
      real*8,parameter ::  ZERO=0.0D0
      INTEGER          I, IFAIL, J, K, NRHS
      real*8	       TAU(N),WORK(64*n)
      DOUBLE PRECISION RWORK(2*N)
      INTEGER          JPVT(N),ipiv(n)
      CHARACTER        CLABS(1), RLABS(1)
		real*8			 R(n,n)  ! factorised matrix R
	 	real*8			Acopy(m,n),Bcopy(m)  ! copy of input matrices
		real*8		Bout(m)
		real*8		err,sigma
		lda=m
		ldb=m
		nrhs=1
		lwork=64*n

		! get a copy of input matrices to evaluate deviations
		Acopy=A
		Bcopy(1:m)=B(1:m,1)


!        Initialize JPVT to be zero so that all columns are free
         JPVT=0

!        Compute the QR factorization of A
         CALL DGEQP3(M,N,A,LDA,JPVT,TAU,WORK,lwork,INFO)
			if (info/=0) then
				return
			endif
			
			! copy R for evaluation of deviations
			R=0.D0
			do i=1,n	! row
				R(1:i,i)=A(1:i,i)
			enddo
			call dtrmm('Left','Upper','Transpose','Not diagonal',n,n,1.D0,A,m,R,n)
			! in R is not R^T*R

!        Determine which columns of R to use
         DO 20 K = 1, N
            IF (ABS(A(K,K)).LE.TOL*ABS(A(1,1))) GO TO 40
   20    CONTINUE

!        Compute C = (Q**H)*B, storing the result in B
40 K = K - 1
         CALL DORMQR('Left','Transpose',M,NRHS,N,A,LDA,TAU,B,LDB,WORK,LWORK,INFO)
			if (info/=0) then
				return
			endif

!        Compute least-squares solution by backsubstitution in R*B = C
          CALL DTRSV('Upper','No transpose','Non-Unit',K,A,LDA,B(1,1),1)

!           Set the unused elements of the I-th solution vector to zero
				B(K+1:N,1)=ZERO
   60    CONTINUE
!        Unscramble the least-squares solution stored in B
         DO 100 I = 1, N
               X(JPVT(I),1) = B(I,1)
100		CONTINUE

! now estimate errors

		  ! we need inverse of R
			
		  ! DGETRF computes an LU factorization of a general M-by-N matrix A
		  ! using partial pivoting with row interchanges.
		  call DGETRF(n, n, R, n, ipiv, info)
		  if (info /= 0) then
			  !stop 'Matrix is numerically singular!'
			  return
		  end if

		  ! DGETRI computes the inverse of a matrix using the LU factorization
		  ! computed by DGETRF.
		  call DGETRI(n, R, n, ipiv, work, n, info)
			if (info/=0) then
				return
			endif
			! in R is now (R^T*R)^(-1)
			
			! evaluate A*X
			call DGEMM('None','None',m,1,n,1.D0,Acopy,m,X,n,0.D0,Bout,m)
			! calculate error of fit
			Bout=Bout-Bcopy
			err=dot_product(Bout,Bout)
			sigma=dsqrt(err/(m-n))
!        Unscramble the R
         DO I = 1, N
				Xerr(JPVT(I),1) = DSQRT(R(I,I))*sigma
			enddo
	return
end subroutine