- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi,

I am setting up several weighted normal matrices as A'PA, where A is the first design matrix of size nObs x nPar and P is the weight matrix of size nObs x nObs . A'PA is then a symmetric matrix of size nPar x nPar. I perform this operation using the DSYMM + DGEMM routines.

! Memory allocation IF (ASSOCIATED(AA)) DEALLOCATE(AA) ALLOCATE(AA(NOBS,NPAR),stat=ii) IF (ASSOCIATED(PA)) DEALLOCATE(PA) ALLOCATE(PA(NOBS,NPAR),stat=ii) ! I actually just need the U or L triangular part of it (symmetric) IF (ASSOCIATED(ATPA)) DEALLOCATE(ATPA) ALLOCATE(ATPA(NPAR,NPAR),stat=ii) ! Set up of PA CALL dsymm('L', 'U', nObs, nPar, 1.d0, P_f, nObs, AA, nObs, 0.d0, PA, nObs) ! Setup of A'PA CALL dgemm('T','N',nPar,nPar,nObs,1.d0,AA,nObs,PA,nObs,1.d0,ATPA,nPar) ! Deallocation of AA, PA IF (ASSOCIATED(AA)) DEALLOCATE(AA) IF (ASSOCIATED(PA)) DEALLOCATE(PA)

Now, in my case nPar is as large as 90000 (or more), so that I need to allocate a very large amount of memory for the output (A'PA). In principle, I just need the upper or lower triangular matrix (since it's symmetric) but I cannot find a way to avoid the simultaneous allocation of the 90000x90000 matrix (needed as output by DGEMM) and of the triangular matrix (which is approximately half the size) where I would copy the part I am interested in before deallocating the full one.

Do you have any suggestion or see any option to compute this product using optimized parallel routines without allocating the full A'PA matrix? I checked all routines or packed formats but I cannot find a viable way out of allocating the full matrix at some point.

Thanks!

ps : I have an alternative using the intrinsic MATMUL function and small batches of 5 observations that are then added together in a triangular matrix (allocated and accessed as a vector). Unfortunately, as you can imagine, this is hardly very efficient.

Link Copied

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

1. Is the weights matrix P a diagonal matrix?

2. What do you intend to do with A^{T}PA, once you have assembled it?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

It would be more economical (memory and running time) and numerically stable to

- compute the QR decomposition of A
- compute R
_{s}= R.sqrt(P)

Having done this, note that A^{T}PA = (QR_{s})^{T}(QR_{s}), and that Q is orthogonal.

Please note that computational routines in MKL/Lapack may return Q in a different representation than as a dense matrix (e.g., as a product of Householder reflections), but Lapack provides helper routines to work with such Q matrices.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Let us say that I will go for the factorisation and then use a standard MATMUL to compute only the part of A'PA that I need. How do I proceed?

I think I should :

1. use **GEQP3 **to perform the factorisation

2. how do I get the R matrix out of the output to compute R.sqrt(P) ? Manually?

3. how do you suggest to recompute QRs ?

Thanks!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Ok, in the end I went for an "easy" even if a bit less efficient solution (I report it here in case it shall be of help for someone). I used DGEMV in a loop to get the elements of my normal matrix : it il indeed less efficient but still I gain a huge time factor by parallelization and I use less than 1/3 of the memory. I'm happy enough with it. ;)

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page