- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am trying to implement some Fortran code from someone else as follows:
do i=1,n
do j=1,i
S = A(i,j) - G(j,j:n) * G(i,j:n)'
enddo
enddo
I believe it is meant to be equivalent to the following:
do i=1,n
do j=1,i
S = A(i,j)
do k=j,n
S = S - G(j,k) * G(i,k)
enddo
enddo
enddo
The first code sample results in a compiler error because it doesn't like the ' at the end of line 3 which I believe is meant to tell it to accept S as a scalar. If I take the ' away then I get a different compiler error saying that S is dimensioned incorrectly, presumably because it expects it to be an array.
Does anyone know how to get around this and what the equivalent of the ' at the end of line 3 is in Intel Fortran?
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
There should not be a ' at the end of line three it is not valid syntax, remove it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yes, but then I get a compiler error because it is expecting S to be dimensioned as an array.
What I really want to know is what to replace the ' with so that line 3 sums the G(j,j:n) * G(i,j:n) terms and puts them into S as a scalar. I'm pretty sure that's what the ' is there for but I don't know what the Intel equivalent is.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I misread slightly.
G(j,j:n) and G(i,j:n) are both arrays that are 1 x (n-j+1) big so the result of the multiply is a 1d array . You are trying to assign and array to a scaler.
I think you really want SUM( G(j,j:n) * G(i,j:n)) which adds the array elements of the multiplication result to give a single scaler.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
schulzey wrote:
Yes, but then I get a compiler error because it is expecting S to be dimensioned as an array.
What I really want to know is what to replace the ' with so that line 3 sums the G(j,j:n) * G(i,j:n) terms and puts them into S as a scalar. I'm pretty sure that's what the ' is there for but I don't know what the Intel equivalent is.
The prime at the end makes sense in a mathematical way, in the context of computing the scalar product of two row vectors taken from matrices. It makes no sense in Fortran, since the quote character is used to delimit character strings.
It strikes me that someone may have gone too far in paring down the first version of the code extract before posting here. The nested loops do nothing useful, since only the last-computed value of S is available after the loops are completed, so the entire code extract is equivalent to
[fortran]S = A(n,n) - G(n,n) * G(n,n)[/fortran]
if the value of S is not assigned to something else with subscripts inside the loops.
If, however, the lines in question were comments in Fortran code, those would be understandable as mathematical explanations of subsequent lines of Fortran code with DO loops.
In your second code extract, the loop with index k could probably be replaced by either A(i,j) = A(i,j) - sum(G(j,j:n) * G(i,j:n)) or A(i,j) = A(i,j) - DOT_PRODUCT(G(j,j:n) , G(i,j:n))
On the other hand, the code suggests that some sort of matrix orthogonalization is intended, in which case there are routines in Lapack (or MKL) that will serve you better.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks guys, SUM(G(j,j:n) * G(i,j:n)) and DOT_PRODUCT(G(j,j:n) , G(i,j:n)) both do what I want. I am after maximum speed and so I will try some speed tests to see which is faster (DOT_PRODUCT I suspect). mecej4, I will also look into Lapack and MKL to see if they are faster.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
>>SUM(G(j,j:n) * G(i,j:n)) and DOT_PRODUCT(G(j,j:n) , G(i,j:n)) both do what I want. I am after maximum speed
Without seeing all your code it is impossible for me, or anyone else not seeing all your code, to suggest improvements.
However, in abstract terms, if G were kept in transposed form then
DOT_PRODUCT(G(j:n,i) , G(j:n),i) ! Transposed faster
DOT_PRODUCT(G(j,j:n) , G(i,j:n))
Note, I am not suggesting you immediately transpose G prior to the above statement. Rather, I am suggesting that you investigate as to if making (new) G the transposition of (old) G. This may require significant code change.
When the code you are using had a heritage from decades ago, the computers then may not have had cache nor SIMD instructions. On "ancient" systems without cache, the order of subscripts made no difference in performance. On any system with cache memory, memory order makes a large difference (on the order of 5x-100x). Add vectorization and you might attain an additional 2x-10x.
In Fortran G(i,j) and G(i+1,j) are in adjacent memory, and have a high probability of being in the same cache line. Cache lines are typically 64 bytes (16 floats, 8 doubles). *** Note, Fortran fastest (nearest) references go left to right, C/C++ are right to left.
Look through the rest of your code to see how much of it is indexing subscripts from right to left. In some cases you can simply change the loop order, in other cases like the above, it may (may stressed) make a significant difference to organize the data in the transposed format.(swap rows with columns).
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
WOW, simply working with the transpose of G sped it up by around 16x on my Intel i7 laptop !!! Note that A, G and S are all doubles.
I have always known that 2D Fortran arrays are arranged in memory from left to right, but I didn't realise that having the left index in the innermost loop would have such a big speed benefit !
By the way, using an inner loop with k=1,j-1 rather than using DOT_PRODUCT (shown below) or SUM doesn't seem to make any difference in speed.
My application is a simple Cholesky factorization of a symmetric positive definite matrix of size n x n. The aim is to find a lower triangular matrix
do i=1,n
do j=1,i
S=A(j,i)-DOT_PRODUCT(G(1:j-1,j),G(1:j-1,i)) ! Changed from ...(G(j,1:j-1),G(i,1:j-1))
if (j.lt.i) then
G(j,i)=S/G(j,j) ! Changed from G(i,j)=S/G(j,j)
else
G(i,i)=SQRT(S)
endif
enddo
enddo
Can you see any other ways I could speed it up further? How do I add vectorization?
Peter Schulze
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The routines dpotrf, etc., in the MKL library (which is included with the Intel compilers) performs the GGT factorization of a positive definite matrix. Try calling the appropriate MKL routine instead of writing your own, if speed is your goal.

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page