where,
real*8 cvtrkp(6,6,3,maxtap,maxsfc,maxrdr,maxobj)
maxtap=100
maxsfc=85
maxrdr=40
maxobj=2
do j=1,6
  do k=1,6
    covar(j,k,1)=cvtrkp(j,k,1,i,isfc,igss,iobj)
    covar(j,k,2)=cvtrkp(j,k,2,i,isfc,igss,iobj)
    covar(j,k,3)=cvtrkp(j,k,3,i,isfc,igss,iobj)
  enddo
enddo
So I guess my question is: is there an inherent run-time penalty when accessing data from large arrays like this and storing it in a local temp array?
Thanks
Also, I don't follow "but you should check what it did, if you don't want to make it easy."
Sorry, please explain.
Thanks!
No, there was no good reason for me to do that.
OK, this makes perfect sense to me now. I never thought about it before, but it makes sense. I ran a little test program to confirm. I will now begin changing all loops.
Thanks!
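For anyone reading later, here is a minimal sketch of why loop order matters for the array in the question. Fortran stores arrays in column-major order, so the first subscript varies fastest in memory. The helper `fortran_offset` is a hypothetical name made up for this illustration; it only reproduces the address arithmetic for the first three dimensions of cvtrkp(6,6,3,...) and is written in C++ to match the other code in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Extents of the first three dimensions of cvtrkp(6,6,3,...), as declared in the question.
constexpr std::size_t N1 = 6;
constexpr std::size_t N2 = 6;

// Hypothetical helper: column-major (Fortran-order) offset, in elements, of
// element (j,k,c) inside one 6x6x3 block. Subscripts are 1-based as in Fortran.
constexpr std::size_t fortran_offset(std::size_t j, std::size_t k, std::size_t c) {
    return (j - 1) + N1 * ((k - 1) + N2 * (c - 1));
}

// Distance in memory between neighbours along each subscript.
constexpr std::size_t stride_j = fortran_offset(2, 1, 1) - fortran_offset(1, 1, 1);
constexpr std::size_t stride_k = fortran_offset(1, 2, 1) - fortran_offset(1, 1, 1);
constexpr std::size_t stride_c = fortran_offset(1, 1, 2) - fortran_offset(1, 1, 1);

static_assert(stride_j == 1,  "first subscript is contiguous");
static_assert(stride_k == 6,  "second subscript jumps one column");
static_assert(stride_c == 36, "third subscript jumps one 6x6 plane");
```

With j as the innermost loop, successive iterations touch adjacent elements. With k innermost, as in the loop posted above, each iteration jumps 6 elements (48 bytes for real*8), which uses the cache less efficiently.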
The compiler can reorder loops, if it can determine that the reordering is safe and if you have specified an optimization level that requests such reordering.
That's an amazing question!
My answer is based on today's optimization problems: I ran into performance issues with template-based C++ code.
An application needs to do some processing with a large 2-D data set of floats declared locally on the stack.
Just out of interest, I changed the declaration to 'static', that is, global and allocated only once, and there was a performance degradation. It was almost twice as slow to calculate a Kronecker product of two matrices.
Actually, I expected some performance gains, but the result was the opposite!
In general, I would strongly recommend testing your application as thoroughly as possible; in my case, I clearly had more problems (cache misses).
Best regards,
Sergey
PS: Here is an example of the code; I bolded and underlined the two lines where I had some issues:
...
inline RTbool Kronecker( const TMatrixSet< T, iDataType > &rtMs )
{
    if( TDataSet< T, iDataType >::m_ptData1D == RTnull )    // [ MxN ] * [ RxK ] = [ MRxNK ]
        return ( RTbool )RTfalse;
    if( TDataSet< T, iDataType >::m_ptData2D == RTnull )
        return ( RTbool )RTfalse;
    RTint iM = ( RTint )TDataSet< T, iDataType >::m_uiRows;
    RTint iN = ( RTint )TDataSet< T, iDataType >::m_uiCols;
    RTint iR = ( RTint )rtMs.m_uiRows;
    RTint iK = ( RTint )rtMs.m_uiCols;
    if( iM == 0 || iN == 0 || iR == 0 || iK == 0 )
        return ( RTbool )RTfalse;
    TMatrixSet< T, iDataType > tMsTmp;
    tMsTmp.SetSize( ( iM * iR ), ( iN * iK ) );
    tMsTmp.m_enMatrixTranspose = m_enMatrixTranspose;
    RTint m,n,r,k;
    RTint mr = 0;    // row index into the result, equals m * iR + r
    RTint nk = 0;    // column index into the result, equals n * iK + k
    for( m = 0; m < iM; m++ )
    {
        for( r = 0; r < iR; r++ )
        {
            nk = 0;
            for( n = 0; n < iN; n++ )
            {
                for( k = 0; k < iK; k++ )
                {
                    tMsTmp.m_ptData2D[mr][nk] = TDataSet< T, iDataType >::m_ptData2D[m][n] *
                                                rtMs.m_ptData2D[r][k];
                    nk++;
                }
            }
            mr++;
        }
    }
    *this = tMsTmp;
    return ( RTbool )RTtrue;
};
...
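For readers who do not have the TMatrixSet class, the same loop structure can be sketched in a self-contained form. This is only an illustration under stated assumptions: `Matrix` and `kronecker` are hypothetical names, std::vector stands in for the original storage, and the inputs are assumed to be non-empty and rectangular. It is not the code that was benchmarked above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the matrix class used in the post above.
using Matrix = std::vector<std::vector<float>>;

// Kronecker product: [M x N] (x) [R x K] = [MR x NK].
// Assumes both inputs are non-empty and rectangular.
Matrix kronecker(const Matrix& a, const Matrix& b) {
    const std::size_t M = a.size(), N = a[0].size();
    const std::size_t R = b.size(), K = b[0].size();
    Matrix out(M * R, std::vector<float>(N * K));
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t r = 0; r < R; ++r) {
            std::vector<float>& row = out[m * R + r];  // one result row per (m, r) pair
            for (std::size_t n = 0; n < N; ++n) {
                for (std::size_t k = 0; k < K; ++k) {
                    // The innermost loop walks the result row left to right.
                    row[n * K + k] = a[m][n] * b[r][k];
                }
            }
        }
    }
    return out;
}
```

The two innermost loops fill each result row from left to right, so writes stay contiguous within a row; that is the C++ (row-major) counterpart of the loop-ordering advice given for the Fortran code earlier in the thread.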