Intel® Fortran Compiler

Slow run-time

wolfpackNC
Beginner
328 Views
I recently ran into a really slow run-time issue. I have a subroutine that gets called a lot, and I recently added the do loop below to it. Before that change, I was not pulling data from this large array, cvtrkp(j,k,1,i,isfc,igss,iobj)

where,

real*8 cvtrkp(6,6,3,maxtap,maxsfc,maxrdr,maxobj)

maxtap=100

maxsfc=85

maxrdr=40

maxobj=2


do j=1,6
   do k=1,6
      covar(j,k,1)=cvtrkp(j,k,1,i,isfc,igss,iobj)
      covar(j,k,2)=cvtrkp(j,k,2,i,isfc,igss,iobj)
      covar(j,k,3)=cvtrkp(j,k,3,i,isfc,igss,iobj)
   enddo
enddo



So I guess my question is, is there an inherent run-time penalty when accessing data from large arrays like this and storing it in local temp array?

Thanks
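
For a sense of scale, the declaration in the question can be turned into a footprint estimate. The sketch below is only an illustration, not code from the thread. It assumes real*8 occupies 8 bytes (written here as real64) and uses the extents quoted in the post; the program name and the variables n_elems, n_bytes and block are made up for the example.

```fortran
program cvtrkp_footprint
  use iso_fortran_env, only: int64, real64
  implicit none
  ! Extents as quoted in the post
  integer, parameter :: maxtap = 100, maxsfc = 85, maxrdr = 40, maxobj = 2
  integer(int64) :: n_elems, n_bytes, block

  ! Elements in one 6x6x3 block (fixed i, isfc, igss, iobj)
  block = 6_int64 * 6 * 3

  ! Total element count; int64 arithmetic avoids any overflow concern
  n_elems = block * maxtap * maxsfc * maxrdr * maxobj

  ! storage_size returns bits, so divide by 8 to get bytes per element
  n_bytes = n_elems * (storage_size(1.0_real64) / 8)

  print '(a,i0)', 'elements in one block: ', block
  print '(a,i0)', 'total elements:        ', n_elems
  print '(a,i0)', 'total bytes:           ', n_bytes
end program cvtrkp_footprint
```

With these extents the array holds 73,440,000 elements, or 587,520,000 bytes (about 560 MiB). Because Fortran stores arrays in column-major order, the 6x6x3 block selected by a fixed i, isfc, igss and iobj is 108 consecutive elements, and blocks for consecutive values of i are 108 elements apart.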
5 Replies
TimP
Honored Contributor III
Any reason for nesting the loops backwards? With the right options, the compiler may take care of that, but you should check what it did, if you don't want to make it easy.
wolfpackNC
Beginner
Wait. Sorry, what do you mean by nesting backwards? So it's preferred to have the innermost loop run over the leftmost index? That makes sense in terms of how Fortran orders arrays in memory.

Also, I don't follow "but you should check what it did, if you don't want to make it easy."

Sorry, please explain.

Thanks!

No, there was no good reason for me to do that.
OK, this makes perfect sense to me now. I had never thought about it, but it makes sense. I ran a little test program to confirm. I will now begin changing all my loops.

Thanks!
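
A small test of this kind might look like the sketch below. It is not the poster's program, which was not shared; the array size n, the program name and all variable names are hypothetical. It sums the same 2-D array twice, once with the first index in the outer loop and once with the first index in the inner loop, and times each pass with system_clock.

```fortran
program loop_order_timing
  use iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: n = 2000          ! hypothetical size, chosen only for the demo
  real(real64), allocatable :: a(:,:)
  real(real64) :: s_slow, s_fast
  integer(int64) :: t0, t1, t2, rate
  integer :: j, k

  allocate(a(n,n))
  call random_number(a)

  call system_clock(t0, rate)

  ! First index varies in the OUTER loop: each inner step jumps n elements
  s_slow = 0.0_real64
  do j = 1, n
    do k = 1, n
      s_slow = s_slow + a(j,k)
    end do
  end do

  call system_clock(t1)

  ! First index varies in the INNER loop: memory is walked contiguously
  s_fast = 0.0_real64
  do k = 1, n
    do j = 1, n
      s_fast = s_fast + a(j,k)
    end do
  end do

  call system_clock(t2)

  ! Both passes add the same values, so the sums agree up to rounding
  if (abs(s_fast - s_slow) > 1.0e-6_real64 * abs(s_fast)) error stop 'sums differ'

  print '(a,f9.4,a)', 'first index in outer loop: ', &
        real(t1 - t0, real64) / real(rate, real64), ' s'
  print '(a,f9.4,a)', 'first index in inner loop: ', &
        real(t2 - t1, real64) / real(rate, real64), ' s'
  print '(a,es12.5)', 'checksum: ', s_fast
end program loop_order_timing
```

The timings depend on the machine and on the compiler options. Building with optimization turned off (for example -O0) is the likeliest way to see a gap, since at higher optimization levels the compiler may interchange the loops on its own.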
mecej4
Honored Contributor III
The ideal nesting of nested DO loops has the first index (in left-to-right order) of a multi-dimensional array varying in the innermost loop, the second index in the second innermost loop, etc.

The compiler can reorder loops, if it can determine that the reordering is safe and if you have specified an optimization level that requests such reordering.
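
Applied to the loop in the original question, that nesting rule could look like the sketch below. It is an illustration under stated assumptions, not the poster's final code: the extents are shrunk to small hypothetical values so the program runs quickly, real64 stands in for real*8, and covar2, m and the index values are made up for the example. The array-section assignment at the end is an alternative spelling of the same copy.

```fortran
program copy_block
  use iso_fortran_env, only: real64
  implicit none
  ! Hypothetical small extents for the demo; the thread uses 100, 85, 40, 2
  integer, parameter :: maxtap = 4, maxsfc = 3, maxrdr = 2, maxobj = 2
  real(real64) :: cvtrkp(6,6,3,maxtap,maxsfc,maxrdr,maxobj)
  real(real64) :: covar(6,6,3), covar2(6,6,3)
  integer :: i, isfc, igss, iobj, j, k, m

  call random_number(cvtrkp)
  i = 2; isfc = 3; igss = 1; iobj = 2     ! arbitrary values for the fixed indices

  ! Leftmost index j in the innermost loop, then k, then the third index m
  do m = 1, 3
    do k = 1, 6
      do j = 1, 6
        covar(j,k,m) = cvtrkp(j,k,m,i,isfc,igss,iobj)
      end do
    end do
  end do

  ! Same copy written as an array section; the compiler chooses the loop order
  covar2 = cvtrkp(:,:,:,i,isfc,igss,iobj)

  if (any(covar /= covar2)) error stop 'copies differ'
  print '(a)', 'both copies match'
end program copy_block
```

Either form walks the 108 copied elements in the order they sit in memory. Whether this makes a measurable difference in a given build still needs to be checked, since the compiler may already have reordered the original loops.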
wolfpackNC
Beginner
Thanks everyone for the quick response! This helps a lot!
SergeyKostrov
Valued Contributor II
...is there an inherent run-time penalty when accessing data from large arrays like this and storing it in local temp array...

That's an amazing question!

My answer is based on optimization problems I hit today, when I ran into performance issues with some template-based C++ code.

The application needs to do some processing with a large 2-D data set of floats declared locally on the stack.

Just out of interest, I changed the declaration to 'static', that is, global and allocated only once, and performance got worse. It was almost twice as slow at calculating the Kronecker product of two matrices.

I actually expected some performance gain, but the result was the opposite!

In general, I would strongly recommend testing your application as thoroughly as possible. In my case the 'static' version clearly had more problems (cache misses).

Best regards,
Sergey

PS: Here is an example of the code. I bolded and underlined the two lines where I had some issues:

...

inline RTbool Kronecker( const TMatrixSet< T, iDataType > &rtMs )
{
    if( TDataSet< T, iDataType >::m_ptData1D == RTnull ) // [ MxN ] * [ RxK ] = [ MRxNK ]
        return ( RTbool )RTfalse;

    if( TDataSet< T, iDataType >::m_ptData2D == RTnull )
        return ( RTbool )RTfalse;

    RTint iM = ( RTint )TDataSet< T, iDataType >::m_uiRows;
    RTint iN = ( RTint )TDataSet< T, iDataType >::m_uiCols;
    RTint iR = ( RTint )rtMs.m_uiRows;
    RTint iK = ( RTint )rtMs.m_uiCols;

    if( iM == 0 || iN == 0 || iR == 0 || iK == 0 )
        return ( RTbool )RTfalse;

    TMatrixSet< T, iDataType > tMsTmp;
    tMsTmp.SetSize( ( iM * iR ), ( iN * iK ) );

    tMsTmp.m_enMatrixTranspose = m_enMatrixTranspose;

    RTint m,n,r,k;
    RTint mr = 0;
    RTint nk = 0;

    for( m = 0; m < iM; m++ )
    {
        for( r = 0; r < iR; r++ )
        {
            nk = 0;
            for( n = 0; n < iN; n++ )
            {
                for( k = 0; k < iK; k++ )
                {
                    // Element ( m*iR + r, n*iK + k ) of the Kronecker product
                    tMsTmp.m_ptData2D[mr][nk] = TDataSet< T, iDataType >::m_ptData2D[m][n] *
                                                rtMs.m_ptData2D[r][k];
                    nk++;
                }
            }
            mr++;
        }
    }

    *this = tMsTmp;

    return ( RTbool )RTtrue;
};
...
