Are 1D arrays faster than 2D arrays?

inkant · 04-21-2011

Dear Users,

Compiler: ifort version 12.0.2

System: RHEL 6, x86_64, Intel quad-core Xeon

I just changed code that used an array of shape A(i) to use A(i,j) in one of the most computationally intensive parts.

I noticed the following performance degradation (measured with the shell's built-in time command):

With a 1D array:

1st run
real 0m1.425s
user 0m1.373s
sys 0m0.006s

2nd run
real 0m1.393s
user 0m1.385s
sys 0m0.008s

With a 2D array:

1st run
real 0m2.649s
user 0m2.642s
sys 0m0.008s

2nd run
real 0m2.648s
user 0m2.640s
sys 0m0.007s

Is this slowdown caused by using a 2D array (probably because of the array indexing)?
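For reference, a minimal sketch of how a 2D index maps onto the same column-major storage a 1D array uses (module and function names are hypothetical; default lower bounds of 1 assumed). A(i,j) is element i + (j-1)*nI of the flattened array, so the extra cost of 2D indexing is a multiply-add, which an optimizing compiler can usually hoist out of the inner loop.

```fortran
! Sketch: a 2D index and a 1D index addressing the same column-major storage.
! Hypothetical names; assumes default lower bounds of 1.
module index_map
  implicit none
contains
  ! 1-based position of A(i,j) when an nI-by-nJ array is flattened
  pure integer function linear_index(i, j, nI)
    integer, intent(in) :: i, j, nI
    linear_index = i + (j - 1) * nI
  end function linear_index

  ! .true. if every a2(i,j) equals a1(linear_index(i,j,nI)) after flattening
  logical function same_storage(nI, nJ)
    integer, intent(in) :: nI, nJ
    real, allocatable :: a2(:,:), a1(:)
    integer :: i, j
    allocate(a2(nI, nJ), a1(nI * nJ))
    do j = 1, nJ
      do i = 1, nI
        a2(i, j) = real(100 * i + j)
      end do
    end do
    a1 = reshape(a2, [nI * nJ])   ! flattened in array-element (column-major) order
    same_storage = .true.
    do j = 1, nJ
      do i = 1, nI
        if (a1(linear_index(i, j, nI)) /= a2(i, j)) same_storage = .false.
      end do
    end do
  end function same_storage
end module index_map
```

With nI = 5, A(3,2) is element 3 + 1*5 = 8 of the flattened array.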

Best Regards,

Inkant

jimdempseyatthecove · 04-21-2011

In Fortran, a 2D array gets the best memory access pattern when the leftmost array index varies in the innermost loop, because Fortran stores arrays in column-major order (in C/C++ it is the other way around).

do J = 1, nJ          ! outer loop: rightmost index
   do I = 1, nI       ! inner loop: leftmost index, contiguous in memory
      A(I,J) = B(I,J) ...
      ...
   end do
end do

You may need to review and rework your loops.
For 3D, A(I,J,K), make K the outermost loop, J the middle loop, and I the innermost loop.
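A compilable sketch of this loop ordering for the 2D and 3D cases. The module name and the scaling operation are hypothetical stand-ins for real work.

```fortran
! Sketch: leftmost index in the innermost loop (hypothetical names and work).
module loop_order
  implicit none
contains
  ! 2D: J outer, I inner, so consecutive iterations touch adjacent memory
  subroutine scale_2d(a, b, s)
    real, intent(out) :: a(:,:)
    real, intent(in)  :: b(:,:), s
    integer :: i, j
    do j = 1, size(a, 2)
      do i = 1, size(a, 1)
        a(i, j) = s * b(i, j)
      end do
    end do
  end subroutine scale_2d

  ! 3D: K outermost, J middle, I innermost
  subroutine scale_3d(a, b, s)
    real, intent(out) :: a(:,:,:)
    real, intent(in)  :: b(:,:,:), s
    integer :: i, j, k
    do k = 1, size(a, 3)
      do j = 1, size(a, 2)
        do i = 1, size(a, 1)
          a(i, j, k) = s * b(i, j, k)
        end do
      end do
    end do
  end subroutine scale_3d
end module loop_order
```

Called as call scale_2d(a, b, 2.0), the inner loop walks down one column of a and b at a time.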

Jim Dempsey

inkant · 04-21-2011

Yes, Jim,

The 2D array indices are already varied in the order you suggested.

In addition, the 1D array subroutine had a few conditionals to evaluate, while the 2D version was free of conditionals (which is why the slowdown surprised me).

The only difference between the two subroutines was that the 1D version had a single loop (with conditionals), while the 2D version had two nested loops.
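To make the comparison concrete, here is a hypothetical pair of kernels with the two shapes described (not the actual code, which was not posted): a single flat loop whose conditional skips the i = 1 and i = nI edge points, and two nested loops whose bounds do the same job. One thing worth checking in a setup like this: if the leftmost extent nI is small, the inner loop of the 2D version has a short trip count, and loop overhead can outweigh the saved conditionals.

```fortran
! Hypothetical kernels with the two loop shapes (not the poster's code).
module two_shapes
  implicit none
contains
  ! One flat loop over nI*nJ points; a conditional skips the i = 1 and i = nI edges
  subroutine smooth_1d(x, y, nI, nJ)
    integer, intent(in) :: nI, nJ
    real, intent(in)    :: x(nI * nJ)
    real, intent(inout) :: y(nI * nJ)
    integer :: k, i
    do k = 1, nI * nJ
      i = mod(k - 1, nI) + 1            ! recover the row index from k
      if (i > 1 .and. i < nI) then
        y(k) = 0.5 * (x(k - 1) + x(k + 1))
      end if
    end do
  end subroutine smooth_1d

  ! Same work with two nested loops; the loop bounds replace the conditional
  subroutine smooth_2d(x, y, nI, nJ)
    integer, intent(in) :: nI, nJ
    real, intent(in)    :: x(nI, nJ)
    real, intent(inout) :: y(nI, nJ)
    integer :: i, j
    do j = 1, nJ
      do i = 2, nI - 1
        y(i, j) = 0.5 * (x(i - 1, j) + x(i + 1, j))
      end do
    end do
  end subroutine smooth_2d
end module two_shapes
```

Both versions perform the same arithmetic on the same elements, so a timing difference between them points at loop structure or code generation rather than at the results.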

Inkant

jimdempseyatthecove · 04-21-2011

Can you post the code?

In a nested loop, the compiler can usually keep the outer loop's base index into the array in a register. However, in Debug mode this is not necessarily the case (especially if index out-of-bounds checking is enabled).
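A sketch of what keeping the outer loop's base index in a register amounts to, written out by hand on a matrix stored in a 1D array (hypothetical names). An optimizing build normally derives the same column offset for A(i,j) on its own; with optimization off, or with bounds checking enabled (ifort's -check bounds), each access may instead recompute the address and test the subscripts.

```fortran
! Sketch: hoisting the column base offset out of the inner loop by hand.
! Hypothetical names; the matrix is nI-by-nJ, column-major, in a 1D array.
module hoisted_base
  implicit none
contains
  ! Column sums s(j) = sum of a(base+1 : base+nI), with base = (j-1)*nI
  subroutine column_sums(a, nI, nJ, s)
    integer, intent(in) :: nI, nJ
    real, intent(in)    :: a(nI * nJ)
    real, intent(out)   :: s(nJ)
    integer :: i, j, base
    do j = 1, nJ
      base = (j - 1) * nI        ! invariant in the inner loop: computed once per column
      s(j) = 0.0
      do i = 1, nI
        s(j) = s(j) + a(base + i)
      end do
    end do
  end subroutine column_sums
end module hoisted_base
```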

Also, are you passing the doubly subscripted arrays as arguments to a subroutine or function? If so, how you declare the dummy arguments, and whether an explicit interface is visible, can affect performance.
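A sketch of two common ways to declare a 2D dummy argument (hypothetical names). An explicit-shape dummy such as a(nI,nJ) is always contiguous with a known leading dimension; if the caller passes a non-contiguous section, a temporary copy is typically made on the way in. An assumed-shape dummy a(:,:) requires an explicit interface (here supplied by the module) and receives its strides through an array descriptor, so the compiler may not be able to assume unit stride.

```fortran
! Sketch: explicit-shape versus assumed-shape 2D dummies (hypothetical names).
module pass_2d
  implicit none
contains
  ! Explicit-shape: contiguous storage, leading dimension nI known to the callee
  real function total_explicit(a, nI, nJ)
    integer, intent(in) :: nI, nJ
    real, intent(in)    :: a(nI, nJ)
    integer :: i, j
    total_explicit = 0.0
    do j = 1, nJ
      do i = 1, nI
        total_explicit = total_explicit + a(i, j)
      end do
    end do
  end function total_explicit

  ! Assumed-shape: needs an explicit interface (the module provides it);
  ! the actual may be a strided section, described by the array descriptor
  real function total_assumed(a)
    real, intent(in) :: a(:,:)
    integer :: i, j
    total_assumed = 0.0
    do j = 1, size(a, 2)
      do i = 1, size(a, 1)
        total_assumed = total_assumed + a(i, j)
      end do
    end do
  end function total_assumed
end module pass_2d
```

Passing a(1:4:2, :) to total_assumed uses the strides directly, while passing it to total_explicit goes through a contiguous temporary.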

Jim Dempsey

inkant · 04-22-2011

Jim,

The timings are not very repeatable. I am trying to figure out why; after that, I will post the code.

Inkant

Ron_Green · 04-22-2011

Repeatability: what are you doing about Linux lazy page allocation ("demand paging", or "first-time paging effects")? In other words, you DO touch each element before starting a timing loop, yes? Or you run enough iterations to cancel out the first-time effects? If this sounds foreign to you, google "demand paging" or "lazy page".
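A minimal sketch of timing with the first touch excluded (hypothetical names; Fortran 2008 iso_fortran_env kinds assumed). The whole array is written once before the clock starts, so the page faults from lazy allocation land outside the timed region, and the result is reduced to a checksum so the compiler cannot discard the loop.

```fortran
! Sketch: touch every element before timing (hypothetical names).
module first_touch
  use iso_fortran_env, only: int64, real64
  implicit none
contains
  ! Wall-clock seconds for nrep sweeps over an n-by-n array, plus a checksum
  subroutine timed_sweeps(n, nrep, seconds, checksum)
    integer, intent(in)       :: n, nrep
    real(real64), intent(out) :: seconds, checksum
    real(real64), allocatable :: a(:,:)
    integer(int64) :: t0, t1, rate
    integer :: rep, i, j
    allocate(a(n, n))
    a = 1.0_real64                 ! first touch: pages are faulted in here, untimed
    call system_clock(t0, rate)
    do rep = 1, nrep
      do j = 1, n
        do i = 1, n
          a(i, j) = a(i, j) * 1.0000001_real64
        end do
      end do
    end do
    call system_clock(t1)
    seconds = real(t1 - t0, real64) / real(rate, real64)
    checksum = sum(a)              ! consume the result so the sweeps are kept
  end subroutine timed_sweeps
end module first_touch
```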

But then you have to worry about things like vector intrinsic library routines being substituted for initialization or simple element movement. Did you add the -nolib-inline option to keep the compiler from replacing your code with a library equivalent? And if you have manually coded a matrix multiply, in 12.0 the -opt-matmul option may replace your code with an MKL library call.
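If a hand-coded matrix multiply is part of the kernel, a quick check before timing it is to compare it with the matmul intrinsic. A sketch (hypothetical names) using the J-K-I loop order, so the innermost loop runs down columns of A and C; whether the compiler substitutes a library call for such a loop depends on the compiler version and on options such as the ones mentioned above.

```fortran
! Sketch: hand-coded C = A*B in J-K-I order (hypothetical names).
module hand_matmul
  use iso_fortran_env, only: real64
  implicit none
contains
  ! Innermost loop over i walks down columns of a and c (contiguous in Fortran)
  subroutine mm_jki(a, b, c)
    real(real64), intent(in)  :: a(:,:), b(:,:)
    real(real64), intent(out) :: c(:,:)
    integer :: i, j, k
    c = 0.0_real64
    do j = 1, size(b, 2)
      do k = 1, size(a, 2)
        do i = 1, size(a, 1)
          c(i, j) = c(i, j) + a(i, k) * b(k, j)
        end do
      end do
    end do
  end subroutine mm_jki
end module hand_matmul
```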

What have you done to align the data on 16-byte boundaries? You are using ALLOCATABLE data, yes?
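A sketch for checking where an ALLOCATABLE array actually landed (hypothetical names). It converts the address of the first element to an integer with iso_c_binding and reports the remainder modulo a byte count; the transfer of a c_ptr to integer(c_intptr_t) is a common idiom that works on the usual compilers but is not strictly guaranteed by the standard. How to request a particular alignment depends on the compiler version and options, so the sketch only measures it.

```fortran
! Sketch: measure the alignment of an array's first element (hypothetical names).
module alignment_check
  use iso_c_binding, only: c_loc, c_intptr_t
  implicit none
contains
  ! Address of x(1) modulo nbytes; 0 means x(1) is nbytes-aligned
  integer function misalignment(x, nbytes)
    real, intent(in), target :: x(:)
    integer, intent(in)      :: nbytes
    integer(c_intptr_t) :: addr
    addr = transfer(c_loc(x(1)), addr)   ! address as an integer (common idiom)
    misalignment = int(mod(addr, int(nbytes, c_intptr_t)))
  end function misalignment
end module alignment_check
```

For example, misalignment(a, 16) returns 0 when a(1) sits on a 16-byte boundary.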

Are your timers accurate, or are you using cpu_time(), gettimeofday, or an equivalent? And the code runs for a minute or more, so you are not just measuring clock jitter, yes?
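A sketch of measuring with both standard timers (hypothetical names; the kernel is stand-in work). system_clock with 64-bit arguments gives wall-clock time at the resolution implied by its count_rate, while cpu_time reports processor time. Comparing the two, and repeating the kernel until the run is long relative to the clock resolution, helps separate real differences from jitter.

```fortran
! Sketch: wall-clock versus CPU time for a repeated kernel (hypothetical names).
module timers
  use iso_fortran_env, only: int64, real64
  implicit none
contains
  ! Tick length of system_clock with 64-bit counters, in seconds
  real(real64) function clock_resolution()
    integer(int64) :: rate
    call system_clock(count_rate=rate)
    clock_resolution = 1.0_real64 / real(rate, real64)
  end function clock_resolution

  ! Runs a stand-in kernel nrep times; returns wall time, CPU time, checksum
  subroutine time_kernel(nrep, wall, cpu, checksum)
    integer, intent(in)       :: nrep
    real(real64), intent(out) :: wall, cpu, checksum
    integer(int64) :: t0, t1, rate
    real(real64) :: c0, c1, x(1000)
    integer :: rep
    x = 1.0_real64
    call system_clock(t0, rate)
    call cpu_time(c0)
    do rep = 1, nrep
      x = sqrt(x + 1.0_real64)      ! hypothetical work
    end do
    call cpu_time(c1)
    call system_clock(t1)
    wall = real(t1 - t0, real64) / real(rate, real64)
    cpu = c1 - c0
    checksum = sum(x)               ! consume x so the loop is kept
  end subroutine time_kernel
end module timers
```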

You may search this forum for other array performance questions, typically "is array syntax as fast as hand-coded loops?" and the like (by the way, the answer is "most of the time they are equivalent, unless you are doing something silly"). These are very frustrating studies, because an optimizing compiler can apply numerous transformations that you would not anticipate, and toy examples often oversimplify a real application. I'd recommend working with a real application rather than trying to draw conclusions from overly simplified loop structures. But if you have a real-world solver that you're trying to optimize, there are a number of us on the forum interested in studying it.
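A small illustration of the array-syntax question (hypothetical names): the same update written as one array assignment and as explicit column-major loops. Under optimization the two forms are usually compiled to equivalent code, and they produce identical values here, but as noted, a toy like this says little about a real solver.

```fortran
! Sketch: array syntax versus hand-coded loops for the same update.
module syntax_vs_loops
  implicit none
contains
  ! Array syntax: a whole-array expression; element order is left to the compiler
  subroutine update_syntax(a, b, c)
    real, intent(out) :: a(:,:)
    real, intent(in)  :: b(:,:), c(:,:)
    a = 2.0 * b + c
  end subroutine update_syntax

  ! Hand-coded loops in column-major order
  subroutine update_loops(a, b, c)
    real, intent(out) :: a(:,:)
    real, intent(in)  :: b(:,:), c(:,:)
    integer :: i, j
    do j = 1, size(a, 2)
      do i = 1, size(a, 1)
        a(i, j) = 2.0 * b(i, j) + c(i, j)
      end do
    end do
  end subroutine update_loops
end module syntax_vs_loops
```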