Intel® Fortran Compiler

Multi-Dimensional Allocatable Array

Benedikt_R_
Beginner

Dear group

I have a Fortran question about multi-dimensional allocatable arrays.

I'd like to declare an array of dynamic length, where each entry is itself an array of length 2. It should look like

      REAL, Allocatable:: XY(2,:)

Unfortunately, if I do so, I get an error:

     error #6644: The dimension specifications are incompatible.   [XY]

I could solve this by declaring both dimensions dynamic:

      REAL, Allocatable:: XY(:,:)

But if I do so, I expect the compiler to generate less efficient code, since it knows less about the structure of my array at compile time.

In particular, the expression "XY(1,i)" should generate code like "multiply i by 2*sizeof(REAL)", which is an efficient power-of-2 multiplication.

Any advice?

Thank you

Benedikt

Steve_Lionel
Honored Contributor III

Fortran doesn't have the feature you want. A deferred-shape array must have all of its bounds deferred; you can't pick and choose. Why not make it a single-dimensioned array of a two-component derived type?
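The two declarations that do compile can be sketched side by side. This is only an illustration: the type name point2, the component name v, and the values are made up for the example and do not come from the original poster's code.

```fortran
program deferred_shape_demo
  implicit none

  ! Hypothetical derived type with a fixed-size payload of two reals.
  type :: point2
    real :: v(2)
  end type point2

  ! real, allocatable :: bad(2,:)      ! rejected: mixes explicit and deferred bounds
  real, allocatable :: xy(:,:)         ! accepted: every bound is deferred
  type(point2), allocatable :: pts(:)  ! accepted: rank 1, element size fixed by the type

  integer :: n

  n = 5
  allocate(xy(2, n))   ! the first extent becomes 2 only at run time
  allocate(pts(n))     ! each element always holds exactly two reals

  xy = 0.0
  xy(1, 3) = 123.0
  pts(3)%v = [123.0, 0.0]

  ! Both layouts hold the same value for the third point.
  if (xy(1, 3) /= pts(3)%v(1)) error stop "layouts disagree"
  print *, "shape(xy) =", shape(xy), " size(pts) =", size(pts)
end program deferred_shape_demo
```

With the derived type, the "2" is part of the type definition, so it is visible wherever the type is visible.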

Benedikt_R_
Beginner

Thanks for the clarification.

> Why not make it a single-dimensioned array of a two-component derived type?

I have to refactor some code, and this refactoring seemed easier with the other approach. I will use an array of a two-component type.

Benedikt

jimdempseyatthecove
Honored Contributor III

>> I will use array of a two-component type.

This may require extensive modification of your program:

XY(i,j) becomes XY(j)%v(i)

I suggest you declare XY(:,:) as allocatable, but then code your program as:

allocate(XY(2,N))
...
call foo(XY,N) ! array XY, extent
...
subroutine foo(QQ, extentQQ)
integer :: extentQQ
real :: QQ(2, extentQQ)
..
do i=1, extentQQ
    QQ(1,i) = ...
    QQ(2,i) = ...
...

In this manner you have all the benefits of any potential compiler hints you can offer.
And more importantly, fewer changes to your code.
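A complete, compilable version of this pattern might look like the sketch below. The module name fill_mod and the values stored in the loop are placeholders invented for the example; only the shape of the call and the explicit-shape dummy QQ(2, extentQQ) follow the outline above.

```fortran
module fill_mod
  implicit none
contains
  ! Explicit-shape dummy argument: the first extent is the literal 2,
  ! so it is known at compile time inside this routine.
  subroutine foo(qq, extent_qq)
    integer, intent(in) :: extent_qq        ! declared before it is used as a bound
    real, intent(out)   :: qq(2, extent_qq)
    integer :: i
    do i = 1, extent_qq
      qq(1, i) = real(i)        ! placeholder values, not from the thread
      qq(2, i) = real(2 * i)
    end do
  end subroutine foo
end module fill_mod

program explicit_shape_demo
  use fill_mod, only: foo
  implicit none
  real, allocatable :: xy(:,:)   ! deferred shape at the point of allocation
  integer :: n

  n = 4
  allocate(xy(2, n))
  call foo(xy, n)                ! array XY, extent

  if (xy(1, n) /= real(n) .or. xy(2, n) /= real(2 * n)) error stop "unexpected value"
  print *, "last column:", xy(:, n)
end program explicit_shape_demo
```

Note that the integer dummy is declared before the array that uses it as a bound; with implicit none the reverse order is not accepted.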

Jim Dempsey

Benedikt_R_
Beginner

> This may require extensive modification of your program

You are right. But I wrote some Python code to do this rewriting, so I'm fine with it.

Does anybody have an opinion on whether my expectation, "'2. Option Type' generates faster code since the compiler knows the size of the type", is true?

1. Option Array:

REAL, Allocatable:: XY(:,:)
allocate(XY(2,N))
XY(1,I) = 123

2. Option Type:

type Coord
  REAL X
  REAL Y
end type Coord
type(Coord), Allocatable:: XY(:)
XY(I)%x = 123

Steve_Lionel
Honored Contributor III

The more the compiler can see at compile time, the better, though the two cases you show should generate very similar code, since the memory offset is known either way. In the first case, the index multiplication does have to read the extent from the array descriptor, but I doubt that is measurable.

jimdempseyatthecove
Honored Contributor III

I assume option 2, line 5 has a typo (:,:) should be (:)

Option 1) Steve is correct about the multiplication, I might add though:

When the compilation unit cannot see that the first extent is 2, the generated code has to fetch the size of the first dimension from the array descriptor, as Steve says. I would add that if the register pressure of your code is low, the extent will likely remain in a register, in which case the performance difference would be negligible.

Option 2) Because X and Y in type Coord are both REAL (4 bytes each by default), sizeof(Coord) is 8. The x86 instruction encoding has an addressing byte called SIB (Scale-Index-Base). The scale field can multiply the index register by 1, 2, 4, or 8. Because sizeof(Coord) is 8, the compiler can fold the multiplication into the address computation and eliminate the separate multiply instruction. This can potentially be significant if the array is in L1 cache, less so if it is in L2, less still if it is in L3, and possibly negligible when the data is in RAM. Also, using SIB to perform the multiplication can free a general-purpose register, which may help performance too.

Note that the dummy argument method, where the first dimension is explicitly stated as 2, also gets the benefit of the SIB scaling (when the elements are REAL*4, so that each pair occupies 8 bytes, the same as a Coord containing two REAL*4s).
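The stride argument can be checked with a small sketch that measures the distance in bytes between consecutive points in both layouts. It assumes a platform where c_float is the 4-byte default REAL, and it uses transfer on a c_ptr to obtain an integer address, which is a common idiom but an assumption of this example rather than anything from the thread. It says nothing about which instructions a particular compiler actually emits.

```fortran
program stride_demo
  use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t, c_float
  implicit none

  ! Interoperable version of the Coord type from the thread.
  type, bind(c) :: coord
    real(c_float) :: x
    real(c_float) :: y
  end type coord

  type(coord), allocatable, target   :: pts(:)
  real(c_float), allocatable, target :: xy(:,:)
  integer(c_intptr_t) :: a1, a2, stride_type, stride_array

  allocate(pts(4), xy(2, 4))

  ! Distance between two consecutive derived-type elements.
  a1 = transfer(c_loc(pts(1)), a1)
  a2 = transfer(c_loc(pts(2)), a2)
  stride_type = a2 - a1

  ! Distance between two consecutive columns of the rank-2 array.
  a1 = transfer(c_loc(xy(1, 1)), a1)
  a2 = transfer(c_loc(xy(1, 2)), a2)
  stride_array = a2 - a1

  print *, "bytes between pts(i) and pts(i+1):   ", stride_type
  print *, "bytes between xy(1,i) and xy(1,i+1): ", stride_array

  ! Both layouts step through memory with the same stride.
  if (stride_type /= stride_array) error stop "strides differ"
end program stride_demo
```

On typical x86-64 systems both strides should come out as 8, which is one of the scale factors the SIB byte supports.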

Jim Dempsey

Benedikt_R_
Beginner

Many thanks to Steve and Jim. Great support!

In my real code the allocation and the usage of the array are in different functions - so I doubt the compiler can guess the size of the inner array.

I'll stick with the two-component type ...

FortranFan
Honored Contributor II

Benedikt R. wrote:

.. Does anybody have an opinion, whether my expectation "'2. Option Type' generates faster code since the compiler knows about the size of the type" is true? ..

@Benedikt R.,

Can you please provide some background/description of the computations you will perform with either Option Array/Type? It is possible that the generated code shows little difference in speed between the two options for some of your computational needs. Why not then use the option that makes the code more readable and maintainable for the coder(s) working on it, both now and in the future?

jimdempseyatthecove
Honored Contributor III

>>Can you please provide some background/description of the computations you will perform with either Option Array/Type?

It is always necessary to look at the larger picture as opposed to only the hottest innermost loop. Depending on the larger picture, it may (or may not) be beneficial to separate X and Y into different arrays: X(i), Y(i) as opposed to XY(i)%X, XY(i)%Y.
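The two layouts can be put side by side in a short sketch. The array size and the X-only reduction are invented for illustration; whether the separate arrays are actually faster depends on the access pattern of the real program and has not been measured here.

```fortran
program layout_demo
  implicit none

  type :: coord
    real :: x
    real :: y
  end type coord

  integer, parameter :: n = 1000
  type(coord), allocatable :: xy(:)   ! one array of (x, y) pairs
  real, allocatable :: x(:), y(:)     ! separate arrays for x and y
  real :: sum_pairs, sum_split
  integer :: i

  allocate(xy(n), x(n), y(n))
  do i = 1, n
    x(i) = real(i)
    y(i) = 1.0
    xy(i) = coord(x(i), y(i))
  end do

  ! A computation that needs only X skips over every Y in the paired layout...
  sum_pairs = 0.0
  do i = 1, n
    sum_pairs = sum_pairs + xy(i)%x
  end do

  ! ...but walks contiguous memory when X lives in its own array.
  sum_split = sum(x)

  ! The results agree; only the memory access pattern differs.
  if (sum_pairs /= sum_split) error stop "layouts disagree"
  print *, "sum of x:", sum_split
end program layout_demo
```

If most loops use X and Y together, the paired layout keeps both values on the same cache line, which is why the larger picture matters.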

While the user defined type XY reduces the number of arguments on a CALL/function, that time is trivial compared to using the data from the array/arrays.

Jim Dempsey

Steve_Lionel
Honored Contributor III

I would be remiss if I didn't add my standard observation that guessing at micro-optimizations is a waste of effort. Write the code in a way that makes the most sense and is the most maintainable, and let the compiler worry about optimization. Run the program through a performance analyzer such as Intel VTune and see if there are problem spots. In most cases, your energy is best spent elsewhere in the program.
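As a first check before reaching for a profiler, a loop can be timed with the standard system_clock intrinsic. This is a crude stand-in and not a replacement for a tool such as VTune; the array size and the loop body are arbitrary choices for the example, and no timing result is claimed.

```fortran
program timing_sketch
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none

  integer, parameter :: n = 1000000   ! arbitrary problem size
  real, allocatable :: xy(:,:)
  integer(int64) :: t0, t1, rate
  real(real64) :: total
  integer :: i

  allocate(xy(2, n))
  xy = 1.0

  call system_clock(t0, rate)         ! start tick and ticks per second
  total = 0.0_real64
  do i = 1, n
    total = total + xy(1, i) + xy(2, i)
  end do
  call system_clock(t1)               ! end tick

  ! Printing the checksum keeps the compiler from discarding the loop.
  print *, "checksum:", total
  print *, "elapsed seconds:", real(t1 - t0, real64) / real(rate, real64)
end program timing_sketch
```

Swapping in the derived-type layout and comparing the two timings at the optimization level you actually ship gives a quick answer to the question asked earlier in the thread.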
