Two dimensional array with variable size of columns

Yavuz_D_ · 04-30-2017

Dear All,

I would like to ask whether it is possible to create a two-dimensional array in Fortran whose rows have a variable number of columns, as is possible in other languages.

Is there an alternative to using a derived type for this purpose? For example,

    type VarSizedArray
        integer, allocatable :: col(:)
    end type VarSizedArray
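    type(VarSizedArray), allocatable :: row(:)

    allocate(row(2))
    ! % is the standard component selector (the dot form is a compiler extension)
    allocate(row(1)%col(1))   ! row 1 has one column
    allocate(row(2)%col(2))   ! row 2 has two columns
    row(1)%col(1) = 11
    row(2)%col(1) = 21
    row(2)%col(2) = 22
    print*, row(1)%col
    print*, row(2)%col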

produces:

11
21 22

Arjen_Markus · 05-01-2017

The simple answer is no: this is the way to do it.

Is there a particular reason you are not satisfied with this solution?

Yavuz_D_ · 05-01-2017

Thank you very much Arjen.

I just did not want to miss the chance of using a shorter way. Sometimes performance depends on how you implement certain parts of the code, and I have always had the notion that the more primitive a data type is, the better the performance (this might be a false belief). The final reason is that the people I work with come up with alternative solutions over time, and I am then surprised that I was not aware of them, because I had settled on a solution that I thought was the only way.

Roman1 · 05-01-2017

In your example, you are using an array of arrays. Perhaps a better and more efficient way of doing this would be to use only 2 arrays. The first one stores all the values, and the second one indicates where the values for each row start.

program test2d
implicit none

integer,allocatable:: val(:)
integer,allocatable:: rowptr(:)
integer nnz, nrow, i, j1,j2

nrow = 2  ! total number of rows
nnz = 3  ! total number of values for all rows
allocate( val(nnz), rowptr(nrow+1) )

val = [ 11,  21, 22 ]
rowptr = [ 1, 2, 4 ]

do i = 1, nrow
   j1 = rowptr(i)
   j2 = rowptr(i+1) - 1
   print*, val( j1 : j2 )
end do

stop
end program test2d

Yavuz_D_ · 05-01-2017

Thank you, Roman. I can't free my mind from the physics of the problem when implementing it in a program. This shows me that, once the physics has been defined, every problem needs further elaboration with regard to internal storage and computational performance.

Your suggestion opens up some other possibilities for my problem. I will share some performance results as soon as I find time to prepare a preliminary implementation.

Arjen_Markus · 05-01-2017

Another alternative is to use an ordinary two-dimensional array whose second dimension is equal to the maximum row length you need, together with a parallel array holding the actual length of each row. Something like (inspired by Roman's example):

program test2db
	implicit none
	 
	integer,allocatable:: val(:,:)
	integer,allocatable:: rowlen(:)
	integer maxrowlength, nrow, i, jmax
	 
	nrow = 2  ! total number of rows
	maxrowlength = 2  ! maximum length of the row
	allocate( val(nrow,maxrowlength), rowlen(nrow) )
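	! reshape fills column by column: column 1 = (11, 21), column 2 = (0, 22)
	val = reshape( [ 11,  21, 0, 22 ], [nrow, maxrowlength] )
	rowlen = [ 1, 2 ]  ! number of values actually used in each row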
	 
	do i = 1, nrow
	   jmax = rowlen(i)
	   print*, val( i, 1 : jmax )
	end do
	 
	stop
end program test2db

You waste some memory of course, but addressing the rows is slightly simpler. It might be faster as well, as the section of the array you need is more predictable. But whether that is an advantage in practice is an aspect that needs to be measured, not divined.
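To put a rough number on the wasted memory, here is a minimal sketch that counts the integers each layout has to store. The row lengths are made up purely for illustration; with one long row among short ones, the rectangular layout needs 4004 integers and the flat layout 1011.

```fortran
program storage_compare
   implicit none
   ! Hypothetical row lengths, chosen only for illustration
   integer, parameter :: rowlen(4) = [ 1, 2, 1000, 3 ]
   integer :: nrow, n_rect, n_flat

   nrow = size(rowlen)

   ! Rectangular layout: every row padded to the longest row, plus rowlen itself
   n_rect = nrow * maxval(rowlen) + nrow

   ! Flat layout: one slot per value, plus nrow+1 row pointers
   n_flat = sum(rowlen) + nrow + 1

   print *, 'rectangular layout, integers stored:', n_rect
   print *, 'flat layout, integers stored:       ', n_flat
end program storage_compare
```

When all rows have similar lengths the two counts are close, so the padding only matters for strongly uneven rows.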

jimdempseyatthecove · 05-02-2017

Both Roman's and Arjen's suggestions may run into issues with large datasets.

Arjen uses a rectangular array to hold sparse data, which may cause memory capacity problems for very large datasets. Roman's method is not subject to this over-allocation issue.

Both solutions require knowing the entire array size prior to allocation, whereas in the original proposal (#1) you only have to determine the size of the next row. Before selecting a method, you need to consider memory capacity and initial allocation.

Both methods can be aided by helper functions that return an element reference and a row reference. A column reference could return an array of integers holding the linear indexes in Roman's suggestion, but for sparse arrays you would need to specify the value to use for missing elements (0.0, NaN, -n, ...).
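As a rough sketch of such helpers for Roman's flat layout: the names get_row and get_elem and the missing argument are hypothetical, and these functions return copies of the data rather than true references.

```fortran
module ragged_helpers
   implicit none
contains
   ! Return a copy of row i of the flat layout as an ordinary 1D array
   function get_row(val, rowptr, i) result(r)
      integer, intent(in) :: val(:), rowptr(:), i
      integer :: r(rowptr(i+1) - rowptr(i))
      r = val(rowptr(i) : rowptr(i+1) - 1)
   end function get_row

   ! Return element (i,j); "missing" is returned when row i has no column j
   function get_elem(val, rowptr, i, j, missing) result(e)
      integer, intent(in) :: val(:), rowptr(:), i, j, missing
      integer :: e
      if (j >= 1 .and. j <= rowptr(i+1) - rowptr(i)) then
         e = val(rowptr(i) + j - 1)
      else
         e = missing
      end if
   end function get_elem
end module ragged_helpers

program test_helpers
   use ragged_helpers
   implicit none
   integer :: val(3), rowptr(3)

   val    = [ 11, 21, 22 ]
   rowptr = [ 1, 2, 4 ]

   print *, get_row(val, rowptr, 2)          ! row 2: 21 22
   print *, get_elem(val, rowptr, 2, 2, -1)  ! 22
   print *, get_elem(val, rowptr, 1, 2, -1)  ! -1, row 1 has only one value
end program test_helpers
```

A column helper could be built on get_elem by looping over the rows, with missing filling the gaps.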

Jim Dempsey

Yavuz_D_ · 05-02-2017

Arjen, Jim, thanks for your contributions.

Arjen, I once used your suggestion in a 2D solver. I liked it because it was close to the physical definition of the problem. In exchange for avoiding some helper functions and further thought about the computer implementation, I struggled a lot with the indices while keeping the problem physics in mind.

I am going to stick with Roman's suggestion, and I have decided to order the data in a way that lets me avoid the second dimension for most of the solutions.

Jim, I do not think it matters whether the size of the data is known beforehand; it is always possible to insert or append.

Also, if the data is ordered, the original derived-type solution has a better pointer-based alternative, although it will not be better than Roman's suggestion, which reduces everything to a 1D array.
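For what it is worth, a minimal sketch of appending a row to the flat layout could look like the following. The third row is hypothetical, and the code relies on Fortran 2003 automatic reallocation on assignment, which some older compilers only enable through an option.

```fortran
program append_row
   implicit none
   integer, allocatable :: val(:), rowptr(:)
   integer :: newrow(3)
   integer :: i

   val    = [ 11, 21, 22 ]
   rowptr = [ 1, 2, 4 ]
   newrow = [ 31, 32, 33 ]   ! hypothetical third row

   ! Append the values, then add one new end pointer
   val    = [ val, newrow ]
   rowptr = [ rowptr, rowptr(size(rowptr)) + size(newrow) ]

   ! Print every row of the extended structure
   do i = 1, size(rowptr) - 1
      print *, val( rowptr(i) : rowptr(i+1) - 1 )
   end do
end program append_row
```

Each append copies both arrays, and inserting a row in the middle would also mean shifting every later entry of rowptr, so this is only cheap when appends are rare or done in batches.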