Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Syntax question

rafadix08
Beginner
1,295 Views
Hi,

I have a dataarray (called Data) with big first dimension (about 500,000) and not so big second dimension (about 20).

I want to create a second data array (Data2) from Data in the most efficient way possible, since I create several derived arrays in the same way. What is the most efficient way to do it?

Data2 is a subarray of Data. In order to make it to Data2, a line of Data should satisfy some conditions. The size of Data2 should be equal to number of lines of Data that satisfy those assumptions.

The code below is taking longer than I would like it to:

! Determine number of lines of Data that obey some
! Condition
SizeData2 = 0
do i = 1, N
if (Data(i,1) == 1) then
SizeData2 = SizeData2 + 1
end if
end do


allocate(Data2(SizeData2,dim1))


! Assign values to Data2
index = 1
do i = 1, N
if (Data(i,1) == 1) then
Data2(index,:) = Data(i,:)
index = index + 1
end if
end do

Many thanks,
Rafael
0 Kudos
5 Replies
Greg_T_
Valued Contributor I
1,295 Views
Hi Rafael,

In general, it seems to me that the trade-off in an algorithm is often between efficient speed or efficient memory use. I think the given code is seeking efficient memory use by making the Data2 array the smallest possible size, which I think is often a good approach.

One possibility to change the algorithm could be to have less efficient memory use to perhaps speed up the algorithm. For example, do you have other information about the possible size of the Data2 array? Is is always smaller than some fraction of the Data array, like half or a tenth as large? Or could Data2 be just as large as Data? If there is some smaller but maximum size for Data2, perhaps its size could be set to that maximum. Then that would avoid the first loop to determine the SizeData2 value. I think this could be considered as a trade-off to make Data2 larger than the minimum, but by avoiding the first loop, perhaps that would speed up the algorithm. Would less efficient memory for Data2 be acceptable?

Another possibility is to use a data struture to access the subset of values for Data2, but not actually make a copy of Data2. One way could be an integer array that contains a list of the row numbers for the Data2 values within the larger Data array. For example, allocate an integer array, I'll call it "int_data2"; its size could use the SizeData2 loop to get the size, or allocate a larger size like the paragraph above. Then loop through the Data array to find the values you want for Data2, but instead of copying that row of Data to Data2, just save that row value to the int_data2 array. Then later in the program when doing operations on the Data2 values, you would get the particular row index to access Data from row=int_data2(j), where j is the index for the number of values in Data2. This may require changes elsewhere in the program to access the subarray Data2 values directly within the original Data array. This is a basic type of a linked list, and there are certainly many other approaches for linked lists. Would this approach be possible?

Regards,
Greg

0 Kudos
Les_Neilson
Valued Contributor II
1,295 Views

Well the first loop can be replaced by

SizeData2 = count(Data(:,1)==1)

Les
0 Kudos
jimdempseyatthecove
Honored Contributor III
1,295 Views

Another alternative is if Data does not change while using Data2 then Data2 can be an array of pointers to rows in Data. Pointers after last valid row will not be associated. Data2 is allocated onceto the total number of rows+1 in Data. then

! Assignpointers in Data2 to rows in Data
index = 1
do i = 1, N
if (Data(i,1) == 1) then
Data2(index) => Data(i,:)
index = index + 1
end if
end do
NULLIFY(Data2(index))

You can omit the NULLIFY if you remember the number of valid pointers (SizeData2)

! Assignpointers in Data2 to rows in Data
SizeData2 = 0
do i = 1, N
if (Data(i,1) == 1) then
SizeData2 = SizeData2 + 1
Data2(SizeData2 ) => Data(i,:)
end if
end do

Greg's method of array of indexes may be faster code, the pointer method may be more generic.

Jim Dempsey
0 Kudos
rafadix08
Beginner
1,295 Views
Thank you all very much for the nice answers.
The only problem is that I use a subroutine that must be fed with both Data2 and its size, so in order to implement your solutions I would have to substantially alter this and other subroutines.
What I had in mind when I posted the question was whether Fortran had some built in constructs in the spirit of "WHERE", where I would feed in an array, impose some conditions and then it would return me a sub-array meeting those conditions.
Many thanks for your time.

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,295 Views

Rafael,

There is a second efficiency problem with your datalayout
Memory order in Fortranis Data(1,1), Data(2,1), Data(3,1), ...

From the loops in you code snips it appears that you are indexing the data as if the indexing scheme were C/C++ (in C/C++ right most index varies fastest/adjacent data, in Fortran left most index varies fastest/adjacent data).

If you want fast code you will have to make some changes. As layed out your code manipulatingdata in these arrays will be significantly slower than organize the other way.

Jim Dempsey


0 Kudos
Reply