- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I have a dataarray (called Data) with big first dimension (about 500,000) and not so big second dimension (about 20).
I want to create a second data array (Data2) from Data in the most efficient way possible, since I create several derived arrays in the same way. What is the most efficient way to do it?
Data2 is a subarray of Data. In order to make it to Data2, a line of Data should satisfy some conditions. The size of Data2 should be equal to number of lines of Data that satisfy those assumptions.
The code below is taking longer than I would like it to:
! Determine number of lines of Data that obey some
! Condition
SizeData2 = 0
do i = 1, N
if (Data(i,1) == 1) then
SizeData2 = SizeData2 + 1
end if
end do
allocate(Data2(SizeData2,dim1))
! Assign values to Data2
index = 1
do i = 1, N
if (Data(i,1) == 1) then
Data2(index,:) = Data(i,:)
index = index + 1
end if
end do
Many thanks,
Rafael
I have a dataarray (called Data) with big first dimension (about 500,000) and not so big second dimension (about 20).
I want to create a second data array (Data2) from Data in the most efficient way possible, since I create several derived arrays in the same way. What is the most efficient way to do it?
Data2 is a subarray of Data. In order to make it to Data2, a line of Data should satisfy some conditions. The size of Data2 should be equal to number of lines of Data that satisfy those assumptions.
The code below is taking longer than I would like it to:
! Determine number of lines of Data that obey some
! Condition
SizeData2 = 0
do i = 1, N
if (Data(i,1) == 1) then
SizeData2 = SizeData2 + 1
end if
end do
allocate(Data2(SizeData2,dim1))
! Assign values to Data2
index = 1
do i = 1, N
if (Data(i,1) == 1) then
Data2(index,:) = Data(i,:)
index = index + 1
end if
end do
Many thanks,
Rafael
Link Copied
5 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Rafael,
In general, it seems to me that the trade-off in an algorithm is often between efficient speed or efficient memory use. I think the given code is seeking efficient memory use by making the Data2 array the smallest possible size, which I think is often a good approach.
One possibility to change the algorithm could be to have less efficient memory use to perhaps speed up the algorithm. For example, do you have other information about the possible size of the Data2 array? Is is always smaller than some fraction of the Data array, like half or a tenth as large? Or could Data2 be just as large as Data? If there is some smaller but maximum size for Data2, perhaps its size could be set to that maximum. Then that would avoid the first loop to determine the SizeData2 value. I think this could be considered as a trade-off to make Data2 larger than the minimum, but by avoiding the first loop, perhaps that would speed up the algorithm. Would less efficient memory for Data2 be acceptable?
Another possibility is to use a data struture to access the subset of values for Data2, but not actually make a copy of Data2. One way could be an integer array that contains a list of the row numbers for the Data2 values within the larger Data array. For example, allocate an integer array, I'll call it "int_data2"; its size could use the SizeData2 loop to get the size, or allocate a larger size like the paragraph above. Then loop through the Data array to find the values you want for Data2, but instead of copying that row of Data to Data2, just save that row value to the int_data2 array. Then later in the program when doing operations on the Data2 values, you would get the particular row index to access Data from row=int_data2(j), where j is the index for the number of values in Data2. This may require changes elsewhere in the program to access the subarray Data2 values directly within the original Data array. This is a basic type of a linked list, and there are certainly many other approaches for linked lists. Would this approach be possible?
Regards,
Greg
In general, it seems to me that the trade-off in an algorithm is often between efficient speed or efficient memory use. I think the given code is seeking efficient memory use by making the Data2 array the smallest possible size, which I think is often a good approach.
One possibility to change the algorithm could be to have less efficient memory use to perhaps speed up the algorithm. For example, do you have other information about the possible size of the Data2 array? Is is always smaller than some fraction of the Data array, like half or a tenth as large? Or could Data2 be just as large as Data? If there is some smaller but maximum size for Data2, perhaps its size could be set to that maximum. Then that would avoid the first loop to determine the SizeData2 value. I think this could be considered as a trade-off to make Data2 larger than the minimum, but by avoiding the first loop, perhaps that would speed up the algorithm. Would less efficient memory for Data2 be acceptable?
Another possibility is to use a data struture to access the subset of values for Data2, but not actually make a copy of Data2. One way could be an integer array that contains a list of the row numbers for the Data2 values within the larger Data array. For example, allocate an integer array, I'll call it "int_data2"; its size could use the SizeData2 loop to get the size, or allocate a larger size like the paragraph above. Then loop through the Data array to find the values you want for Data2, but instead of copying that row of Data to Data2, just save that row value to the int_data2 array. Then later in the program when doing operations on the Data2 values, you would get the particular row index to access Data from row=int_data2(j), where j is the index for the number of values in Data2. This may require changes elsewhere in the program to access the subarray Data2 values directly within the original Data array. This is a basic type of a linked list, and there are certainly many other approaches for linked lists. Would this approach be possible?
Regards,
Greg
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Well the first loop can be replaced by
SizeData2 = count(Data(:,1)==1)
Les
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Another alternative is if Data does not change while using Data2 then Data2 can be an array of pointers to rows in Data. Pointers after last valid row will not be associated. Data2 is allocated onceto the total number of rows+1 in Data. then
! Assignpointers in Data2 to rows in Data
index = 1
do i = 1, N
if (Data(i,1) == 1) then
Data2(index) => Data(i,:)
index = index + 1
end if
end do
NULLIFY(Data2(index))
You can omit the NULLIFY if you remember the number of valid pointers (SizeData2)
! Assignpointers in Data2 to rows in Data
SizeData2 = 0
do i = 1, N
if (Data(i,1) == 1) then
SizeData2 = SizeData2 + 1
Data2(SizeData2 ) => Data(i,:)
end if
end do
Greg's method of array of indexes may be faster code, the pointer method may be more generic.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you all very much for the nice answers.
The only problem is that I use a subroutine that must be fed with both Data2 and its size, so in order to implement your solutions I would have to substantially alter this and other subroutines.
What I had in mind when I posted the question was whether Fortran had some built in constructs in the spirit of "WHERE", where I would feed in an array, impose some conditions and then it would return me a sub-array meeting those conditions.
Many thanks for your time.
The only problem is that I use a subroutine that must be fed with both Data2 and its size, so in order to implement your solutions I would have to substantially alter this and other subroutines.
What I had in mind when I posted the question was whether Fortran had some built in constructs in the spirit of "WHERE", where I would feed in an array, impose some conditions and then it would return me a sub-array meeting those conditions.
Many thanks for your time.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Rafael,
There is a second efficiency problem with your datalayout
Memory order in Fortranis Data(1,1), Data(2,1), Data(3,1), ...
From the loops in you code snips it appears that you are indexing the data as if the indexing scheme were C/C++ (in C/C++ right most index varies fastest/adjacent data, in Fortran left most index varies fastest/adjacent data).
If you want fast code you will have to make some changes. As layed out your code manipulatingdata in these arrays will be significantly slower than organize the other way.
Jim Dempsey
Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page