Allocatable array of deferred-length character variables

a_leonard · 03-21-2012

I want to read the contents of a file into an array that stores each line for subsequent processing. To minimize memory use, I am trying to use an allocatable array of deferred-length character variables. I get an internal compiler error in the part of the code doing the processing. The workaround is fairly easy, but is there a better approach?

A stripped-down example follows.

implicit none

integer :: n=0, m=0, i, istat
character (len=512) :: string
character (len=:), allocatable, dimension(:) :: string_array

open(1,file='input.txt')

do
  read(1,'(A)',end=99) string
  n = MAX(n,LEN_TRIM(string))
  m = m+1
enddo

99 continue

allocate(character(n)::string_array(m), stat=istat)
rewind(1)

do i=1,m
  read(1,'(A)') string
  string_array(i) = string
enddo

do i=1,m

  ! compiler error on this line
  ! if (string_array(i)(1:1) == '|') print *, string_array(i)

  ! workaround
  string = string_array(i)
  if (string(1:1) /= '|') print *, string_array(i)
enddo

stop
end

Steven_L_Intel1 · 03-21-2012

I assume you would really like each element of the array to have the actual length of its line, rather than all elements having the maximum length. If so, you need a derived type to hold each string, like this:

[plain]
implicit none

integer :: n=0, m=0, i, istat
character (len=512) :: string
type string_array_type
  character (len=:), allocatable :: string
end type string_array_type
type(string_array_type), allocatable, dimension(:) :: string_array

open(1,file='input.txt')

do
  read(1,'(A)',end=99) string
  m = m+1
enddo

99 continue

allocate(string_array(m), stat=istat)
rewind(1)

do i=1,m
  read(1,'(Q,A)') n,string
  string_array(i)%string = string(1:n)
enddo

do i=1,m

  ! compiler error on this line
  if (string_array(i)%string(1:1) == '|') print *, string_array(i)%string
enddo

stop
end
[/plain]

This compiles. The internal compiler error is already fixed for a release later this year.

a_leonard · 03-21-2012

Thanks, Steve.

Yes, the derived type would be much better. It means I have to replace all occurrences of "string_array(...)" with "string_array(...)%string", but I think I can handle that.

I did have a problem compiling the code, though. I'm not sure what's wrong, since the variable is declared on line 8.

Compiling with Intel Visual Fortran Compiler XE 12.1.1.258 [IA-32]...
Source1.for

Source1.for(19): error #6404: This name does not have a type, and must have an explicit type. [STRING_ARRAY]
Source1.for(24): error #6458: This name must be the name of a variable with a derived type (structure type) [STRING_ARRAY]
Source1.for(24): error #6303: The assignment operation or the binary expression operation is invalid for the data types of the two operands. [STRING]
Source1.for(30): error #6837: The leftmost part-ref in a data-ref can not be a function reference. [STRING_ARRAY]
Source1.for(30): error #6158: The structure-name is invalid or is missing. [STRING_ARRAY]

Steven_L_Intel1 · 03-21-2012

I compiled the code I posted successfully with 12.1.3 (Update 9).

IanH · 03-21-2012

Your errors are collectively a good example of why fixed form source is evil.

Steven_L_Intel1 · 03-21-2012

Doh! My source is free-form - .f90.

a_leonard · 03-22-2012

I took care of the problem with the line length in the fixed format.

Two more questions:

1. Is the memory allocated on the heap? I don't want to run into stack overflows for really big input files.

2. Is the 'Q' edit descriptor an Intel Fortran extension or a new 2003 Fortran feature?

Steven_L_Intel1 · 03-22-2012

The memory is on the heap. Q is an extension. If this matters, I could show you how to do it with standard Fortran. Q is easier.
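For reference, one standard-conforming way to get the line length without Q is a non-advancing read with SIZE=. The following is only a sketch of that idea, not necessarily the version Steve had in mind. It keeps the 512-character buffer from the example above; the scratch file and the program name are assumptions made so the sketch runs on its own.

```fortran
program line_length_demo
  use, intrinsic :: iso_fortran_env, only: iostat_eor
  implicit none
  character(len=512) :: string
  integer :: u, n, stat

  ! Assumption: a scratch file stands in for input.txt.
  open(newunit=u, status='scratch', action='readwrite', form='formatted')
  write(u, '(A)') '| first line'
  write(u, '(A)') ''
  write(u, '(A)') 'third'
  rewind(u)

  do
    ! Non-advancing read: SIZE= returns the number of characters
    ! actually transferred (padding blanks are not counted).
    read(u, '(A)', advance='no', size=n, iostat=stat) string
    if (stat == iostat_eor) then
      ! The record ended before the buffer filled, so n is the
      ! full line length and the file is now at the next record.
      print '(I0,": [",A,"]")', n, string(1:n)
    else if (stat == 0) then
      ! The buffer filled before the record ended: keep the first
      ! len(string) characters and skip the rest of the record.
      print '(I0,"+: [",A,"]")', n, string
      read(u, '()', iostat=stat)
      if (stat /= 0) exit
    else
      exit   ! end of file or an error
    end if
  end do

  close(u)
end program line_length_demo
```

In the derived-type program above, the `read(1,'(Q,A)') n,string` statement would be replaced by the non-advancing read, with `string(1:n)` assigned in the end-of-record branch.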

David_Kinniburgh · 03-25-2012

Steve - your code does not address the problem of a file of arbitrary length and arbitrary width - it is limited to line widths of 512 characters or less. Reading arbitrary width lines is quite awkward to do in standard Fortran but the Q edit descriptor helps a lot. So you can use the initial loop to get both the length and maximum line width and then re-read appropriately:

[fortran]
implicit none

integer :: n=0, m=0, i, istat, maxwidth=0
character (len=:), allocatable :: line
type string_array_type
  character (len=:), allocatable :: string
end type string_array_type
type(string_array_type), allocatable, dimension(:) :: string_array

open(1,file='input.txt')

do
  read(1,'(Q)',end=99) n
  m = m+1
  maxwidth = MAX(maxwidth,n)
enddo

99 continue

print *, "m = ", m
print *, "maxwidth = ", maxwidth

allocate(string_array(m), stat=istat)
allocate(character(len=maxwidth):: line, stat=istat)
rewind(1)

do i=1,m
  read(1,'(Q,A)') n,line
  string_array(i)%string = line(1:n)
enddo

do i=1,m
  ! compiler error on this line
  !! if (string_array(i)%string(1:1) == '|') print *, string_array(i)%string
  write (*,'(i5,": ",A)') i, trim(string_array(i)%string)
enddo

stop
end
[/fortran]

Incidentally, I find the lack of automatic reallocation of deferred-length allocatable scalars in I/O statements a considerable inconvenience, both for ordinary reads and for internal reads and writes. I know this has been brought up a number of times here and elsewhere. I assume that there is some gotcha lurking there that dissuaded the standards committee from implementing this.

Steven_L_Intel1 · 03-25-2012

To the best of my knowledge, the notion never came up before the committee. It does seem obvious in hindsight. Adding it would change the meaning of existing programs, but F2003 did that already with automatic reallocation in assignment.
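The difference being discussed can be seen in a small sketch. The variable names and the internal-file buffer are assumptions for illustration: intrinsic assignment resizes a deferred-length scalar, whereas a read transfers into it at whatever length it already has.

```fortran
program realloc_demo
  implicit none
  character(len=:), allocatable :: s
  character(len=32) :: buffer

  ! F2003 intrinsic assignment: s is (re)allocated to the length
  ! of the right-hand side each time.
  s = 'short'
  print '(A,I0)', 'len after first assignment:  ', len(s)
  s = 'a considerably longer value'
  print '(A,I0)', 'len after second assignment: ', len(s)

  ! A read does not reallocate: s keeps its current length and the
  ! input is padded or truncated to fit.
  buffer = 'abc'
  read(buffer, '(A)') s
  print '(A,I0)', 'len after internal read:     ', len(s)

  ! Usual idiom: read into a fixed buffer, then assign.
  s = trim(buffer)
  print '(A,I0)', 'len after assigning trim():  ', len(s)
end program realloc_demo
```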

IanH · 03-25-2012

Quoting David Kinniburgh

Steve - your code does not address the problem of a file of arbitrary length and arbitrary width - it is limited to line widths of 512 characters or less. Reading arbitrary width lines is quite awkward to do in standard Fortran...

Did you want to read the lines of the file into a rectangular buffer (in which case there's no need for the derived type) or into a series of buffers that each have the length of the relevant line?

If it is the former, then yes - you should probably use two passes, one to scan the file to work out the maximum length and number of lines, the second to do the actual read.

If it is the latter, then because IO is (typically) slow a little bit of dynamic memory reallocation isn't going to hurt. To read a single line, use something like the following as a module procedure:

[fortran]
!*****************************************************************************
!!
!> Reads a complete line (end-of-record terminated) from a file.
!!
!! @param[in]     unit              Logical unit connected for formatted
!!                                  input to the file.
!!
!! @param[out]    line              The line read.
!!
!! @param[out]    stat              Error code, positive on error,
!!                                  IOSTAT_END on end of file.

SUBROUTINE get_line(unit, line, stat)

  USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY: IOSTAT_EOR

  !---------------------------------------------------------------------------
  ! Arguments

  INTEGER, INTENT(IN) :: unit
  CHARACTER(:), INTENT(OUT), ALLOCATABLE :: line
  INTEGER, INTENT(OUT) :: stat

  !---------------------------------------------------------------------------
  ! Local variables

  ! Buffer to read the line (or partial line).
  CHARACTER(256) :: buffer

  INTEGER :: size           ! Number of characters read from the file.

  !***************************************************************************

  line = ''
  DO
    READ (unit, "(A)", ADVANCE='NO', IOSTAT=stat, SIZE=size) buffer
    IF (stat > 0) RETURN
    line = line // buffer(:size)
    IF (stat < 0) THEN
      IF (stat == IOSTAT_EOR) stat = 0
      RETURN
    END IF
  END DO

END SUBROUTINE get_line
[/fortran]

You only have to write the above once (for each character kind). If you know that your line lengths are often going to be greater than 256 then you can increase the buffer size to reduce the number of reallocations.

You can use a similar dynamic reallocation approach to handle the number of lines, though I'd suggest a doubling-buffer approach, because the amount of data copied around might otherwise become excessive. Initially allocate the TYPE(string_array_type) array to some decent size and start reading in lines, tracking how many elements are in use. When the array is full, allocate a new array that is double the size of the old one, copy over the existing elements, and carry on reading. When you are done, either do a final reallocation and copy to chop the buffer down to size(*), or just work with some spare elements at the end of the array.
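A minimal sketch of the doubling approach might look like the following. It reuses get_line from the post above, lightly condensed (the SIZE= variable is renamed nread and preset to zero). The module name, the read_all_lines helper, the initial capacity of 16, and the scratch file are all assumptions for illustration.

```fortran
module line_store
  use, intrinsic :: iso_fortran_env, only: iostat_eor
  implicit none

  type string_array_type
    character(len=:), allocatable :: string
  end type string_array_type

contains

  ! get_line as posted in the thread, condensed.
  subroutine get_line(unit, line, stat)
    integer, intent(in) :: unit
    character(:), allocatable, intent(out) :: line
    integer, intent(out) :: stat
    character(256) :: buffer
    integer :: nread

    line = ''
    do
      nread = 0
      read(unit, '(A)', advance='no', iostat=stat, size=nread) buffer
      if (stat > 0) return
      line = line // buffer(:nread)
      if (stat < 0) then
        if (stat == iostat_eor) stat = 0
        return
      end if
    end do
  end subroutine get_line

  ! Hypothetical helper: read every line, doubling the capacity of
  ! lines whenever it fills up. nlines is the number of elements in use.
  subroutine read_all_lines(unit, lines, nlines)
    integer, intent(in) :: unit
    type(string_array_type), allocatable, intent(out) :: lines(:)
    integer, intent(out) :: nlines
    type(string_array_type), allocatable :: bigger(:)
    character(:), allocatable :: line
    integer :: stat

    allocate(lines(16))            ! initial capacity, an arbitrary choice
    nlines = 0
    do
      call get_line(unit, line, stat)
      if (stat /= 0) exit          ! end of file or an error
      if (nlines == size(lines)) then
        allocate(bigger(2*size(lines)))
        bigger(:nlines) = lines(:nlines)   ! copy the existing elements
        call move_alloc(bigger, lines)     ! lines now has double capacity
      end if
      nlines = nlines + 1
      lines(nlines)%string = line
    end do
  end subroutine read_all_lines

end module line_store

program doubling_demo
  use line_store
  implicit none
  type(string_array_type), allocatable :: lines(:)
  integer :: u, n, i

  ! Assumption: a scratch file with 40 lines stands in for real input.
  open(newunit=u, status='scratch', action='readwrite', form='formatted')
  do i = 1, 40
    write(u, '(A)') repeat('x', i)   ! line i has i characters
  end do
  rewind(u)

  call read_all_lines(u, lines, n)
  print '(A,I0,A,I0)', 'lines read: ', n, ', capacity: ', size(lines)
  close(u)
end program doubling_demo
```

This version leaves the spare elements in place and returns the count in use; trimming with the pattern in the footnote is the other option mentioned.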

Alternatively, do a pass through the file just counting lines ( READ(unit, "()") ) and then allocate the size of the array.
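The counting pass can be sketched as below. The scratch file and the names are assumptions for illustration; the empty format transfers no data and simply moves past one record per read.

```fortran
program count_lines_demo
  implicit none

  type string_array_type
    character(len=:), allocatable :: string
  end type string_array_type

  type(string_array_type), allocatable :: string_array(:)
  integer :: u, stat, m

  ! Assumption: a scratch file stands in for the real input file.
  open(newunit=u, status='scratch', action='readwrite', form='formatted')
  write(u, '(A)') 'one'
  write(u, '(A)') ''
  write(u, '(A)') 'three'
  rewind(u)

  ! First pass: count records without reading their contents.
  m = 0
  do
    read(u, '()', iostat=stat)
    if (stat /= 0) exit
    m = m + 1
  end do

  ! Size the array exactly, then rewind for the pass that reads the lines.
  allocate(string_array(m))
  rewind(u)
  print '(A,I0)', 'lines: ', m

  close(u)
end program count_lines_demo
```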

Either way, these approaches are similar to what would be used in many other languages; I don't think standard Fortran is at any particular disadvantage here.

(*) array = array(:elements_in_use) - it would be nice if the compiler recognised this pattern and optimised the assignment to be a simple update of the descriptor...

David_Kinniburgh · 03-26-2012

Thanks Ian. That's nice and clear.