Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
Announcements
FPGA community forums and blogs on community.intel.com are migrating to the new Altera Community and are read-only. For urgent support needs during this transition, please visit the FPGA Design Resources page or contact an Altera Authorized Distributor.

Trouble reading a csv file

dboggs
New Contributor I
2,998 Views

Comma-separated-values are frequently used in my work, as for example Excel (and other modern languages) are very handy at reading them. In Fortran it is not so easy. I try list-directed input, which works "most of the time" but not always. There is trouble when a text value in the input file contains blanks, for example

value-a, value-bcd, value ef, value-ghi

then reading this with

CHARACTER(20) :: str1, str2, str3, str4
READ (5, *) str1, str2, str3, str4

results in

str1 = 'value-a     '      [padded with blanks to the declared length of 20]
str2 = 'value-bcd   '
str3 = 'value     '
str4 = 'ef      '
str5 = 'value-ghi'

The result with str3 fails the intent, because the hyphen was accidentally a blank, and blanks are used as value separators for list-directed input (in addition to commas). So, I need an edit descriptor or other formatting method that forces commas to be the ONLY separator. This is easy--indeed automatic--in Excel. My grief is caused because I am handed these files by others who work only with Excel, where they can be careless about inserting blanks in the middle of a string value, and they don't understand the significance of this in Fortran.

Is there a way to reliably read csv files in Fortran that I don't know about? 

0 Kudos
11 Replies
andrew_4619
Honored Contributor III
2,998 Views

I read the whole CSV line as one large character string and then use INDEX function to find the first comma, lets call it position IP. The fist item is the sub-string from position 1 to ip-1. This is in a loop so we then search for the next comma etc to give sub-string last_IP+1:IP-1. You need some extra code if the sub-string has quotes as it is an excel string that may have commas in it that you wish to ignore....

 

0 Kudos
Steven_L_Intel1
Employee
2,998 Views

It's also possible to tell Excel to enclose string values in quotes, but I gather you don't have control over that. app4619's approach is what I use, but you can be done in by a string that contains a comma. If you can convince the users to give you tab-delimited files, that can work better.

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,998 Views

Read the line in, then parse the commas.

module parse
    integer, parameter :: MAX_LINE = 256    ! you determine the size of line
    character(MAX_LINE) :: line
    contains
    function getArg(n) result(arg)
        implicit none
        character(MAX_LINE) :: arg
        integer :: n,i,j,count
        j = 0
        do count=1,n
            i = j + 1
            j = INDEX(line(i:),',')
            if(j == 0) exit
            j = j + i - 1
        end do
        if(j == 0) then
            if(count == n) then
                arg = line(i:)
            else
                arg = ' '
            endif
        else
            arg = line(i:j-1)
        endif
    end function getArg
end module parse
    
program lineParse
    use parse
    implicit none

    ! Variables
    CHARACTER(20) :: str1, str2, str3, str4, str5
    ! Body of lineParse
    line = 'value-a, value-bcd, value ef, value-ghi'    ! simulate read of line
    str1 = getArg(1)
    str2 = getArg(2)
    str3 = getArg(3)
    str4 = getArg(4)
    str5 = getArg(5)
   print *, str1, str2, str3, str4, str5

end program lineParse

Jim Dempsey

0 Kudos
John_Campbell
New Contributor II
2,998 Views

It has always been a challenge for me as to how flexible data input should be.

Once a , delimiter is allowed,

  • should trailing spaces be allowed, such as 11, 22 33 14, 5 (list-directed editing) You can get unexpected values with 22 33 44.
  • should " string with enclosed spaces or , be allowed" as a single string
  • should identifying integer and real fields be available.
  • is 1e4 a real or e4 a real
  • should simple math be included, as 11.4*23-3

All this can be done, expanding on Jim's approach. I find that mixing spaces and , as a delimiter is usually expected, while "string" is a useful extension.
Simple math looks good and can document conversion factors, but with Excel these are no longer necessary.

A simple test for integer or real is to read the field, with an iostat= although back with F77 the standard did not require a trapped response, which I learnt the hard way when transferring code to IBM Fortran 77. It was interesting to see all the loopholes in the F77 standard.

eg line = 'value-a, value-bcd, value ef,, 1.23, e5, 1e4, "with , inside", 12, 13, 1.2.3, 1 2 3,   '    ! simulate read of line
1.2.3 will fail but 1 2 3 reads ok as 123 but is probably not what was meant.

However for my own uses, as long as the data layout is clearly defined, fixed layouts work well, as .prn exports from Excel are quite functional.

John

0 Kudos
John_Campbell
New Contributor II
2,998 Views

Actually, a simpler change of enclosing value ef in quotes as "value ef" works without any change to your code.

CHARACTER(20) :: str(4)
character(256) :: line
integer i

 line = 'value-a, value-bcd, value ef, value-ghi'

 READ (line, *) str
 write (*,*) (trim(str(i)),'| ',i=1,4)

 line = 'value-a, value-bcd, "value ef", value-ghi'

 READ (line, *) str
 write (*,*) (trim(str(i)),'| ',i=1,4)

end

I'm not sure about the suggestion of switching to tab delimited files is a good idea, as formatted reads have significant problems and I don't know about their general suitability for list-directed reads.

John

0 Kudos
rase
New Contributor I
2,998 Views

I usually use the semicolon ; as a separator for csv files in Excel, mainly to avoid the problems caused by the different conventions for decimal point/comma in localized versions of Windows and Excel, but also for commas in text strings. Semicolons appear much less frequently than commas or points in text strings.

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,998 Views

Semicolon is a good choice, but it may be difficult in getting external data sources to go to the "effort" of specifying ; as the field separator. This is likely the situation that dboggs is facing.

Jim Dempsey

0 Kudos
nvaneck
New Contributor I
2,998 Views
I created a subroutine to parse strings which I use for CSV records as well as other strings. The result is a defined type FIELDS which contains what I need., e.g., CALL PARSE(RECIN,FIELDS,NV,QUOTE,DELIM,N) RECIN CHARACTER(n) Vector to parse. FIELDS TYPE (PFIELDS) Construct of size NV to hold the non-blank starting locations (FIELDS(i)%BEGIN), width (FIELDS(i)%WIDTH), and type (FIELDS(i)%TYPE) in RECIN of each field. NV INTEGER(4) Maximum number of fields to retrieve. QUOTE CHARACTER(1) Bytes within QUOTEs are considered a single field regardless; double quotes within a quoted field will be reduced to a single quote and kept in the string. DELIM CHARACTER(n) Fields are delimited by blanks and DELIMs. N INTEGER(4) Number of fields found. Convenient for me as I need to parse several types of strings and can vary the delimiters required. As for commas vs. semi-colons, you can check for local usage and set things accordingly.
0 Kudos
jimdempseyatthecove
Honored Contributor III
2,998 Views

nvaneck,

Excellent! Nice generalize solution, at least for the ASCII character set.

Jim Dempsey

0 Kudos
dboggs
New Contributor I
2,998 Views

Thanks to everyone for the useful suggestions and discussion! A couple of comments:

I am somewhat familiar with parsing techniques and have used them before. But in this case it is a one-off program with short-term life expectancy, so I am highly motivated to stick to simple standard methods without detailed development. The sample code fragments provided are helpful however.

I can easily get the data files tab delimited, but how would that help? How can Fortran read tab delimited data (short of writing my own parsing routine, of course, which I have done before)? If it is read using a simple list-directed method (and I'm not sure that it can), wouldn't this be subject to the same embedded-space limitation as reading comma-separated data?

It seems to me that, as data transfer using this format (comma delimited or tab delimited) is becoming more common, it would behoove the Fortran developers to implement an intrinsic technique to handle this easily. Perhaps a variation of list directed, either input or output, where the desired separators can simply be specified in a list. In my case, for example, the input problems would be solved by specifying a comma as the one and only separator.

Call this a feature request for the next major version.

0 Kudos
jimdempseyatthecove
Honored Contributor III
2,998 Views

>>I can easily get the data files tab delimited

It would help when the export from excel (save as) used commas: 1,234.56

You will have to make a parser or find one on the web that suits your purpose.

Jim Dempsey

0 Kudos
Reply