Last question of the day I promise...
And BTW, kudos to Steve and all the other contributors here who make this forum such an invaluable resource. The help and knowledge shared here are greatly appreciated.
Now, I want to read a text file that contains, say, 30,000,000 real numbers (in one big column; all numbers are supposed to be written in a fixed format such as D15.7).
I have no control over how this file is generated - and it is an input to my code. I want to have my input subroutine as robust as possible with respect to this input: in particular I want to detect the presence of blank lines in this file.
Assuming that A is an array allocated to the exact number of values the text file is supposed to contain, I can use a statement such as:
READ(UNIT=UNIT,FMT='(D15.7)',IOSTAT=ERROR) A
This is the fastest, and it triggers an error if one of the lines of the file does not contain a number, is poorly formatted, etc.
But it won't detect a blank line (the corresponding element of A is simply set to 0).
On the other hand, I can make a loop
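DO I=1,SIZE(A)
   READ(UNIT=UNIT,FMT='(A)',IOSTAT=ERROR) STR
   IF (ERROR/=0) EXIT                  ! end of file or I/O error
   IF (LEN_TRIM(STR)==0) THEN          ! test STR, the record just read
      ! This is an error. Write warning message.
   ELSE
      READ(STR,'(D15.7)',IOSTAT=ERROR) A(I)
      IF (ERROR/=0) EXIT               ! conversion error in this record
   ENDIF
ENDDO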
It is much more robust (what I want), but it is very slow.
Is there a middle ground?
Thanks!
Olivier
Olivier,
When you observe a blank line in the data, is this an extra line written into the data, or is it a valid data line indicating 0.0?
If the former (an extra line), consider:
READ(UNIT=UNIT,FMT='(D15.7)',IOSTAT=ERROR) A
if(ERROR .ne. 0) call oops            ! bad number or premature end of file
READ(UNIT=UNIT,FMT='(A)',IOSTAT=ERROR) STR
if(ERROR .eq. 0) then                  ! STR is only defined if this read succeeded
   if(STR .NE. ' ') call oops          ! non-blank extra record
endif
if(ERROR .ge. 0) call oops             ! anything but end of file: extra input
In other words, check whether additional input follows the expected input.
Jim Dempsey
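Jim's "check that nothing follows the expected input" idea can be wrapped in a small routine. The following is a minimal sketch, not a definitive implementation: it assumes one D15.7 value per record and that the data are the last thing in the file; the module `trailing_check`, the routine `read_exact` and its error codes are hypothetical names. `IOSTAT_END` is the standard end-of-file code provided by the intrinsic module `ISO_FORTRAN_ENV`.

```fortran
module trailing_check
  ! Hypothetical module: bulk read, then require a clean end of file.
  use, intrinsic :: iso_fortran_env, only: real64, iostat_end
  implicit none
contains
  ! Reads size(a) values (one D15.7 value per record) with a single READ,
  ! then checks that the next READ reaches end of file.
  ! ierr = 0: ok; 1: conversion error or too few records;
  ! 2: something other than a clean end of file follows the data.
  subroutine read_exact(u, a, ierr)
    integer, intent(in) :: u
    real(real64), intent(out) :: a(:)
    integer, intent(out) :: ierr
    character(len=80) :: str
    integer :: ios

    ierr = 0
    read(u, '(D15.7)', iostat=ios) a        ! fast path: one READ for the whole array
    if (ios /= 0) then
      ierr = 1
      return
    end if
    read(u, '(A)', iostat=ios) str          ! is there another record?
    if (ios /= iostat_end) ierr = 2         ! an extra record (blank or not) or an I/O error
  end subroutine read_exact
end module trailing_check
```

Note that this only catches a wrong number of records: a blank line that replaces a value, instead of being inserted between values, still reads as 0 and passes this check.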
Let's assume that my file looks like this:
---- begin file ----
1
2

3
4
---- end file ----
There is a blank line in the middle. If I read the file with READ(UNIT=UNIT,FMT='(I4)',IOSTAT=ERROR) A, where A is declared as INTEGER,DIMENSION(4), the result will be: A = [1 2 0 3] (and this does NOT trigger an IO error). I want to be able to catch this.
Olivier
Olivier,
The sample code provided in my post includes
READ(UNIT=UNIT,FMT='(A)',IOSTAT=ERROR) STR
if(ERROR .eq. 0) then
   if(STR .NE. ' ') call oops
endif
if(ERROR .ge. 0) call oops
Using your sample data, the read of STR will read '4 '.
Therefore the if(STR .NE. ' ') call oops will trigger.
Had your data been
1
2
3
4

(with the blank line after the 4), then if(ERROR .ge. 0) call oops will trigger.
You could probably omit the if(STR .NE. ' ') call oops,
but you may want to use the contents of STR to diagnose the problem:
READ(UNIT=UNIT,FMT='(A)',IOSTAT=ERROR) STR
if(ERROR .ge. 0) then
   do i=1,10
      write(*,*) 'Input error',STR
      READ(UNIT=UNIT,FMT='(A)',IOSTAT=ERROR) STR
      if(ERROR .ne. 0) exit
   end do
   call oops
endif
Jim Dempsey
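DO I=1,SIZE(A)
   READ(UNIT=UNIT,FMT='(D15.7)',IOSTAT=ERROR) A(I)
   IF (ERROR.EQ.0 .AND. A(I).EQ.0) THEN
      ! A zero may come from a blank line: re-read the same record as text
      BACKSPACE(UNIT)
      READ(UNIT=UNIT,FMT='(A)',IOSTAT=ERROR) STR
      IF (LEN_TRIM(STR).EQ.0) ERROR=1   ! blank record: flag it as an error
   ENDIF
   IF (ERROR.NE.0) EXIT
ENDDO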
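The zero check above can also be combined with a single bulk READ: under D15.7 a blank line can only show up as an exact 0, so the slow record-by-record text scan is needed only when at least one element is zero. A sketch of that middle ground, assuming one value per record, data starting at the first record of the file (so that REWIND returns to the start of the data) and records of at most 80 characters; `zero_recheck`, `read_checked` and the error codes are hypothetical.

```fortran
module zero_recheck
  ! Hypothetical module: fast bulk read, text re-scan only when a zero appears.
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  ! Assumes one value per record and that the data start at the first record.
  ! ierr = 0: ok; 1: read error or premature end of file;
  ! 2: record number bad_rec is blank.
  subroutine read_checked(u, a, ierr, bad_rec)
    integer, intent(in) :: u
    real(real64), intent(out) :: a(:)
    integer, intent(out) :: ierr, bad_rec
    character(len=80) :: str
    integer :: i, ios

    ierr = 0
    bad_rec = 0
    read(u, '(D15.7)', iostat=ios) a          ! fast path: one READ for everything
    if (ios /= 0) then
      ierr = 1
      return
    end if
    if (all(a /= 0.0_real64)) return          ! no zeros, so no blank line was read

    rewind(u)                                 ! slow path: re-scan the records as text
    do i = 1, size(a)
      read(u, '(A)', iostat=ios) str
      if (ios /= 0) then
        ierr = 1
        return
      end if
      ! Record i produced a(i), so only the zero elements need checking
      if (a(i) == 0.0_real64 .and. len_trim(str) == 0) then
        ierr = 2
        bad_rec = i
        return
      end if
    end do
  end subroutine read_checked
end module zero_recheck
```

If genuine zeros are rare in the data, nearly every file takes the fast path; a file with legitimate zeros pays for both passes but is still accepted.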
It's just a bit odd that when you read a blank record (a blank line) into a number, the value assigned is 0 (whether or not the BZ edit descriptor is used).
Here is a suggestion for the Intel Fortran team: what about an NB (no blank) edit descriptor that would trigger an I/O error if a line or record is blank? The syntax would look like:
READ(UNIT=MY_UNIT,FMT='(NB,D15.7)',IOSTAT=ERROR) A(:)
where A is an array of logical or numeric values. The idea is to avoid all these one-by-one read-and-check statements, since a single read statement for the entire array could be much faster.
Olivier
Olivier,
How about making sure the person/program submitting the data file submits a properly formatted file?
Would you expect your read routine to detect additional or dropped digits in the numeric fields?
In the old punched-card days we would slap a 9999999 card on the end. If you got an EOF, a card was missing; if you did not see 9999999 as the next card after reading the data, you had an extra card; if you got a numeric conversion error, you had trash data; if those tests passed, you assumed you had a good data set. Additional sanity checks can be made on the data itself: if, for example, you only expect numbers in the range X to Y, you can compute the min and max of the array to verify that the data look good.
Also, a blank is often used to represent 0; Excel spreadsheets do this depending on a user preference. The effort the programming team puts into an NB format descriptor may be wasted if you are later told that you must accept a blank as 0.
Jim Dempsey
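The punched-card checks above (an end card plus a range check on the values) might look like the following sketch, which assumes the producing program appends a sentinel record and that valid values lie in a known range [lo, hi]. The sentinel value, the names `sentinel_check` and `read_with_sentinel`, and the error codes are assumptions. The sentinel should be written with a decimal point (for example `9999999.0`): under D15.7 a field without a decimal point has its last 7 digits taken as the fraction, so `9999999` would read as 0.9999999.

```fortran
module sentinel_check
  ! Hypothetical module: end-card and min/max checks on a bulk read.
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  ! Assumed end-card value; it must be written with a decimal point
  real(real64), parameter :: sentinel = 9999999.0_real64
contains
  ! ierr = 0: ok; 1: EOF or conversion error (e.g. a missing record);
  ! 2: the record after the data is not the sentinel (e.g. an extra record);
  ! 3: some value lies outside [lo, hi].
  subroutine read_with_sentinel(u, a, lo, hi, ierr)
    integer, intent(in) :: u
    real(real64), intent(out) :: a(:)
    real(real64), intent(in) :: lo, hi
    integer, intent(out) :: ierr
    real(real64) :: last
    integer :: ios

    ierr = 0
    read(u, '(D15.7)', iostat=ios) a          ! the data themselves
    if (ios /= 0) then
      ierr = 1
      return
    end if
    read(u, '(D15.7)', iostat=ios) last       ! should be the end card
    if (ios /= 0) then
      ierr = 1                                ! EOF here: one record short (missing card)
      return
    end if
    if (last /= sentinel) then
      ierr = 2                                ! extra record before the end card
      return
    end if
    if (minval(a) < lo .or. maxval(a) > hi) ierr = 3   ! range sanity check
  end subroutine read_with_sentinel
end module sentinel_check
```

In this sketch a blank line that replaces a value passes the end-card test but is still caught by the range check whenever 0 lies outside [lo, hi].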
Yes, I know... I should trust the users a little more... I just want to make my input subroutines as robust as possible.
And I still maintain that it would be nice to have an optional 'NB' edit descriptor :). It slightly bothers me that when one writes:
CHARACTER(25) S
REAL(8) D
S = ' '
READ(S,'(D25.15)',IOSTAT=ERROR) D
D is silently set to 0.0D+0.
Your advice about checking the min/max values of the data, as well as using an end card, is interesting. Thanks!
Olivier
The min/max check will not require a change in the program that produced the data file, whereas the "end card" will. The file size might also indicate bad data if all the records are the same length (except when the error results in a record whose length is a multiple of the expected record length). If you go the route of requiring the user to supply an "end card", they might as well precheck the data and supply a CRC or MD5 of the data set.
Jim Dempsey
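The file-size idea can be sketched with `INQUIRE(FILE=..., SIZE=...)`. This assumes every record is exactly `width` characters followed by a 1-byte (LF) or 2-byte (CR LF) line terminator, that the last record is also terminated, and that the compiler reports the size in bytes (the standard speaks of file storage units, which are bytes on common compilers). `size_check` and `size_looks_right` are hypothetical names.

```fortran
module size_check
  ! Hypothetical module: compare the file size with the size implied by
  ! nrec fixed-width records.
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  logical function size_looks_right(fname, nrec, width) result(ok)
    character(len=*), intent(in) :: fname
    integer, intent(in) :: nrec, width     ! expected record count and characters per record
    integer(int64) :: sz, n, w

    inquire(file=fname, size=sz)           ! -1 if the size cannot be determined
    n = nrec
    w = width
    ! Accept LF (1 byte) or CR LF (2 bytes) line terminators
    ok = (sz == n*(w + 1)) .or. (sz == n*(w + 2))
  end function size_looks_right
end module size_check
```

A size mismatch is a cheap early warning before the full read, but a matching size does not prove the contents are valid.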
This issue is just one case of a general class where the compiler is more liberal about "what is valid input" than the programmer wants it to be. I have said for years (decades, actually) that you cannot depend on an error return from a READ to validate your input. If you have rules you want to follow, then you have to read the input as a text string and do your own validation.
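As one illustration of that advice for this thread's D15.7 data, a per-record check can reject input that a formatted READ would silently accept. The rules below are assumptions to adapt: non-blank, at most 15 characters, only characters that can occur in a D/E/F field, no embedded blanks (under the default BN mode they are ignored, so `1 0.5` would read as 10.5), and an explicit decimal point (without one, D15.7 treats the last 7 digits as the fraction). The names `record_validation` and `parse_record` are hypothetical.

```fortran
module record_validation
  ! Hypothetical module: strict check of one text record holding a D15.7 value.
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  ! Returns .true. and sets x only if str passes every rule below.
  logical function parse_record(str, x) result(ok)
    character(len=*), intent(in) :: str
    real(real64), intent(out) :: x
    character(len=*), parameter :: allowed = ' +-.0123456789DEde'
    character(len=15) :: field
    integer :: ios

    ok = .false.
    x = 0.0_real64
    if (len_trim(str) == 0) return                    ! blank record
    if (len_trim(str) > 15) return                    ! wider than the D15.7 field
    if (verify(str, allowed) /= 0) return             ! stray character (letter, tab, comma, ...)
    if (index(trim(adjustl(str)), ' ') /= 0) return   ! embedded blank: BN would ignore it
    if (index(str, '.') == 0) return                  ! no decimal point: 7 implied decimals
    field = str                                       ! blank-padded copy of exactly 15 characters
    read(field, '(D15.7)', iostat=ios) x
    ok = (ios == 0)
  end function parse_record
end module record_validation
```

Calling this on each record read with `'(A)'` gives the robustness Olivier asked for, at the cost of the per-record loop; combining it with a fast bulk read as a fallback path is one way to limit that cost.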
If the data is all in one column, and it is the only column, you can just do this:
read(unit,*) array
If it runs into a blank line, it simply skips to the next one.
So if you had a file like this, with a blank line somewhere among the values:
1
2
3
4
5
and you read it into a 5-element array, you'd get array(1) = 1, array(2) = 2, ..., array(5) = 5.
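A minimal sketch of the list-directed alternative (module and routine names are hypothetical): in list-directed input the end of a record acts as a value separator, so a blank record contributes no value and is skipped instead of being read as 0. The flip side is that the READ gives no sign that the blank line was there, so this produces the right values but does not detect the malformed file.

```fortran
module list_directed_read
  ! Hypothetical module: list-directed (free-format) bulk read.
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  ! In list-directed input the end of a record acts like a blank separator,
  ! so blank lines are skipped rather than read as zeros. Nothing reports
  ! that they were there.
  subroutine read_free(u, a, ios)
    integer, intent(in) :: u
    real(real64), intent(out) :: a(:)
    integer, intent(out) :: ios
    read(u, *, iostat=ios) a
  end subroutine read_free
end module list_directed_read
```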
