Solved: How should I handle this input problem?

WSinc · ‎05-25-2014

I have a series of input lines that all look like this:

CAR_MODEL 123 5.4 1968 9.4 55 99.0 3.3 44

The first field is ASCII, the rest are free-form integers or floating pt numbers.

All the numbers can be read in as REALs, though.

I obviously can put quotes around the ASCII field, and read them all in via:

READ(5,*)label, rnum(1:8)

! But I chose to BLANK OUT the first field after reading it in:

READ "(A)",LINE

LABEL=line(1:10) ! assuming the first number is after col. 10

line(1:10)=" "

read(line,*)rnum(1:8)

I was wondering if there is a cleaner way to treat ALL the fields as free-form, where the first one is ASCII, the rest are REALs.

APPARENTLY THE FORTRAN I/O WON'T ACCEPT BLANKS AS SEPARATORS WHEN A FIELD IS ASCII.

Or am I mistaken?

John_Campbell · ‎05-28-2014

Bill,

My approach to reading this type of input is to:

1) first read the line of input into a character string,

2) Scan the string, replacing any character that the provider of the text might use as a field separator with a comma (which is recognised by a Fortran formatted read). If character fields are expected, you can enclose them in " " or in some other way that is compatible with the Fortran read.

3) read the character string using a formatted read.

The advantage of this approach is:

When scanning the string, you can report any unknown characters and so develop a better scan rule. Smart reporting can be very helpful, much better than a program crash in a large input file.

When reading the character string, a read (string,*) fails if there is not sufficient information in the string, while the read(string,fmt=...) will fill missing values as zero. You could count the number of fields in the 2) scan process.

You can also replace 3) by modifying the 2) scan to provide a list of characters associated with each separated field in the line.

I use this approach when receiving survey data which can use a number of field separators that Fortran reads do not support, including tab, semicolon, backslash or any other strange character not expected to be in a field name or numeric field.

Essentially, I scan each line of input to make it Fortran friendly, reporting any strange characters I find, rather than letting the program crash!

John
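The three steps above can be sketched roughly as follows. This is a minimal illustration, not John's actual code: the module name `line_scan`, the routine `normalize_line`, the set of separators (tab, semicolon, backslash) and the set of "expected" characters are all assumptions. Step 3 here uses a list-directed read from the cleaned string with `iostat=` rather than an explicit format.

```fortran
module line_scan
  implicit none
contains
  ! Replace assumed separator characters with commas and report any
  ! character that is neither a separator nor in the expected set.
  subroutine normalize_line(line, nbad)
    character(len=*), intent(inout) :: line
    integer, intent(out) :: nbad
    ! Assumed separators: tab, semicolon, backslash.
    character(len=*), parameter :: seps = achar(9)//';'//achar(92)
    ! Assumed set of characters allowed in labels and numbers.
    character(len=*), parameter :: ok = &
      'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz' // &
      '0123456789_+-., "'
    integer :: i

    nbad = 0
    do i = 1, len_trim(line)
      if (index(seps, line(i:i)) > 0) then
        line(i:i) = ','
      else if (index(ok, line(i:i)) == 0) then
        nbad = nbad + 1
        write(*,'(A,I0,A,I0)') 'Unexpected character code ', &
          iachar(line(i:i)), ' at column ', i
      end if
    end do
  end subroutine normalize_line
end module line_scan

program demo_scan
  use line_scan
  implicit none
  character(len=132) :: line
  character(len=10) :: label
  real :: rnum(8)
  integer :: nbad, ios

  ! Sample record mixing tab, semicolon and blank separators.
  line = 'CAR_MODEL'//achar(9)//'123;5.4;1968 9.4 55 99.0 3.3 44'
  call normalize_line(line, nbad)

  ! Read from the cleaned string; iostat traps short or malformed lines.
  read(line, *, iostat=ios) label, rnum
  if (ios /= 0) then
    write(*,'(A,I0)') 'Could not read line, iostat = ', ios
  else
    write(*,'(A,1X,8F9.2)') trim(label), rnum
  end if
end program demo_scan
```

Reporting the offending character code and column, instead of stopping, makes it easier to extend the separator list when a new data source shows up.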

mecej4 · ‎05-25-2014

One solution to consider:

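program freefmt
implicit none
character(len=10) :: label   ! wide enough to hold columns 1-10
character(len=132) :: line
integer :: i1,i2,i3,i4
real :: r1,r2,r3,r4

read(*,'(A)')line            ! read the whole record as text
label=line(1:10)             ! assumes the numbers start after column 10
read(line(11:),*)i1,r1,i2,r2,i3,r3,r4,i4
write(*,*)i1,i2,i3,i4
write(*,*)r1,r2,r3,r4
end program freefmt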

Running the program gives (the first line is the input data line echoed to the output, the other two are output)

CAR_MODEL  123  5.4   1968   9.4  55  99.0  3.3  44
         123        1968          55          44
   5.400000       9.400000       99.00000       3.300000

Steven_L_Intel1 · ‎05-25-2014

You don't have to put quotes around the character value unless it has embedded delimiters (blank, comma, slash).
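To illustrate, here is a small sketch that reads the sample line from the original question with list-directed input and no quotes around the label. The program and variable names are illustrative, and the label length of 10 is an assumption.

```fortran
program unquoted_label
  implicit none
  character(len=80) :: line
  character(len=10) :: label
  real :: rnum(8)
  integer :: ios

  line = 'CAR_MODEL 123 5.4 1968 9.4 55 99.0 3.3 44'

  ! Blanks separate the values. The label needs no quotes because it
  ! contains no blank, comma or slash.
  read(line, *, iostat=ios) label, rnum
  if (ios /= 0) error stop 'list-directed read failed'

  write(*,'(A)') trim(label)
  write(*,'(8F8.1)') rnum
end program unquoted_label
```

A label such as CAR MODEL (with an embedded blank) would be split into two values, which is the case where quotes become necessary.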

WSinc · ‎05-27-2014

Hi Mecej4 and Steve -

There is a complication (that I just noticed).

The input data has TABS in it. So the actual starting column of a line is somewhat uncertain.

I somehow have to adjust the code to take this into account, or figure out some way to remove the tabs.

In other words, the data might LOOK like the numbers start in COLUMN 11, but when I go to comment out or ignore cols 1-10, it trashes the numerical data as well.

Well, I can just use the * format, but is there a way to do that with a NUMBERED FORMAT statement instead?

For some reason, the FORMAT statements don't take TABS into account like they should. Or is that deliberate?

mecej4 · ‎05-28-2014

If your file has tabs, you can do one or more of the following: (i) replace all the tabs by a visible character, such as '$', (ii) use a utility or an editor to convert tabs to spaces before processing the file with your program, (iii) read the file a line at a time into a CHARACTER*132 (big enough) variable and replace the tabs in your program before reading the data using a suitable format.

With any procedure that you use, you need to adopt and follow a convention regarding where the tab stops are and how they are to be interpreted.
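Option (iii) might look like the sketch below. It assumes tab stops every 8 columns (the `tabw` argument), which is a convention you would have to confirm for your own data; the module and function names are made up for this example. Once the tabs are expanded, the columns are fixed again, so an explicit FORMAT can be used.

```fortran
module detab_mod
  implicit none
contains
  ! Return a copy of "line" with each tab expanded to blanks up to the
  ! next tab stop. Stops are assumed at columns tabw+1, 2*tabw+1, ...
  ! tabw must be at least 1.
  function detab(line, tabw) result(out)
    character(len=*), intent(in) :: line
    integer, intent(in) :: tabw
    character(len=len(line)*tabw) :: out
    integer :: i, col

    out = ' '
    col = 0                          ! columns filled so far
    do i = 1, len_trim(line)
      if (line(i:i) == achar(9)) then
        col = (col/tabw + 1)*tabw    ! skip ahead, leaving blanks
      else
        col = col + 1
        out(col:col) = line(i:i)
      end if
    end do
  end function detab
end module detab_mod

program demo_detab
  use detab_mod
  implicit none
  character(len=40) :: raw
  character(len=320) :: clean
  character(len=10) :: label
  real :: rnum(3)

  raw = 'CAR_MODEL'//achar(9)//'123'//achar(9)//'5.4'//achar(9)//'1968'
  clean = detab(raw, 8)
  write(*,'(A)') trim(clean)

  ! With 8-column stops the numbers now start at columns 17, 25 and 33.
  ! BN makes trailing blanks in each numeric field count as nothing.
  read(clean, '(BN,A10,6X,3F8.0)') label, rnum
  write(*,'(A,3F10.2)') label, rnum
end program demo_detab
```

The fixed-column read only works because the data and the program agree on the tab stops; with a different tab width the fields would land in other columns.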

WSinc · ‎05-28-2014

OK, I have found a new problem -

I wanted to print out some output from the screen, so I used the MARK and ENTER features to copy it. But when I display the copied output to print it, the column headings don't line up the same way.

Is there a way to MAKE the copied output print exactly the same as the screen I copied it from?

The FORMAT of the headings has Txx features in it, like T23, T34,T46, etc.

Lines that have just numbers look like they are supposed to -

I want to make the headings line up with the numbers, and they DO on the screen, but NOT when I copy the MARKED output somewhere else.

I did not see the direct PRINT command, i.e. PRINT marked for example, so I have to copy it elsewhere.

WSinc · ‎05-28-2014

Apparently the way the OUTPUT gets displayed, it has invisible stuff in it that does not get copied when I transfer the output to another temp file for printing. I am using either Print * or write(5,*) for the screen display. Is there a way around this?

Like I said before I want the HEADINGS to line up with the NUMBERS that are being printed. Hopefully without making major changes to the program.

Example:

Real*8 a(8)

print 101,"n1","n2","n3","n4","n5","n6","n7","n8"

101 format(4A12,T59,4A12) ! having the T thing in there screws it up

print 102,a

102 FORMAT(4F12.4,10X,4F12.4)

The headings line up OK on the screen, but NOT when I xfer the screen contents to another file for printing.

Maybe the presence of Txx descriptors affects this?

Steven_L_Intel1 · ‎05-28-2014

The T format item simply indicates a character position within the output record. This may not correlate to how the output appears on screen if you have characters such as tabs in the output. Note that tab stops are very much dependent on the application used to display the text. My advice would be to make sure there are no tabs in your character data.

For input, tabs are considered the same as blanks. As I indicated above, if the strings you are reading don't have embedded blanks, tabs, commas or slashes, you can use list-directed input and it will use the whitespace as separators between the values.
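A small sketch of the first point: the T edit descriptor only positions output within the record and fills skipped positions with blanks. It does not emit tab characters. The buffer length and heading strings here are arbitrary choices for the example.

```fortran
program t_descriptor_demo
  implicit none
  character(len=40) :: buf

  ! T13 and T25 move to columns 13 and 25 of the record.
  write(buf, '(A,T13,A,T25,A)') 'n1', 'n2', 'n3'

  ! Brackets make the blank padding visible.
  write(*,'(A)') '['//buf//']'

  if (index(buf, achar(9)) == 0) then
    write(*,'(A)') 'No tab characters in the record'
  end if
  write(*,'(A,I0)') 'Column where n2 starts: ', index(buf, 'n2')
end program t_descriptor_demo
```

If headings written this way still fail to line up after copying, the cause is more likely the font or the viewer than the T descriptor itself.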

mecej4 · ‎05-28-2014

billsincl wrote:

... the way the OUTPUT gets displayed, it has invisible stuff in it that does not get copied when I transfer the output to another temp file for printing

If an input file has tab characters in it, you will probably add to the confusion by reading the file as a formatted file and outputting the variables to another text file, using the 'T' edit descriptor.

As Dr. Fortran advised, your best course of action is to get rid of the tab characters in the input file once and for all. If that file is produced by one of your other programs, modify the WRITE statements that output tabs. If not, open the input file containing the tabs in a well-featured text editor (e.g., Notepad++), use its Edit menu to convert tabs to spaces, and save the modified copy of the file to use as input in your programs.

WSinc · ‎05-28-2014

So, I take it that when I MARK the output screen and press ENTER to copy it, there is no way to print that directly?

In other words, avoiding copying it to a TEMP file.

I think if I put an LGU and say WRITE(LGU,FMT) it will give me a file with the right headers.

I am just trying to avoid a lot of extra work.

The problem is NOT tab characters in the input file, it is using tab characters to space the OUTPUT headings.

Example:

character*12 HDR(8)

print 101,HDR

101 format(A,T13,A,T25,A,T37,A,T51,A,T63,A,T75,A,T87,A)

The problem is solved if the output screen has BLANKS in it, and not invisible characters.

Is there a way to force that? Or is there a way to force all the characters to be visible? Dots for example?

Steven_L_Intel1 · ‎05-28-2014

Don't use tabs - as I wrote above, tab spacing is not predictable. Use blanks, and a monospaced font, if you want predictable spacing.
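One way to follow this advice, sketched under assumptions: give each heading the same field width as its number (A12 against F12.4) and use X for the gap, so the record contains nothing but blanks between fields. Writing to a file through a unit also avoids copying from the console window. The file name `report.txt` and the sample values are made up.

```fortran
program aligned_headings
  implicit none
  character(len=2), parameter :: hdr(8) = &
    ['n1', 'n2', 'n3', 'n4', 'n5', 'n6', 'n7', 'n8']
  real(kind=kind(1.0d0)) :: a(8)
  integer :: lgu, i

  a = [(1.5d0*i, i = 1, 8)]          ! sample data

  open(newunit=lgu, file='report.txt', status='replace', action='write')
  ! A12 right-justifies each heading in 12 columns, matching F12.4;
  ! 10X inserts ten blanks in both records.
  write(lgu, '(4A12,10X,4A12)') hdr
  write(lgu, '(4F12.4,10X,4F12.4)') a
  close(lgu)
end program aligned_headings
```

Opened in any editor with a monospaced font, the headings in `report.txt` end in the same column as the last digit of each number.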

mecej4 · ‎05-28-2014

billsincl wrote:
#4: The input data has TABS in it.

billsincl wrote:
#10: The problem is NOT tab characters in the input file, it is using tab characters to space the OUTPUT headings.

billsincl wrote:
#10: The problem is solved if the output screen has BLANKS in it, and not invisible characters.

Inconsistent problem descriptions often lead to suggestions that are unlikely to be helpful.

Blanks belong to the set of invisible characters. Visibility is just one attribute of a character. Many editors provide, at the user's option, visible graphical representations of normally invisible characters.

WSinc · ‎05-29-2014

Thanks John,

Actually, the input data came from an EXCEL spreadsheet.

I was hoping to find a utility that would give me raw input lines from the data file, something more "Fortran friendly."

But when I tried to convert it, I got all kinds of garbage in the file.

Your suggestions are very much appreciated.

John_Campbell · ‎05-29-2014

Bill,

If you export the information from Excel as a .csv file, most of the problems should already be solved.
You need to have the fields consistent with the variables expected and if this is the case you should be able to use read (lu,*,iostat=iostat) ...
Don't forget to use iostat= to trap all the headings that may exist in the .csv dump.
You might need to be careful with characters, but in most cases this should work seamlessly.
Also the currency or ,000, formats should not be exported in a .csv file (I've forgotten if Excel excludes any of this in a .csv format)
I also use .prn files where the data layout is a bit easier to review.
Keep the .xls file and you can always try again.

John

Steven_L_Intel1 · ‎05-30-2014

If you export as .csv, make sure that character fields are enclosed in quotes (this is an Excel option.) As John says, if you do this you can easily read the values with list-directed input.
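A sketch of the list-directed CSV read with `iostat=` that the last two replies describe. The file name `cars.csv`, its layout (one quoted label followed by eight numbers) and the heading line are assumptions; the program writes a tiny sample file first so that it runs on its own. As a variation on reading the unit directly, each record is first read into a string, so a failed read never leaves the file position in doubt.

```fortran
program read_csv
  implicit none
  character(len=200) :: line
  character(len=20) :: label
  real :: rnum(8)
  integer :: lu, ios, nread, nskip

  ! Create a small sample file (assumed layout) so the sketch is runnable.
  open(newunit=lu, file='cars.csv', status='replace', action='write')
  write(lu,'(A)') 'Model,a,b,c,d,e,f,g,h'
  write(lu,'(A)') '"CAR MODEL",123,5.4,1968,9.4,55,99.0,3.3,44'
  write(lu,'(A)') '"OTHER",7,8.5,1970,1.2,60,88.0,2.1,40'
  close(lu)

  open(newunit=lu, file='cars.csv', status='old', action='read')
  nread = 0
  nskip = 0
  do
    read(lu, '(A)', iostat=ios) line
    if (ios /= 0) exit               ! end of file or read error

    ! Quotes allow the label to contain blanks; commas separate values.
    read(line, *, iostat=ios) label, rnum
    if (ios /= 0) then               ! e.g. heading text instead of numbers
      nskip = nskip + 1
      cycle
    end if
    nread = nread + 1
    write(*,'(A,1X,8F10.2)') trim(label), rnum
  end do
  close(lu)

  write(*,'(A,I0,A,I0)') 'records read: ', nread, ', lines skipped: ', nskip
end program read_csv
```

Counting the skipped lines, rather than silently ignoring them, gives a quick check that only the headings were rejected.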