Embedded Ctrl-Z in Fortran source file causes havoc

mecej4
Black Belt

I have been cleaning up an old program written in Fortran 77, spanning several source files and containing 51 subroutines, 10 functions and the main program.

I used some utility programs to replace DO nnn i= type loops with DO i= ... END DO type loops, IF(expr)n1,n2,n3 with IF...ENDIF, etc., and make the program easier to follow and debug. I chose to have the utility output free form Fortran source files. I moved some subroutines from one source file into another, and that caused a lot of trouble and puzzlement.

I created  makefiles for use with several different Fortran compilers. With some compilers, the build went fine and EXEs were built. With Ifort and Ifx, a strange thing happened. MAKE would decide that OBJ files were current, and issue a link command. The link command would fail with messages about unsatisfied externals, even though the missing subprograms were in the source files and the OBJs of those source files were processed by the linker. I tried deleting all OBJ files and running MAKE again, but no luck.

Then, I noticed that some of the OBJ files were suspiciously small. Poking into the symbols in those files revealed that only some of the subprograms in the pertinent source files had been compiled. After several trips down blind alleys, I found that these source files contained a CONTROL-Z character at the end of a comment line somewhere within the source file, rather than at the very end of the source file.

I thought we were done with CONTROL-Z serving as EOF marker when we moved from CP/M-86 to MS-DOS, but that is not so.

Here is a reproducer -- in the attached Zip file, you will find a single source file with 13 lines and 2 subroutines. In between the two subroutines is a comment line with a CONTROL-Z just before the LF at the end of line 7. Some editors can display  "non-visible" characters such as CONTROL-Z, so I include a screenshot -- please see the "SUB" on line 7.

Intel Fortran (Ifort as well as Ifx) ignores the source lines that come after the CONTROL-Z, putting only the code for ASUB into the OBJ file. So do FPS-4, CVF6.6C, and NAG. On the other hand, Gfortran, Lahey-Fujitsu LF 7.1, Silverfrost FTN95 and Absoft ignore the CONTROL-Z, and compile both subroutines ASUB and BSUB.

I hope that the Intel compiler designers will consider this issue. If they continue to have the Intel Fortran compilers treat CONTROL-Z as an end-of-source marker, as a user I should appreciate a warning that source scanning was terminated prematurely because of the CONTROL-Z.
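The kind of check that such a warning would need can be sketched in a few lines of C. This is only an illustration, not how any of the compilers named above work: the function name embedded_ctrl_z_line is made up for the example, and it assumes the whole source file has already been read into memory in binary mode. It reports the line on which a CONTROL-Z is followed by more data, and deliberately ignores a CONTROL-Z that is the very last byte of the file.

```c
#include <assert.h>
#include <stddef.h>

#define CTRL_Z 0x1A   /* ASCII SUB, shown as ^Z */

/* Return the 1-based line number of the first Ctrl-Z that is followed by
   at least one more byte, or 0 if there is none. Lines are counted by LF.
   A Ctrl-Z that is the last byte of the buffer is treated as a legacy
   end-of-file marker and is not reported.
   (Hypothetical helper, named for this example only.) */
size_t embedded_ctrl_z_line(const unsigned char *buf, size_t len)
{
    size_t line = 1, i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (buf[i] == CTRL_Z && i + 1 < len)
            return line;
        if (buf[i] == '\n')
            line++;
    }
    return 0;
}
```

A pre-build step could call this on each source file and print something like "Ctrl-Z on line N, text after it may be ignored" whenever the result is nonzero.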

 

ctrlz.jpg

P.S.: Another oddity: if the '!' in Line 7, Column 1 is changed to 'c', 'C' or '*', a carryover from the indication of a comment line from fixed format Fortran, Ifort issues the following error message 30 times before abandoning the compilation. Each repetition has the position marker shifted one column to the right from the preceding error message.

 

 

ctrlz.f90(7): error #5078: Unrecognized token '|' skipped
c ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
--^

 

 

18 Replies
Steve_Lionel
Black Belt Retired Employee

Huh. I know that Intel Fortran still treats Ctrl-Z as an EOF marker in unformatted files but was unaware of it being recognized in source files. I wonder if this is something in the underlying C I/O system - I find it hard to believe that there is explicit code in the compiler for this!

jimdempseyatthecove
Black Belt

FWIW, I agree with Steve, this may be a Windows CRTL issue:

C:\test\ctrlz>dir
 Volume in drive C has no label.
 Volume Serial Number is F8CE-A4A4

 Directory of C:\test\ctrlz

03/24/2021  04:39 PM    <DIR>          .
03/24/2021  04:39 PM    <DIR>          ..
03/24/2021  06:52 AM               249 ctrlz.f90
               1 File(s)            249 bytes
               2 Dir(s)  258,555,604,992 bytes free

C:\test\ctrlz>type ctrlz.f90
subroutine Asub(a,b,c)
implicit none
integer a,b,c
c = a+b
return
end subroutine
! ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
C:\test\ctrlz>

 

mecej4
Black Belt

Thanks, Jim.

Yes, type chops off the part of the file after the ^Z, but more shows the whole file, and the file size displayed by dir is for the complete file, as well!

In the section of code in the Fortran compiler that opens the source file, perhaps the second argument to the fopen CRTL STDIO routine, i.e., the flags argument, should have been "rb" rather than "r". On the other hand, I do not know if fopen is used rather than CreateFile.
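To see the difference between the two modes, a small C sketch such as the following can be used. It is not taken from any compiler; the names count_bytes and write_sample, and the sample text, are invented for the illustration. With the Microsoft C runtime, a stream opened in text mode ("r") treats Ctrl-Z as end-of-file on input, so there the "r" count would be expected to stop at the Ctrl-Z, while the "rb" count covers the whole file. On systems that make no distinction between text and binary streams, both counts should agree.

```c
#include <assert.h>
#include <stdio.h>

/* Count the bytes that fgetc() delivers before it reports EOF when the
   file is opened with the given mode ("r" or "rb").
   Returns -1 if the file cannot be opened. */
long count_bytes(const char *path, const char *mode)
{
    FILE *fp;
    long n = 0;

    assert(path != NULL && mode != NULL);
    fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    while (fgetc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}

/* Write a tiny sample file with a Ctrl-Z (0x1A) at the end of a comment
   line, followed by more source text. The sample text is made up for
   this example. Returns the number of bytes written, or -1 on failure. */
long write_sample(const char *path)
{
    static const char text[] =
        "end subroutine\n! |||\x1a" "\nsubroutine Bsub\n";
    size_t len = sizeof text - 1;   /* exclude the terminating NUL */
    size_t written;
    FILE *fp = fopen(path, "wb");   /* binary mode: bytes go out unchanged */

    if (fp == NULL)
        return -1;
    written = fwrite(text, 1, len, fp);
    if (fclose(fp) != 0 || written != len)
        return -1;
    return (long)len;
}
```

Calling write_sample on a scratch file and then comparing count_bytes(path, "r") with count_bytes(path, "rb") shows whether the runtime in use stops reading at the Ctrl-Z in text mode.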

JohnNichols
Valued Contributor II

@mecej4  - this is a great lesson.  We often do not know the insides and accept the black boxes as correct. 

Well done. But then again your stuff is always good.

jimdempseyatthecove
Black Belt

>> perhaps the second argument to the fopen CRTL STDIO routine, i.e., the flags argument, should have been "rb" rather than "r". 

Perhaps...

However, some users may use ^Z as a means to have a "fat source" file containing both program and data. And changing the behavior would cause issues for them. In the "old days", a card deck could contain program source cards followed by data cards.

There are other implementation issues, such as what to do with:

0 NUL Null
1 SOH Start of Header
2 STX Start of Text
3 ETX End of Text
4 EOT End of Transmission
5 ENQ Enquiry
6 ACK Acknowledge
7 BEL Bell
8 BS    Backspace
9 TAB Horizontal Tab
10 LF Linefeed (handled/system dependent)
11 VT Vertical Tab
12 FF Form Feed
13 CR Carriage Return (handled/system dependent)
14 SO Shift Out
15 SI Shift In
16 DLE Data Link Escape
17 DC1 Device Control 1
18 DC2 Device Control 2
19 DC3 Device Control 3
20 DC4 Device Control 4
21 NAK Negative Acknowledge
22 SYN Synchronous Idle
23 ETB End of Transmission Block
24 CAN Cancel
25 EM End of medium
26 SUB Substitute
27 ESC Escape
28 FS File Separator
29 GS Group Separator
30 RS Record  Separator
31 US Unit Separator

Then ?? 128:255 ???

All of the above are implementation defined.

Steve may have some input as to whether CR and/or LF are implementation defined.

Jim Dempsey

Steve_Lionel
Black Belt Retired Employee

The Fortran standard talks about the "Processor character set" - characters that may appear in source statements. (In standard-speak, "processor" is the "thing" that interprets your source code - substitute "compiler" generally, though it also encompasses the underlying OS and hardware.)

"Each character in a processor character set is either a control character or a graphic character. The set of graphic characters is further divided into letters (6.1.2), digits (6.1.3), underscore (6.1.4), special characters (6.1.5), and other characters (6.1.6)."

"Special characters" are things such as parentheses, equal sign, etc. The standard then goes on to say, for "Other characters", "Additional characters may be representable in the processor, but shall appear only in comments (6.3.2.3, 6.3.3.2), character constants (7.4.4), input/output records (12.2.2), and character string edit descriptors (13.3.2)."

Oddly, the only mention of "control characters" is in that first quote - it is never defined! (Something I will have to ask about.)

All this is to say that the standard is silent about what any non-graphic character should mean or where it is permitted to appear. The standard is also silent on just how source lines are delivered to the processor, other than some handwaving in the description of INCLUDE.

Characters such as CR and LF, in some implementations, are used as line delimiters, and as such would be interpreted as separating source lines. (Not all platforms use control characters for this purpose.)

Keep in mind that the standard describes a standard-conforming program, and what the processor must do with it. If your source contains anything not given an interpretation by the standard, a processor is allowed to do anything it likes with it. In the end, you should not assume that a CTRL-Z in a source line is interpreted in any specific way, and it is best to not have such characters in your source file.
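One way to follow that advice is to clean the file before it reaches the compiler. The C fragment below is a minimal sketch under the assumption that the source has been read into a buffer in binary mode; the name strip_ctrl_z is invented for the example. It removes every CTRL-Z byte in place and leaves all other characters, including CR and LF, untouched.

```c
#include <assert.h>
#include <stddef.h>

#define CTRL_Z 0x1A   /* ASCII SUB, shown as ^Z */

/* Remove every Ctrl-Z byte from buf in place, keeping the order of the
   remaining bytes. Returns the new length of the data in buf.
   (Hypothetical helper, named for this example only.) */
size_t strip_ctrl_z(unsigned char *buf, size_t len)
{
    size_t in, out = 0;

    assert(buf != NULL || len == 0);
    for (in = 0; in < len; in++) {
        if (buf[in] != CTRL_Z)
            buf[out++] = buf[in];
    }
    return out;
}
```

The cleaned buffer would then be written back with a stream opened in binary mode ("wb"), so that the runtime does not alter the line endings.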

 
Steve_Lionel
Black Belt Retired Employee

Regarding "control characters", I am informed that these are defined in other standards referenced by the Fortran standard (ISO 10646 and ISO 646).

mecej4
Black Belt

Here is what the DEC Fortran-77 manual says about control characters in source code. Note that special treatment is given to the Ctrl-L and Ctrl-Z characters.

The second sentence of the second paragraph puzzles me. Is it suggesting that a Fortran program is being used to write a new Fortran source file, using the ENDFILE statement in the writer program?

----------------------------------

Nonprintable Characters

 The form-feed character (0C hex) is treated as a blank without
 causing a diagnostic message to be issued.  In addition, a source
 record of length 1 containing a form-feed character causes the
 compilation source listing to begin a new page.

 A source record of length 1 containing a Ctrl-Z character (1A hex)
 is treated as a blank line.  Such a record is created by the
 ENDFILE statement, if the command line option -vms is specified.

 All other control characters are valid, except 00(hex) and 01(hex).

Steve_Lionel
Black Belt Retired Employee

That's a very interesting manual reference. The compiler it describes is not in the "lineage" of current Intel Fortran and it isn't related to VAX FORTRAN-77 either.  I'm not sure exactly where it came from, but my memories of the RISC ULTRIX days are dim (I didn't work on that platform.)  I faintly recall that in the early Alpha days that there was indeed an F77 compiler.

I do remember the single FF causing a new listing page. That text about ^Z and ENDFILE is odd, I agree, but it was not unusual to have programs writing other programs (it still happens.) Yes, on VMS at least, an ENDFILE record was a one-byte ^Z.

JohnNichols
Valued Contributor II

How do you look at a Ctrl-Z in a text file?

 

mecej4
Black Belt
  • Many editors (such as Notepad++) permit you to display control characters using graphic icons that cannot be mistaken for normal characters.
  • There are binary editors such as BVI and HXD. 
  • Command line utilities exist for the purpose. Look for programs with names such as "hexdump".

Cygwin includes a utility called hexdump. If I run hexdump on one of the problematic files, TIME3.FOR, I see at the end the following lines:

 

*
00000680  2a 0d 0a 0d 0a 20 20 20  20 20 20 72 65 61 6c 20  |*....      real |
00000690  66 75 6e 63 74 69 6f 6e  20 46 71 68 28 47 57 4c  |function Fqh(GWL|
000006a0  2c 41 71 68 2c 42 71 68  29 0d 0a 20 20 20 20 20  |,Aqh,Bqh)..     |
000006b0  20 46 71 68 3d 2d 41 71  68 2a 65 78 70 28 42 71  | Fqh=-Aqh*exp(Bq|
000006c0  68 2a 61 62 73 28 47 57  4c 29 29 0d 0a 20 20 20  |h*abs(GWL))..   |
000006d0  20 20 20 72 65 74 75 72  6e 0d 0a 20 20 20 20 20  |   return..     |
000006e0  20 65 6e 64 0d 0a 0d 0a  2a 20 7c 7c 7c 7c 7c 7c  | end....* |||||||
000006f0  7c 7c 7c 7c 7c 7c 7c 7c  7c 7c 7c 7c 7c 7c 7c 7c  ||||||||||||||||||
*
00000730  1a                                                |.|
00000731

 

If you already know that you want to look for a specific character such as CTRL-Z, you can use the following command

 

hexdump -C TIME3.FOR | grep -i " 1a"

 

and you will see the output

 

00000730  1a                                                |.|

 

The optimization solver GAMS includes a hexdump utility which has a nice feature: it provides not only a hex dump, but also statistics on the contents of the file:

 

Characters read = 1841

Control Characters Used (0-31)

Dec Hex Cnt  Description
 10  0A  58  LF   Line Feed
 13  0D  58  CR   Carriage Return
 26  1A   1  SUB  Substitute ^Z

Alternatively, you can use the following specific-purpose C program on a suspect file.

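#include <stdio.h>
#include <stdlib.h>

/* Count the ASCII control characters (00-1F and 7F hex) in a file. */
int main(int argc, char *argv[]){
int c,i, char_cnt[128]; FILE *fil;

if(argc != 2){
   fprintf(stderr,"Usage: hexcnt <filename>\n");
   exit(1);
   }
for(i=0; i<128; i++)char_cnt[i]=0;
fil = fopen(argv[1],"rb");   /* binary mode, so that ^Z is not taken as EOF */
if(fil == NULL){
   perror(argv[1]);
   exit(1);
   }
while((c=fgetc(fil)) != EOF){
   if(c < 128)char_cnt[c]++;
   }
fclose(fil);
/* print the hex code and count of each control character found */
for(i=0; i<0x20; i++)
   if(char_cnt[i] > 0)printf("%02X %8d\n",i,char_cnt[i]);
if(char_cnt[0x7F] > 0)printf("%02X %8d\n",0x7F,char_cnt[0x7F]);
return 0;
}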

This C program, when compiled and run on the suspect file TIME3.FOR, outputs

S:\SWMS3D\dmp>hexcnt TIME3.FOR
0A       58
0D       58
1A        1

 

Steve_Lionel
Black Belt Retired Employee

You can do it in Visual Studio too, though Microsoft makes you hunt for it.

  • File > Open > File...
  • Select your file
  • Click the triangle to the right of Open, select Open With...
  • Scroll the list towards the bottom and select "Binary Editor"

Screenshot 2021-03-27 143742.jpg

You can even change the values here. A few years ago, MS hinted that they wanted to drop this, but too many people complained.

JohnNichols
Valued Contributor II

I have spent an interesting day playing with file formats. The SENSOR program puts out CSV files in a specific format. I have a Fortran program that takes them apart, mostly thanks to Jim et al., but occasionally I make a mistake and open the file with Excel, and the rather interesting MS program changes the time output by rounding, which means the file can no longer be read.

My 14-year-old daughter wants to know why "read" is not spelled "red". I said that, like Fortran, the English language is hard to follow.

Case in point, a sample Fortran program from the tutorial linked below the code:

! ------------------------------------------------------
! Compute the area of a triangle using Heron's formula
! ------------------------------------------------------

PROGRAM  HeronFormula
   IMPLICIT  NONE

   REAL     :: a, b, c             ! three sides
   REAL     :: s                   ! half of perimeter
   REAL     :: Area                ! triangle area
   LOGICAL  :: Cond_1, Cond_2      ! two logical conditions

   READ(*,*)  a, b, c

   WRITE(*,*)  "a = ", a
   WRITE(*,*)  "b = ", b
   WRITE(*,*)  "c = ", c
   WRITE(*,*)

   Cond_1 = (a > 0.) .AND. (b > 0.) .AND. (c > 0.0)
   Cond_2 = (a + b > c) .AND. (a + c > b) .AND. (b + c > a)
   IF (Cond_1 .AND. Cond_2) THEN
      s    = (a + b + c) / 2.0
      Area = SQRT(s * (s - a) * (s - b) * (s - c))
      WRITE(*,*) "Triangle area = ", Area
   ELSE
      WRITE(*,*) "ERROR: this is not a triangle!"
   END IF

END PROGRAM  HeronFormula

https://ourcodingclub.github.io/tutorials/fortran-intro/  

This would be hard to follow for a new programmer. 

JohnNichols
Valued Contributor II

Humans are strange: they go out to collect the data files. All of the Fortran drawing programs are set up to handle 51 FFTs of 16384 time steps, or about 8 minutes. OK, 8 minutes of standing is a long time.

You get files ranging from 2 minutes (with the comment "I know you can fix it", read: "I know you are a bit impatient"), to 4 minutes, which is statistically just OK, to an hour ("Sorry, I forgot to switch it off", read: "I hope Starbucks was nice").

Only one text editor I have seen will open the large files: VEDIT. It comes from Canada, from about 1988. It also has great block editing features, but it adds a line to the end of a text file, so you end up with a blank line. I have been too lazy to trap the blank lines in a new program, so I have to open the 8 minute file with notepad and check for the blank line; it is not reliable in VEDIT.

 

JohnNichols
Valued Contributor II

Capture.PNG

this shows after excel edits the time even saving as CSV

Capture1.PNG

this shows original 

Capture2.PNG

this shows the end of the file in VEDIT - no line numbers no idea 

Capture3.PNG

this shows notepad++ you can see the spare line. 

 

 

 

Arjen_Markus
Valued Contributor III

Caveat emptor!

It shows that MS Excel is not a text editor - it is very very keen on interpreting your data. That is a good thing, sometimes, but not always. And to illustrate a problem with that eagerness: "03-10-2021" may mean the 10th of March, 2021 but it is ambiguous, because in my part of the world it should be interpreted as the third of October, 2021. And I leave the possibilities for "03-10-07" as an exercise.

As for the extra line in notepad++: some text editors seem to consider the end-of-line characters LF/CR (in whatever combination or selection) as line separators, rather than as line endings. It has cost me many hours of grieve over the years ...
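The two interpretations give different line counts for the same bytes, which a short C sketch can make concrete. The function names below are invented for the example, and only LF is counted, so CR is treated as ordinary data. For the text "a", LF, "b", LF, the terminator view yields 2 lines, while the separator view yields 3, the third being the empty "spare" line after the final LF.

```c
#include <assert.h>
#include <stddef.h>

/* LF as a line terminator: every LF ends a line, and any text after the
   last LF counts as one more (unterminated) line.
   (Hypothetical helper, named for this example only.) */
size_t lines_if_terminator(const char *buf, size_t len)
{
    size_t i, n = 0;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        if (buf[i] == '\n')
            n++;
    if (len > 0 && buf[len - 1] != '\n')
        n++;
    return n;
}

/* LF as a line separator: k separators always delimit k + 1 lines, so a
   file ending in LF has an extra, empty last line.
   (Hypothetical helper, named for this example only.) */
size_t lines_if_separator(const char *buf, size_t len)
{
    size_t i, n = 1;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        if (buf[i] == '\n')
            n++;
    return n;
}
```

A program that reads records until end-of-file has to decide which of the two views it takes, otherwise the empty last line shows up as a spurious record.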

JohnNichols
Valued Contributor II

I am assuming that grieve is a misspelling for grief, but it only proves a point - not sure what the point is but, thanks for the comment.  Dates are a beast.  

PS: If you have never read the book The Mad Scientist Club published by Purple House - you will love it - give it to any 12 year old for xmas and see them learn science the easy way.  

Arjen_Markus
Valued Contributor III

Oops, indeed, I meant "grief". I tried to make a few points, actually, but the common theme is that expectations may differ and will differ more the smarter the software tries to be.
