What determines stdout encoding from a print statement?

Kabriel · ‎10-25-2018

When a compiled Fortran code writes to standard output, what determines the resulting bytes that actually come out?

Simple program:

program write42
   print *, 42
end program

compiled with no options (ifort write42.f90 -o write42).

Background:

I have a program that prints to standard output an input file for the program itself. So one can simply:

prg.exe -i > tmp.input
prg.exe tmp.input

I compiled this on Windows 7 (using intel64 v17). A colleague ran this on a Windows 10 machine. He made a simple edit to the tmp.input file, and then tried to run the program and it failed because the input text file used wide characters -- the file encoding was UTF-16. It is likely that his editor (which I think was Sublime) actually made the encoding change, but it got me wondering, what determines the output encoding.

My initial investigation suggests that Fortran does not actually handle this, but rather calls a write function in KERNEL32.DLL. This would mean that the OS is determining the encoding to use; so it might be Windows-1252 on my Win7 machine and UTF-16 on his Win10 machine.
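
One way to check what a program really emits, independent of any shell or editor, is to capture its standard output through a pipe and look at the raw bytes. The following is only a minimal sketch in Python: the helper names raw_stdout_bytes and looks_like_utf16 are hypothetical, and the Python child process is a stand-in for the compiled program, so substitute the path of the real executable.

```python
import subprocess
import sys


def raw_stdout_bytes(cmd):
    """Run cmd and return exactly the bytes it wrote to standard output."""
    # With stdout=PIPE and no text/encoding argument nothing is decoded:
    # these are the bytes the program handed to the OS, before any shell
    # redirection operator has had a chance to re-encode them.
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return result.stdout


def looks_like_utf16(data):
    """Rough heuristic: a UTF-16 BOM, or NUL bytes mixed into the text."""
    return data[:2] in (b"\xff\xfe", b"\xfe\xff") or b"\x00" in data


if __name__ == "__main__":
    # Stand-in child process (an assumption for illustration only);
    # replace with e.g. ["write42.exe"] to test the Fortran program.
    child = [sys.executable, "-c", "print(42)"]
    data = raw_stdout_bytes(child)
    print(data.hex(" "), looks_like_utf16(data))
```

If the bytes captured this way are single-byte text but the redirected file is not, the change happened after the program wrote its output.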

jimdempseyatthecove · ‎10-25-2018

Fortran does not generate text output in UTF-16 unless you write your own formatting routine, for example one that writes binary integer(2) values. Nor does it read UTF-16.

If your user is modifying the file, it is his responsibility .NOT. to alter the format (from ASCII to UTF-16 or some other Unicode encoding). I am not a user of Sublime, but I suspect it has a "File Save As" capability in which he/she can specify to keep the ASCII format.

BTW, even when keeping ASCII, the user should be cautious not to embellish the text using HTML, convert spaces to tabs, etc.

Jim Dempsey

Kabriel · ‎10-25-2018

So are you suggesting that Intel Fortran *always* outputs ASCII -- that is, it outputs 1 byte per character, with the low 7 bits taken from the standard ASCII table?

jimdempseyatthecove · ‎10-25-2018

CHARACTER is 1 byte (8-bits)

However, you can use Multibyte character sets. Open IVF documentation, Search, Topics all: ISO unicode

Click on Overview of NLS and MBCS Routines (Windows)

Note, though, your instructions illustrate creating an ASCII formatted text file and then having the user optionally convert the file to multi-byte characters. IOW, your application will now have to determine which format the file is written in, and then take the appropriate action to properly read the file.
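
As a rough illustration of what "determine which format the file is written in" could look like, here is a minimal sketch in Python that only inspects the byte order mark (BOM) at the start of the file. The function name sniff_encoding is hypothetical, and treating a file without a BOM as ASCII is an assumption, not a detection.

```python
import codecs


def sniff_encoding(path):
    """Guess a text file's encoding from its first bytes (BOM check only)."""
    with open(path, "rb") as f:
        head = f.read(4)
    # Test UTF-32 first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # No BOM found: assume single-byte text. This is an assumption;
    # a BOM-less UTF-16 file would be misclassified here.
    return "ascii"
```

The returned name can be passed straight to open(path, encoding=...), since the "utf-16", "utf-32" and "utf-8-sig" codecs consume the BOM when decoding.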

Jim Dempsey

gib · ‎10-25-2018

That has always been my assumption, and my observation.

Kabriel · ‎10-29-2018

Anyone from Intel?

I can confirm now that the user is using Windows PowerShell. The resulting output from PowerShell is UTF-16, compared to ASCII with cmd.exe. So my assumption is still that Fortran hands off the values to KERNEL32.DLL to print to standard output, and this may be different depending on something? It shouldn't depend on the code page, though, because that would mean a print statement on a Greek computer would produce different output than on a US one?

-- EDIT --

It just occurred to me that PowerShell may be translating the ASCII output to UTF-16 while the stream is coming out. I'm surprised others haven't run into this?

-- EDIT 2 --

This is in fact the case. PowerShell is translating the output stream into UTF-16. You can modify this default behaviour if you have PS version 5.1 or later.
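
To make the difference concrete, here is a small sketch in Python of what the same line looks like as single-byte text and as UTF-16 LE with a byte order mark, which is how Windows PowerShell's redirection writes files by default. The sample line and the helper name utf16_to_ascii are hypothetical, and the UTF-16 bytes are built by hand rather than captured from PowerShell.

```python
import codecs

# Hypothetical output line, chosen only for illustration.
text = " 42\r\n"

# What a single-byte program writes: one byte per character.
as_ascii = text.encode("ascii")

# The same text as UTF-16 LE preceded by a BOM (FF FE): two bytes per
# character, so every ASCII character is followed by a NUL byte.
as_utf16 = codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def utf16_to_ascii(data):
    """Undo the re-encoding: decode UTF-16 (BOM-aware), re-encode as ASCII."""
    # The "utf-16" codec reads the BOM to pick the byte order and drops it.
    return data.decode("utf-16").encode("ascii")


if __name__ == "__main__":
    print(as_ascii.hex(" "))
    print(as_utf16.hex(" "))
    print(utf16_to_ascii(as_utf16) == as_ascii)
```

A reader that expects one byte per character will trip over the BOM and the interleaved NUL bytes, which matches the failure described at the start of the thread.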

Steve_Lionel · ‎10-29-2018

Intel Fortran has no support for UTF-16, nor for UTF-8. It does not know how to specify encoding of files it writes.