Intel® Fortran Compiler

What should ENCODING='UTF-8' do?

Mark_Lewy
Valued Contributor I
4,869 Views

The Fortran standard states:

12.5.6.9 ENCODING= specifier in the OPEN statement
The scalar-default-char-expr shall evaluate to UTF-8 or DEFAULT. The ENCODING= specifier is permitted
only for a connection for formatted input/output. The value UTF-8 specifies that the encoding form of the file
is UTF-8 as specified in ISO/IEC 10646. Such a file is called a Unicode file, and all characters therein are of ISO
10646 character kind. The value UTF-8 shall not be specified if the processor does not support the ISO 10646
character kind. The value DEFAULT specifies that the encoding form of the file is processor dependent. If this
specifier is omitted in an OPEN statement that initiates a connection, the default value is DEFAULT.

 

As Intel Fortran doesn't support an ISO 10646 character KIND, AFAIK:

1) Should the compiler diagnose the use of 'UTF-8' as a standards violation?

2) Is it (as I assume) behaving as if the value was 'DEFAULT'?

 

Is there any likelihood that Intel Fortran will support Unicode in the near future?

1 Solution
Steve_Lionel
Honored Contributor III
4,845 Views

1) No, this is not something the compiler is required to diagnose, though I think it would be good if it did.

2) The standard does not specify what the behavior should be, so it is implementation-dependent. My guess is that this value is ignored, though the compiler does complain if you say something other than DEFAULT or UTF-8.
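One way to see what a given compiler actually does with the value is to ask the runtime. The following is only a minimal sketch using standard Fortran 2003 features (a scratch file, `ENCODING=` on OPEN, and `ENCODING=` on INQUIRE); the program and variable names are made up for the example, and what it prints depends on the processor, so no output is shown.

```fortran
program encoding_probe
  implicit none
  character(len=16) :: enc   ! receives UTF-8, DEFAULT, UNKNOWN or UNDEFINED
  integer :: u, ios

  ! Open a formatted scratch file and request UTF-8 encoding.
  open (newunit=u, status='scratch', form='formatted', &
        encoding='UTF-8', iostat=ios)

  if (ios /= 0) then
    print *, 'OPEN with ENCODING=UTF-8 failed, iostat = ', ios
  else
    ! Ask the runtime which encoding it considers the connection to have.
    inquire (unit=u, encoding=enc)
    print *, 'Processor reports encoding: ', trim(enc)
    close (u)
  end if
end program encoding_probe
```

If the value is silently ignored, the INQUIRE result may still come back as DEFAULT, which would be consistent with the guess above.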

Mark_Lewy
Valued Contributor I
4,677 Views

Thanks Steve, that's what I thought. 

As a follow-up, this is the issue we have: our simulation engines process (text) job files that look like Windows INI files and contain paths. For example:

[Files]
2DF=C:\ProgramData\Innovyze\InfoWorksAgent\SA_14D4C022-9EB8-4CD4-B7F4-F4C67614F748\iwswnet2#4.2df
2DZ=C:\ProgramData\Innovyze\InfoWorksAgent\SA_14D4C022-9EB8-4CD4-B7F4-F4C67614F748\iwswnet2#4.2dz
IWR=C:\Users\lewym\AppData\Local\Innovyze\Results Folder\14D4C022-9EB8-4CD4-B7F4-F4C67614F748\iwswsim12.iwr
QIN=C:\ProgramData\Innovyze\InfoWorksAgent\SA_14D4C022-9EB8-4CD4-B7F4-F4C67614F748\iwswsim12eventiid81229.qin
RunStatistics=C:\Users\lewym\AppData\Local\Innovyze\Results Folder\14D4C022-9EB8-4CD4-B7F4-F4C67614F748\iwswsim12.analytics
RPT=C:\Users\lewym\AppData\Local\Innovyze\Results Folder\14D4C022-9EB8-4CD4-B7F4-F4C67614F748\iwswsim12.rpt
INP=C:\ProgramData\Innovyze\InfoWorksAgent\SA_14D4C022-9EB8-4CD4-B7F4-F4C67614F748\sa_14d4c022-9eb8-4cd4-b7f4-f4c67614f748_218_12_lewym_20231011_095044sim_job.inp

---END---

What happens if the username contains non-ASCII characters, or (as can happen) a non-default location is used for results?

The documentation for FILE= is not very forthcoming:

FILE = name

name

Is a character or numeric expression.

The name can be any pathname allowed by the operating system.

Any trailing blanks in the name are ignored.

 

What does "allowed by the operating system" mean? Does it assume that the characters are encoded in the current code page on Windows? What about Linux? Encoding the paths as UTF-8 does not appear to work on Windows.

andrew_4619
Honored Contributor III
4,665 Views

What does "allowed by the operating system" mean? Well, for example, a Windows file name cannot contain characters such as * or ? (plus a list of others), but the Fortran runtime relies on lower-level libraries for the detailed file handling, so that is where an error is likely to be generated. That aspect is implementation/OS specific and not part of standard Fortran.

The upper layer of the Intel Fortran runtime assumes that names are ASCII character strings (someone correct me if I am out of date here). If you want to do file I/O using Unicode names that the OS supports, you will need to use other utilities (such as the Windows SDK) for file management and I/O.

It is not clear to me from the question what your specific problem is. Is it only reading files that are UTF-8 encoded, or is it creating/writing files based on what is read?

Mark_Lewy
Valued Contributor I
4,653 Views

I believe the problem is that the file entries are written to the job file as UTF-8, and the Fortran code that reads the job file reads them into character variables. These are subsequently used as FILE specifiers in OPEN statements, which can fail. I suspect that the underlying OPEN on Windows is a call to OpenFile, in which case "The string must consist of characters from the 8-bit Windows character set. The OpenFile function does not support Unicode file names or opening named pipes.", so unsurprisingly passing a UTF-8 encoded string fails. In most cases this is a non-issue, as the process that normally creates the job file writes paths that are ASCII, but there is the potential for oddities in the cases I mentioned above and when other applications create the job file.

I suppose the answer is to encode the paths for the current code page on Windows when creating the job file.

In the longer term, if Intel Fortran supported the Unicode character kind "ISO_10646", as GNU Fortran (for example) does, we could use UTF-8 encoding.
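For illustration, this is roughly what that looks like with a compiler that does provide the ISO 10646 kind. It is a minimal sketch, assuming gfortran; the file name `utf8_demo.txt` and the sample text are made up for the example. It will not compile with a compiler that lacks the kind, because `selected_char_kind` then returns -1. Note also that ENCODING= only describes the contents of the file; it says nothing about how the FILE= name itself is interpreted.

```fortran
program utf8_demo
  implicit none
  ! Kind number for ISO 10646 (UCS-4) characters; -1 if unsupported.
  integer, parameter :: ucs4 = selected_char_kind('ISO_10646')
  character(kind=ucs4, len=64) :: line
  integer :: u

  ! Write one line containing a non-ASCII character (U+00E9, e with acute).
  open (newunit=u, file='utf8_demo.txt', encoding='UTF-8', &
        action='write', status='replace')
  write (u, '(a)') ucs4_'caf' // char(int(z'00E9'), ucs4)
  close (u)

  ! Read the line back into a UCS-4 variable; the runtime decodes UTF-8.
  open (newunit=u, file='utf8_demo.txt', encoding='UTF-8', &
        action='read', status='old')
  read (u, '(a)') line
  close (u)

  ! len_trim counts characters here, not the bytes stored in the file.
  print *, 'characters read: ', len_trim(line)
end program utf8_demo
```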

jimdempseyatthecove
Honored Contributor III
4,646 Views

This article might be informative.

And this thread might be informative (see post at 01-16-2012 03:16 AM by Karanta__Antti)

 

Jim Dempsey

Barbara_P_Intel
Employee
4,605 Views

Here's a shot in the dark! I'm no expert by far, but will this UNICODE routine help? MBConvertUnicodeToMB. Here's a summary of routines for National Language Support. These are Intel specials.

I investigated something similar for a customer a few years ago.

Their question:

If I READ a Unicode, multibyte file path (as a CHAR*) and pass it untouched to an OPEN statement, will it work on a Japanese, Korean, or Chinese/Mandarin OS?

The solution:

subroutine test_file_open(filename, len)
  use IFNLS                        ! Intel National Language Support routines
  implicit none
  integer :: len
  !DIR$ ATTRIBUTES VALUE :: len
  integer(2) :: filename(len)      ! array contains the Unicode file name
  integer(4) :: res
  character(len=100) :: ffname
  ! do the conversion, return the result string length
  res = MBConvertUnicodeToMB(filename, ffname)
  write (*,*) ffname(1:res)
  ! pass result MB string to OPEN statement
  open (8, file=ffname(1:res), action='WRITE')
  write (8,*) 'Testing file writing'
  close (8)
end subroutine test_file_open

 

andrew_4619
Honored Contributor III
4,600 Views

I played with National Language Support a few years back and found it was broken. I would bet no one has worked on it since that time...

Mark_Lewy
Valued Contributor I
4,558 Views

Thanks Barbara, that's one way of doing this.

I found some other code of ours that was using the WideCharToMultiByte Windows API from the kernel32 module to convert Unicode (wchar_t) to multibyte.  This was using CP_UTF8 (UTF-8) except for strings that were going to be used as FILE specifiers in OPEN statements, which are converted with CP_ACP (ANSI code page).  So, I think I've answered my own question.

Barbara_P_Intel
Employee
4,597 Views

A few years ago this sample worked.

Steve_Lionel
Honored Contributor III
4,480 Views

I've successfully used a USEROPEN routine (DEC/Intel extension) to open a file using UTF-8 encoding in the file path. Sorry, I no longer have the example (unless it's in this forum somewhere), but it did work. I do wish Intel would get UTF-8 supported in Fortran, as other compilers already have.
