inquire(1,name=fname) problem with MB (multibyte) file names

ivfjp · ‎01-16-2024

For many years I have been asking, through our agency, for bug fixes related to Japanese file names, and one problem is still unresolved. It is also unclear whether the Japanese agency has escalated the issue.
Please fix the bug that occurs when getting the name of an open file, as in inquire(1,name=s). It has been about 10 years since I requested support. The open statement seems to be fine.
Internally, it is a simple mistake of confusing the number of bytes with the number of characters. It is easy to fix.
It worked in the past (up to 15.0.0.108), and it also works with other compilers such as gfortran and flang.
Neither ifort nor ifx has worked since the bug appeared.
I have attached a test program that detects the error when you run it on Windows 10/11.
If you compare the behavior of a compiler that works properly (gfortran, or an older compiler up to 15.0.0.108) with that of the current ifort and ifx, you will see what is wrong.

-- ivfjp.f90 --
program main
implicit none
character(80),parameter :: fname='abc日本語.err'
character(80) :: s
integer :: ios, ii
write(*,*) 'Check for the IVF Japanese file name problem (simple version)'
open(1,file=fname, status='unknown',iostat=ios)
write(*,*) 'write open: ios=', ios
write(1,'(a)') fname
! s should come back containing the complete file name, including '.err'
inquire(1,name=s)
write(*,*) 'inquire:s=', trim(s)
ii = index(s, trim(fname))
if (ii <= 0) then
   write(*,*) '>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>'
   write(*,*) '>>>>>> This compiler has the Japanese file name bug <<<<<<'
   write(*,*) '<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<'
   write(*,*) " -> e.g. the name comes back as 'abc日本語' instead of 'abc日本語.err'"
else
! This is only a simple check; investigate separately for a thorough test
end if
write(*,*) 's=', trim(s), ii
close(1)
s = 'xxx'
open(2,file=fname, status='unknown',iostat=ios)
write(*,*) 'read open: iostat=', ios
read(2,'(a)') s
! write(*,*) 'f=', trim(fname)
write(*,*) 'read:s=', trim(s)
close(2)
end program

mecej4 · ‎01-17-2024

I do not know any Japanese, and I am going to ignore any issues that may be related to character set, code page, etc.

I ran the test program using (a) Ifort 14.0.4 and (b) Ifx 2024.0.2, both on Windows 11, and redirected the outputs to files. The output files were byte-for-byte identical.

I then used Cygwin Gfortran 11.4. There were differences in the character set, but the main difference compared to the outputs from the Intel versions is that for INQUIRE(1, NAME=s) Gfortran puts just the file name and extension into 's', whereas Ifort/IVF prepend the drive letter and directory. This variation is allowed in the Fortran standard (I am viewing the draft ISO/IEC DIS 1539-1:2017), where Note 12.59 states quite explicitly:

If this specifier appears in an INQUIRE by file statement, its value is not necessarily the same as the name
given in the FILE= specifier.
The processor could assign a file name qualified by a user identification, device, directory, or other relevant
information.

Given this latitude, it is necessary for the programmer to write code to process the returned string s whether or not it contains the drive letter, directory, username, userid, machine ID, etc., prepended to the file name proper.
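
A minimal sketch of such processing, assuming the only qualification the processor adds is a directory prefix ending in '\' or '/'. The module name inquire_name_util and the helper base_name are hypothetical and not part of any compiler library. One caveat: in CP932 the second byte of some double-byte characters has the same value as '\', so a byte-wise scan like this one can split such a name in the wrong place.

```fortran
module inquire_name_util
   implicit none
contains
   ! Hypothetical helper: return the part of PATH after the last '\' or '/'.
   ! If PATH has no separator, the whole (trimmed) string is returned.
   pure function base_name(path) result(name)
      character(*), intent(in) :: path
      character(:), allocatable :: name
      integer :: pos
      pos = scan(path, '\/', back=.true.)   ! 0 when no separator is present
      name = trim(path(pos+1:))
   end function base_name
end module inquire_name_util

program demo_base_name
   use inquire_name_util
   implicit none
   character(260) :: s
   integer :: ios
   ! An ASCII-only name is used so the demo does not depend on source encoding.
   open(10, file='abcdefghi.err', status='unknown', iostat=ios)
   inquire(10, name=s)            ! may or may not include drive and directory
   write(*,*) 'inquire: ', trim(s)
   write(*,*) 'base   : ', base_name(s)
   close(10, status='delete')
end program demo_base_name
```

Comparing base_name(s) with the name given to FILE= sidesteps the difference between compilers that return a bare name and those that return a full path.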

JohnNichols · ‎01-17-2024

The issue appears to be that the read and write statements need to know what character set is being used, or in this case two character sets are being used.

The car character, which is part of the Japanese word for Japan, is translated as ASCII 230, which is the æ character. Windows gets it right as æ, but the terminal prints it as the Greek µ. The program does as it is told; it is just not told correctly for Japanese. Windows 11 knows how to name the file with the Japanese characters; the issue is Fortran.

I do not know how to read in the Japanese characters, but the character string has to say it is in ISO set such and such, or you need translation code in the Fortran. Arjen has written about this elsewhere, but my Fortran just does not extend to different ISO sets.

ivfjp · ‎01-17-2024

'日' means the sun.
'日' needs more than one byte in Fortran/C/C++ (2 or 3, depending on the encoding).

NO) character(1),parameter :: car ='日'
This is a bug.

FULL-WIDTH CHARACTERs have a length of 2 or 3 bytes.
OK) character(2),parameter :: car ='日'

andrew_4619 · ‎01-17-2024

Intel Fortran does not support multibyte character sets. There are ways of doing it. I have Unicode dialogs and menus in my Windows application, but you need specific code and/or libraries to handle it.

ivfjp · ‎01-17-2024

Thank you everyone for your replies.

I'll add more information about the problem.

It conforms to the standard that the file name passed as a relative path becomes an absolute path. That's OK.
The problem is that the end of the file name is missing.
Full-width characters have a length of 2 or 3 bytes.
If you try changing the file name, you will find that the name is being handled by the number of characters, not the number of bytes.

It's a simple mistake. If I had the source, I could fix it right away.

f1 = 'abcdefghi.err' ! without FULL-WIDTH CHARACTERs
f2 = 'abcXXYYZZ.err' ! with FULL-WIDTH CHARACTERs

a,b,c are HALF-WIDTH CHARACTERs (= ASCII characters)
XX,YY,ZZ are FULL-WIDTH CHARACTERs (= Japanese MB characters)

open(1, f1, ..); open(2, f2, ..)
inquire(1, opened=b1); inquire(2, opened=b2)
b1 => .true.
b2 => .true.

inquire(1, name=fname1)
inquire(2, name=fname2)
fname1 => 'c:\xx\xx\xx\abcdefghi.err'
fname2 => 'c:\xx\xx\xx\abcXXYYZZ' # bug

JohnNichols · ‎01-18-2024

Thank you for your kind explanation.

As you seem to be doing, I use Google Translate to attempt to understand your Japanese characters and words.

As we can both see, there is no translation in the Japanese dictionary for Google.

program main
character(80),parameter :: fname='abc日本語.err'
character(2), parameter :: car ='日'
character(80) :: s
write(*,*) 'Check for the IVF Japanese file name problem (simple version)'
write(*,*)car
open(1,file=fname, status='unknown',iostat=ios)
   write(*,*) 'write open: ios=', ios
   write(1,'(a)') fname
   inquire(1,name=s)
   write(*,*) 'inquire:s=', trim(s)
   ii = index(s, trim(fname))
   if (ii <= 0) then
      write(*,*) '>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>'
      write(*,*) '>>>>>> This compiler has the Japanese file name bug <<<<<<'
      write(*,*) '<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<'
      write(*,*) " -> e.g. the name comes back as 'abc日本語' instead of 'abc日本語.err'"
   else
! This is only a simple check; investigate separately for a thorough test
   end if
   write(*,*) 's=', trim(s), ii
close(1)
s = 'xxx'
open(2,file=fname, status='unknown',iostat=ios)
   write(*,*) 'read open: iostat=', ios
   read(2,'(a)') s
! write(*,*) 'f=', trim(fname)
   write(*,*) 'read:s=', trim(s)
close(2)
end program

It would be much easier to read your posts if you could include the Fortran code in the standard box. It is inserted with the symbol </> on the second line of the main menu.

Now, Fortran does not appear to have a Japanese word or a Japanese character set.

In looking at your original code I was trying to understand the problem. The two symbols I noticed were 日本, pronounced Nihon.

I was using the first symbol 日 as an example, so my English said that it was the first Kanji symbol in the representation of the English word Japan.

If we have two symbols, say AB, then we can mathematically represent four word ideas:

A = SUN (Google gives 太陽),

but now Google says there are other groupings for sun, just as the Eskimo have 27 words for snow.

B = BOOK (本)

AB = Japan, which Google gives as 日本と, and at this stage I can guess that と means "and".

BA = Today.

The problem with language is that people use idioms, so something says one thing but means something else.

But here we are interested in just the Kanji, and according to the ISO site, in 1987 the ISO committee was asked to add Kanji.

It appears that this has not been done, which is hard to understand from Intel given the use of Fortran in Japan.

For that I apologize.

I am not sure this board can do anything except hope that the Intel people and the ISO people who are part of this group will listen. After all, 1987 is only 35 years ago.

There is a way around the issue, but it is a huge kludge and a lot of coding.

Finally, Google fixes my English; I left the fix in the Japanese translation.

Old English has the same problem: in Old English the word for gun is firestick.

Google could not translate fire stick. It changed it to stick of fire (火の棒), which is not the same.

Looking forward to your next thoughts.

I have a Chinese daughter, and she says I am a great burden. I wish I were not here.

Barbara_P_Intel · ‎01-18-2024

I did some research a few years ago into Unicode file names at a customer's request. Attached is what I learned and demonstrated with a colleague in China. Perhaps this will help.

andrew_4619 · ‎01-18-2024

My experience with IFNLS a few years back was that it was very broken. I gave up and wrote my own routines to do what I wanted, based on storing strings as integer(2) arrays and using Windows SDK routines directly (the W rather than the A(NSI) variants).

JohnNichols · ‎01-19-2024

Do you have the Fortran code outside of a PDF file?

i.e., the original as a text file.

Barbara_P_Intel · ‎01-22-2024

Here you go, @JohnNichols.

subroutine test_file_open(filename,len)
USE IFNLS
integer :: len
!DIR$ ATTRIBUTES VALUE :: len
integer(2)::filename(len)      ! array contains the Unicode file name
 
integer(4):: res
character*100:: ffname 
 
res = MBConvertUnicodeToMB(filename,ffname)      ! do the conversion, return the result string length
write(*,*) ffname(1:res) 
open (8, file=ffname(1:res), action='WRITE')      ! pass result MB string to OPEN statement
write (8,*) 'Testing file writing'
close (8)
end subroutine

ivfjp · ‎01-18-2024

Thank you everyone for all your answers.

I have also included a translation so that people who speak Japanese or Chinese can see it.

A Fortran character(len=1) variable does not have to be able to hold a single FULL-WIDTH CHARACTER.
len=n is the number of bytes, not necessarily the number of characters.
It simply allocates the necessary buffer.

This bug is simply a problem where the return value of name=xxxx is truncated by the number of characters instead of the number of bytes.

I don't want extensions to the Fortran specification.
I want a fix for a bug that broke something that used to work normally in Intel Fortran. It is a bug fix.
I think that people in English speaking countries, who get by with ascii characters, don't understand the difference between the number of characters and the number of bytes.

Previously, it took several years to fix the file name problem in the open statement.
This time it's about inquire(1,name=xxxx). This is all that remains.

It's confusing, but I hope you understand the problem.

andrew_4619 · ‎01-19-2024

I understand the problem; however, what you are describing is not a bug to be fixed. You are asking for an extension to Intel Fortran so that it works natively with multibyte characters. The example you picked is just one aspect of this; there are many others.

ivfjp · ‎01-21-2024

It seems difficult for English-speaking people to understand the problem of multibyte characters in East Asia.
Even among the more knowledgeable people who answered here, many of them thought it was a problem with advanced string processing.

The problem this time is a bug in inquire(1,name=xxx). It occurred in 2015.
A bug in the open statement also occurred at the same time. We considered various countermeasures, but in the end, the bug in the open statement was finally fixed in 2023.

I also reported the issue with inquire at the same time, but it turned out that the Japanese agency had not escalated the issue to Intel.
If the agency had escalated the issue to Intel, I think it would have been fixed at the same time as the OPEN statement.

The problem with inquire is a simple matter of "confusing the number of characters and the number of bytes", so the string buffer is cut short.

This is Fortran, so I don't intend to use it for advanced string processing.
Unlike the OPEN statement, I think this one can be avoided in practice by having the program remember the file name itself instead of calling inquire(1,name=xxx).
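
A minimal sketch of that workaround, assuming unit numbers from 1 to 99 and names of at most 260 bytes. The module unit_names and its procedures open_named and name_of_unit are hypothetical names chosen for illustration. The idea is only to store the exact byte string passed to FILE= so that INQUIRE(NAME=) is never needed.

```fortran
module unit_names
   implicit none
   private
   public :: open_named, name_of_unit
   integer, parameter :: max_unit = 99, max_len = 260   ! assumed limits
   character(max_len) :: names(max_unit) = ' '          ! registry indexed by unit
contains
   ! Open FILE on UNIT and remember the byte string exactly as it was passed.
   subroutine open_named(unit, file, ios)
      integer, intent(in) :: unit
      character(*), intent(in) :: file
      integer, intent(out) :: ios
      if (unit < 1 .or. unit > max_unit .or. len_trim(file) > max_len) then
         ios = 1        ! arbitrary positive value meaning "not registered"
         return
      end if
      open(unit, file=file, status='unknown', iostat=ios)
      if (ios == 0) names(unit) = file
   end subroutine open_named

   ! Return the remembered name, or an empty string if none was stored.
   function name_of_unit(unit) result(name)
      integer, intent(in) :: unit
      character(:), allocatable :: name
      if (unit >= 1 .and. unit <= max_unit) then
         name = trim(names(unit))
      else
         name = ''
      end if
   end function name_of_unit
end module unit_names

program demo_unit_names
   use unit_names
   implicit none
   integer :: ios
   ! An ASCII-only name keeps the demo independent of the source file encoding;
   ! a multibyte name is stored the same way, byte for byte.
   call open_named(1, 'abcdefghi.err', ios)
   write(*,*) 'ios=', ios, ' name=', name_of_unit(1)
   close(1, status='delete')
end program demo_unit_names
```

The stored name is whatever the caller passed (typically a relative name), not the absolute path that ifort returns from INQUIRE, so code that relies on the full path would still need another source for it.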

I would like Intel developers to be aware of the problem.

JohnNichols · ‎02-17-2024

abc日本語.err
abc???.err

121212 121211 121212 121213

616263 E697A5 E69CAC E8AA9E 2E 657272
0D0A 616263 3F 3F 3F 2E 657272

If I use UTF-8, then line 1 is the original as shown in Notepad++, which copes with your Japanese characters, and line 2 is what the VEDIT editor shows. The hex from both editors is shown in lines 4 and 5, and line 3 counts the hex codes. The 0D0A is only the CR LF line ending. Each Japanese character corresponds to 3 hex bytes, giving you the 16 total.

I had no trouble understanding that Fortran treats the input as VEDIT whereas you want Notepad++.

Only Intel can fix that. This text editor handles the Japanese characters.

日 = E697A5

本 = E69CAC

語 = E8AA9E

I have no idea where the E8AA9E or the other sets are defined, sorry.

@Barbara_P_Intel , this is why you get 13 and not 16, it is the reader in Fortran not handling the codes.

ivfjp · ‎02-18-2024

Japanese kanji characters such as '日' use 2 to 3 bytes per character.

The number of bytes required for one character depends on the character encoding (utf-8, utf-16, cp932).

The encodings of 日 (U+65E5) are:

utf-8: e6 97 a5 (3 bytes required)
utf-16: 65e5 (2 bytes required)
cp932: 93 fa (2 bytes required)

inquire(1,name=xxx)

In name=xxx, xxx is an output parameter.
The full number of bytes must be returned, not cut off at the number of characters.
There is nothing the caller can do about it.

What used to work became a bug partway through.

Barbara_P_Intel · ‎01-23-2024

I filed a bug report, CMPLRLIBS-34816, on the INQUIRE( ).

I have a question because I'm curious. The reproducer prints the length of file name as 13. @ivfjp, what do you expect the length to be?

ivfjp · ‎02-15-2024

character(80),parameter :: fname='abc日本語.err'

(except for the path part)
The number of characters is 10.
The required number of bytes is 13 (CP932) or 16 (UTF-8).

inquire(1,name=xxx)
The bug is that the behavior was changed so that the name is cut off at the number of characters instead of the required number of bytes.
That is what I mean by confusing the number of characters with the number of bytes.
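
To make the two counts concrete, here is a small sketch for the UTF-8 case. It builds the name from the byte values quoted earlier in the thread (E6 97 A5, E6 9C AC, E8 AA 9E) so the result does not depend on how the source file is saved. The module mb_count and the function utf8_char_count are hypothetical names, and the sketch assumes ichar returns 0 to 255 for these bytes, as gfortran and Intel Fortran do.

```fortran
module mb_count
   implicit none
contains
   ! Count UTF-8 characters (code points) in a byte string: every byte is
   ! counted except continuation bytes, which have the bit pattern 10xxxxxx.
   pure function utf8_char_count(s) result(n)
      character(*), intent(in) :: s
      integer :: n, i
      n = 0
      do i = 1, len(s)
         if (iand(ichar(s(i:i)), 192) /= 128) n = n + 1
      end do
   end function utf8_char_count
end module mb_count

program demo_mb_count
   use mb_count
   implicit none
   character(:), allocatable :: fname
   ! 'abc' + UTF-8 bytes of the three kanji + '.err'
   fname = 'abc' // char(230)//char(151)//char(165) &
                 // char(230)//char(156)//char(172) &
                 // char(232)//char(170)//char(158) // '.err'
   write(*,*) 'bytes      =', len(fname)               ! storage the buffer needs
   write(*,*) 'characters =', utf8_char_count(fname)   ! what a reader sees
end program demo_mb_count
```

For this name the byte count is 16 and the character count is 10, matching the figures above. A result cut off at the character count is shorter than the bytes actually needed to hold the name.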

Barbara_P_Intel · ‎02-16-2024

Thank you. I forwarded the information to the compiler engineer.

ivfjp · ‎02-18-2024

Thank you. Thanks for your cooperation.

By the way, it works up to the version displayed as Ver15.0.108. After that it went wrong.

Intel(R) Visual Fortran Compiler XE for applications running on IA-32 version 15.0.0.108 build 20140726

The OPEN statement bug has been fixed, but the INQUIRE bug still exists.

ivfjp · ‎02-18-2024

If possible, I would like you to fix not only ifx but also ifort. Then I will be very happy!