While debugging I noticed that the OPEN statement fails to open a file when the file name contains special characters (in this case Scandinavian ones) encoded as UTF-8. This is natural, as the statement tries to interpret the string as ASCII and the byte sequence does not make sense that way.
Here's the statement:
[fortran] OPEN(FU,FILE=FILNAM,STATUS='NEW',ERR=10,IOSTAT=IOST) [/fortran]
The error raised is 29 (file not found).
The directory in which the file is to be created exists, but its name contains a special character.
Now the question is: how do I handle file names with special characters? I receive the file name from C code, encoded as UTF-8.
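To make the mismatch concrete: "ä" is the two bytes 0xC3 0xA4 in UTF-8 but the single byte 0xE4 in Windows-1252, so a routine that reads the UTF-8 bytes as single-byte characters sees "Ã¤" instead. The helper below is only an illustrative sketch (the function name is made up for this example, it is not part of any library); it counts the bytes outside 7-bit ASCII so the difference between the two encodings can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the bytes of a NUL-terminated string that
   fall outside 7-bit ASCII (values 0x80 and above). */
size_t count_non_ascii_bytes(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if ((unsigned char)*s >= 0x80)
            ++n;
    }
    return n;
}
```

Called on "Inställningar", it reports two such bytes for the UTF-8 form and one for the Windows-1252 form.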
I am not sure, but encoding='UTF-8' might help.
[fortran]OPEN(FU,FILE=FILNAM,encoding='UTF-8',STATUS='NEW',ERR=10,IOSTAT=IOST)[/fortran]
Frank
There may be a multi-lingual code page that you can set Windows to use as well.
Maybe the Latin (Western European) code page 1252?
See http://en.wikipedia.org/wiki/Windows-1252
To find out your system's active code page, use
chcp
from a DOS prompt.
You can change the code page using
chcp 1252
from the same DOS prompt.
Renaming the file is not an option. The problem is in the directory name, and I can't tell users not to use directories with letters from their native language. The directory with which this first came up is the Swedish Windows XP equivalent of Local Settings, something like "Lokala Inställningar".
Speaking of code pages, does OPEN accept a string encoded using the local Windows code page? That would be a fairly straightforward string conversion.
You can always change it back.
OK, I got it to work by converting the UTF-8 string to the local Windows code page (I did not need to change the code page with chcp).
I used the Windows function MultiByteToWideChar to first convert the UTF-8 string to UTF-16, and then WideCharToMultiByte to convert the UTF-16 string to the local Windows code page. The OPEN statement seems to handle the file name with special characters just fine now.
Is it permissible for you to post the essential details of the code that you developed successfully?
It would be much appreciated, I'm sure.
Sure, this code is in no way a trade secret, just "normal" fiddling with string conversions. I hope this saves someone else some time:
Here are the C functions:
[cpp]
#include <assert.h>
#include <wchar.h>
#include <windows.h>

/* OSInt, nosCAllocate and nosFree are our own typedef and calloc/free wrappers. */

/* Convert nchars bytes of UTF-8 to a newly allocated, null-terminated UTF-16 string. */
extern wchar_t* nosUTF8ToUTF16( const char * utf8string, OSInt nchars )
{
    int requiredSize, writtenSize ;
    wchar_t * result ;

    SetLastError( 0 ) ;
    requiredSize = 1 + MultiByteToWideChar(
        CP_UTF8, 0, utf8string, (int)nchars, NULL,
        0 // if 0, func returns the size of the required buffer (in wchar_t)
    ) ;
    result = nosCAllocate( requiredSize, sizeof( wchar_t ) ) ;
    writtenSize = MultiByteToWideChar( CP_UTF8, 0, utf8string, (int)nchars, result, requiredSize ) ;
    assert( requiredSize == writtenSize + 1 ) ;
    result[ writtenSize ] = 0 ;
    assert( (size_t)writtenSize == wcslen( result ) ) ;
    return result ;
}

/* Convert UTF-16 to a newly allocated string in the local (ANSI) code page.
   Pass nchars = -1 for a null-terminated input. */
extern char* nosUTF16ToLocalCodePage( const wchar_t * utf16string, OSInt nchars )
{
    int requiredSize, writtenSize ;
    char * result ;

    requiredSize = 1 + WideCharToMultiByte( CP_ACP, 0, utf16string, (int)nchars, NULL, 0, NULL, NULL ) ;
    result = nosCAllocate( requiredSize, 1 ) ;
    writtenSize = WideCharToMultiByte( CP_ACP, 0, utf16string, (int)nchars, result, requiredSize, NULL, NULL ) ;
    assert( writtenSize + 1 == requiredSize ) ;
    result[ writtenSize ] = 0 ;
    return result ;
}

/* UTF-8 -> UTF-16 -> local code page; the caller releases the result with nosFree. */
extern char* nosUTF8ToLocalCodePage( const char * utf8string, OSInt nchars )
{
    char *local_string = NULL ;
    wchar_t* utf16tmp ;

    utf16tmp = nosUTF8ToUTF16( utf8string, nchars ) ;
    local_string = nosUTF16ToLocalCodePage( utf16tmp, -1 ) ;
    nosFree( utf16tmp ) ;
    return local_string ;
}
[/cpp]
And here's the Fortran binding:
[fortran]
MODULE XXXX
  INTERFACE
    ! length of null terminated string (for c interop)
    PURE INTEGER( KIND = C_SIZE_T ) FUNCTION OS_STRLEN( STR ) BIND( C, NAME = "strlen" )
      USE, INTRINSIC :: ISO_C_BINDING
      TYPE( C_PTR ), INTENT(IN), VALUE :: STR
    END FUNCTION

    SUBROUTINE NOS_FREE( C_POINTER ) BIND( C, NAME = "nosFree" )
      USE, INTRINSIC :: ISO_C_BINDING
      TYPE( C_PTR ), INTENT(IN), VALUE :: C_POINTER
    END SUBROUTINE

    FUNCTION NOS_UTF8_TO_LOCAL_CODE_PAGE( UTF8_STRING, NCHARS ) RESULT( RESULT_STRING_PTR ) &
        BIND( C, NAME = "nosUTF8ToLocalCodePage" )
      USE, INTRINSIC :: ISO_C_BINDING
      CHARACTER( KIND = C_CHAR ), INTENT(IN) :: UTF8_STRING(*)
      INTEGER, INTENT(IN), VALUE :: NCHARS
      TYPE( C_PTR ) :: RESULT_STRING_PTR
    END FUNCTION

    ! bound to memcpy, which writes to its first argument, hence INTENT(OUT)
    TYPE(C_PTR) FUNCTION OS_STRNCPY( TARGET_STRING, C_POINTER, N ) BIND( C, NAME = "memcpy" )
      USE, INTRINSIC :: ISO_C_BINDING
      CHARACTER( KIND = C_CHAR ), INTENT(OUT) :: TARGET_STRING(*)
      TYPE( C_PTR ), INTENT(IN), VALUE :: C_POINTER
      INTEGER( KIND = C_SIZE_T ), VALUE :: N
    END FUNCTION
  END INTERFACE

CONTAINS

  FUNCTION OS_UTF8_TO_LOCAL_CODE_PAGE( STR ) RESULT( RESULT_STR )
    USE, INTRINSIC :: ISO_C_BINDING
    CHARACTER( LEN = * ), INTENT(IN) :: STR
    CHARACTER( LEN = : ), ALLOCATABLE :: RESULT_STR
    TYPE( C_PTR ) :: CHAR_PTR, IGNORED
    INTEGER( KIND = C_INT ) :: STR_LENGTH

    IF ( STR == '' ) THEN
      RESULT_STR = STR
    ELSE
      ! convert in C, then copy the C string into an allocated Fortran string
      CHAR_PTR = NOS_UTF8_TO_LOCAL_CODE_PAGE( STR, LEN( STR ) )
      STR_LENGTH = OS_STRLEN( CHAR_PTR )
      ALLOCATE( CHARACTER( LEN = STR_LENGTH ) :: RESULT_STR )
      IGNORED = OS_STRNCPY( RESULT_STR, CHAR_PTR, INT( STR_LENGTH, C_SIZE_T ) )
      ASSERT( C_ASSOCIATED( IGNORED, C_LOC( RESULT_STR ) ), '' )
      CALL NOS_FREE( CHAR_PTR )
    END IF
  END FUNCTION
END MODULE XXXX
[/fortran]
Except for our custom wrappers around calloc and free, the typedef OSInt (chosen to match the default integer size in Fortran), and our port of the C-style ASSERT to Fortran, the code does not use anything application-specific and should be quite easy for others to reuse.
However, I was hoping it would all be in Fortran (i.e. use the Fortran wrappers for the multi-byte Windows API functions)!
However, can you please explain, for a non-C programmer, how this solved your failure to OPEN a file whose Fortran file name string contains characters such as an umlaut? You did not appear to change the code page, but used CP_ACP in the C code, which appears to just refer to an 'ANSI code page', which I presume means the one already in use. What code page is set for your system?
What would happen if your code were used on a system set to use, say, code page 850, and a file was found or supplied with a name containing an umlaut or similar?
I did not use all Fortran as I find playing around with C functions a lot easier to do in C.
From http://support.microsoft.com/kb/108450
"CP_ACP instructs the API to use the currently set default Windows ANSI codepage."
My current code page:
C:\Users\nak>chcp
Active code page: 437
I worked under the assumption that OPEN wants the file name encoded using the current Windows code page. It seems to work, even though I only tested with a few unproblematic cases and a few that did not work previously.
If the local Windows code page were such that the special characters in the original string could not be represented, they would be replaced with a "default" character, making the path different from the intended one. However, in my case it is enough that I can handle the paths on the user's file system, and I am assuming those paths contain only characters that can be represented in the local Windows code page. That seems like a reasonable assumption, although I'm not sure how Windows represents file names internally.
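The "default character" substitution described above can be shown with a small portable sketch. It does not call the Windows API: it uses Latin-1 as a stand-in for the local code page, and the function names and the '?' replacement are assumptions made for illustration only. On Windows itself, the last argument of WideCharToMultiByte (lpUsedDefaultChar) can report the same condition when converting to CP_ACP.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 sequence at s into *cp; return the bytes consumed,
   or 0 if the sequence is malformed or truncated. Simplified sketch:
   overlong forms and surrogates are not rejected. */
static size_t decode_utf8(const unsigned char *s, unsigned long *cp)
{
    size_t n, i;
    if (s[0] < 0x80)                { *cp = s[0];        n = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; n = 4; }
    else return 0;
    for (i = 1; i < n; ++i) {
        if ((s[i] & 0xC0) != 0x80)   /* also stops at the NUL terminator */
            return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return n;
}

/* Hypothetical converter: NUL-terminated UTF-8 to Latin-1 (stand-in for
   the local code page). Returns the number of bytes written, excluding
   the terminator, or -1 on malformed input or a too-small buffer.
   *lossy is set to 1 if any character was replaced by '?'. */
int utf8_to_latin1(const char *in, char *out, size_t outsz, int *lossy)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t w = 0;
    assert(in != NULL && out != NULL && lossy != NULL);
    *lossy = 0;
    while (*p != '\0') {
        unsigned long cp;
        size_t n = decode_utf8(p, &cp);
        if (n == 0 || w + 1 >= outsz)   /* keep room for the terminator */
            return -1;
        if (cp <= 0xFF) {
            out[w++] = (char)(unsigned char)cp;
        } else {
            out[w++] = '?';             /* not representable: substitute */
            *lossy = 1;
        }
        p += n;
    }
    if (w >= outsz)
        return -1;
    out[w] = '\0';
    return (int)w;
}
```

A caller that gets lossy == 1 knows the converted path no longer names the intended file and can report an error instead of passing it to OPEN.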