While debugging I noticed that the OPEN statement fails to open a file when the file name contains special characters (in this case Scandinavian letters) encoded as UTF-8. This is natural, as the statement tries to interpret the string as ASCII and the byte sequence does not make sense that way.
Here's the statement:
[fortran] OPEN(FU,FILE=FILNAM,STATUS='NEW',ERR=10,IOSTAT=IOST) [/fortran]
The error raised is 29 (file not found).
The directory in which the file is to be created exists, but its name contains a special character.
Now the question is: how do I handle file names with special characters? I receive the file name from C code, encoded as UTF-8.
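To make the byte-level mismatch concrete, here is a minimal C sketch (C because that is where the file name comes from). The helper names `utf8_code_points` and `utf8_extra_bytes` are made up for illustration, and both assume the input is valid UTF-8. In UTF-8 the letter ä (U+00E4) is the two bytes 0xC3 0xA4, whereas in a single-byte code page such as Windows-1252 it is the single byte 0xE4, so a UTF-8 name has more bytes than characters as soon as it contains such a letter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of code points in a valid UTF-8 string.
   Continuation bytes have the bit pattern 10xxxxxx and are not counted. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

/* Hypothetical helper: how many bytes the string uses beyond one per
   character. Zero means the name is plain ASCII and needs no conversion. */
size_t utf8_extra_bytes(const char *s)
{
    return strlen(s) - utf8_code_points(s);
}
```

For the UTF-8 bytes of `Inställningar`, written in C as `"Inst\xC3\xA4llningar"`, `strlen` gives 14 while `utf8_code_points` gives 13, so `utf8_extra_bytes` returns 1.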
I am not sure, but ENCODING='UTF-8' might help.
[fortran]OPEN(FU,FILE=FILNAM,ENCODING='UTF-8',STATUS='NEW',ERR=10,IOSTAT=IOST)[/fortran]
Frank
There may be a multilingual code page that you can set Windows to use as well.
Maybe the Latin (Western European) code page 1252?
See http://en.wikipedia.org/wiki/Windows-1252
To find out your system's active code page, use
chcp
from a DOS prompt.
You can change the code page using
chcp 1252
from the same DOS prompt.
Renaming the file is not an option. The problem is in the directory name, and I can't tell users not to use directories with letters from their native language. The directory with which this first came up is the Swedish Win XP equivalent of Local Settings, something like "Lokala Inställningar".
Speaking of code pages, does OPEN accept a string encoded using the local Windows code page? That would be a fairly straightforward string conversion.
You can always change it back.
OK, I got it to work by converting the UTF-8 string to the local Windows code page (I did not need to change the code page with chcp).
I used the Windows function MultiByteToWideChar to first convert the UTF-8 string to UTF-16, and then WideCharToMultiByte to convert the UTF-16 string to the local Windows code page. The OPEN statement seems to handle the file name with special characters just fine now.
Is it permissible for you to post the essential details of the code that you developed successfully?
It would be much appreciated, I'm sure.
Sure, this code is in no way a trade secret, just "normal" fiddling with string conversions. I hope this saves someone else some time.
Here are the C functions:
[cpp]
#include <windows.h>   /* MultiByteToWideChar, WideCharToMultiByte, SetLastError */
#include <assert.h>
#include <wchar.h>     /* wcslen */

/* Convert nchars bytes of UTF-8 to a newly allocated, null-terminated UTF-16 string. */
extern wchar_t* nosUTF8ToUTF16( const char * utf8string, OSInt nchars )
{
    int requiredSize, writtenSize ;
    wchar_t * result ;

    SetLastError( 0 ) ;
    requiredSize = 1 + MultiByteToWideChar( CP_UTF8, 0, utf8string, (int)nchars,
                                            NULL, 0  // if 0, func returns the size of the required buffer (in wchar_t)
                                          ) ;
    result = nosCAllocate( requiredSize, sizeof( wchar_t ) ) ;
    writtenSize = MultiByteToWideChar( CP_UTF8, 0, utf8string, (int)nchars, result, requiredSize ) ;
    assert( requiredSize == writtenSize + 1 ) ;
    result[ writtenSize ] = 0 ;
    assert( writtenSize == wcslen( result ) ) ;
    return result ;
}

/* Convert UTF-16 to a newly allocated string in the current Windows ANSI code page (CP_ACP). */
extern char* nosUTF16ToLocalCodePage( const wchar_t * utf16string, OSInt nchars )
{
    int requiredSize, writtenSize ;
    char * result ;

    requiredSize = 1 + WideCharToMultiByte( CP_ACP, 0, utf16string, (int)nchars, NULL, 0, NULL, NULL ) ;
    result = nosCAllocate( requiredSize, 1 ) ;
    writtenSize = WideCharToMultiByte( CP_ACP, 0, utf16string, (int)nchars, result, requiredSize, NULL, NULL ) ;
    assert( writtenSize + 1 == requiredSize ) ;
    result[ writtenSize ] = 0 ;
    return result ;
}

/* UTF-8 -> UTF-16 -> local code page. The caller frees the result with nosFree. */
extern char* nosUTF8ToLocalCodePage( const char * utf8string, OSInt nchars )
{
    char *local_string = NULL ;
    wchar_t* utf16tmp ;

    utf16tmp = nosUTF8ToUTF16( utf8string, nchars ) ;
    local_string = nosUTF16ToLocalCodePage( utf16tmp, -1 ) ;  /* -1: input is null-terminated */
    nosFree( utf16tmp ) ;
    return local_string ;
}
[/cpp]
And here's the Fortran binding:
[fortran]
MODULE XXXX
    INTERFACE
        ! length of null terminated string (for c interop)
        PURE INTEGER( KIND = C_SIZE_T ) FUNCTION OS_STRLEN( STR ) BIND( C, NAME = "strlen" )
            USE, INTRINSIC :: ISO_C_BINDING
            TYPE( C_PTR ), INTENT(IN), VALUE :: STR
        END FUNCTION

        SUBROUTINE NOS_FREE( C_POINTER ) BIND( C, NAME = "nosFree" )
            USE, INTRINSIC :: ISO_C_BINDING
            TYPE( C_PTR ), INTENT(IN), VALUE :: C_POINTER
        END SUBROUTINE

        FUNCTION NOS_UTF8_TO_LOCAL_CODE_PAGE( UTF8_STRING, NCHARS ) RESULT( RESULT_STRING_PTR ) &
                BIND( C, NAME = "nosUTF8ToLocalCodePage" )
            USE, INTRINSIC :: ISO_C_BINDING
            CHARACTER( KIND = C_CHAR ), INTENT(IN) :: UTF8_STRING(*)
            INTEGER, INTENT(IN), VALUE :: NCHARS
            TYPE( C_PTR ) :: RESULT_STRING_PTR
        END FUNCTION

        ! memcpy writes into TARGET_STRING, so it is declared INTENT(INOUT) rather than INTENT(IN)
        TYPE(C_PTR) FUNCTION OS_STRNCPY( TARGET_STRING, C_POINTER, N ) BIND( C, NAME = "memcpy" )
            USE, INTRINSIC :: ISO_C_BINDING
            CHARACTER( KIND = C_CHAR ), INTENT(INOUT) :: TARGET_STRING(*)
            TYPE( C_PTR ), INTENT(IN), VALUE :: C_POINTER
            INTEGER( KIND = C_SIZE_T ), VALUE :: N
        END FUNCTION
    END INTERFACE
CONTAINS
    FUNCTION OS_UTF8_TO_LOCAL_CODE_PAGE( STR ) RESULT( RESULT_STR )
        USE, INTRINSIC :: ISO_C_BINDING
        CHARACTER( LEN = * ), INTENT(IN) :: STR
        CHARACTER( LEN = : ), ALLOCATABLE :: RESULT_STR
        TYPE( C_PTR ) :: CHAR_PTR, IGNORED
        INTEGER( KIND = C_INT ) :: STR_LENGTH

        IF ( STR == '' ) THEN
            RESULT_STR = STR
        ELSE
            CHAR_PTR = NOS_UTF8_TO_LOCAL_CODE_PAGE( STR, LEN( STR ) )
            STR_LENGTH = OS_STRLEN( CHAR_PTR )
            ALLOCATE( CHARACTER( LEN = STR_LENGTH ) :: RESULT_STR )
            ! copy the converted bytes out of the C buffer, then release it
            IGNORED = OS_STRNCPY( RESULT_STR, CHAR_PTR, INT( STR_LENGTH, C_SIZE_T ) )
            ASSERT( C_ASSOCIATED( IGNORED, C_LOC( RESULT_STR ) ), '' )
            CALL NOS_FREE( CHAR_PTR )
        END IF
    END FUNCTION
END MODULE
[/fortran]
Except for our custom wrappers around calloc and free (nosCAllocate and nosFree), the typedef OSInt that matches the default integer size in Fortran, and our port of the C-style ASSERT to Fortran, the code does not use anything application-specific and should be quite easy for others to reuse.
However, I was hoping it would all be in FORTRAN (i.e. use the Fortran wrappers for the multi-byte Windows API functions)!
Can you please explain, for a non-C programmer, how it solved your failure to OPEN a file whose Fortran filename string contains characters such as an umlaut? You did not appear to change the code page, but used CP_ACP in the C code, which appears to just refer to an "ANSI code page", which I presume means the one already in use. What code page is set for your system?
What would happen if your code were used on a system set to use, say, code page 850, and a file were found or supplied with a name containing an umlaut or similar?
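As background to the code page 850 question, the following C sketch shows why the choice of code page matters: the same letter has a different byte value in each single-byte code page. The table and the function `byte_in_code_page` are hypothetical and cover only six Scandinavian letters; the byte values are those of Windows-1252 and code page 850. It may also be worth keeping in mind that `chcp` reports the console (OEM) code page, which is not necessarily the ANSI code page that CP_ACP refers to.

```c
#include <assert.h>
#include <stddef.h>

/* Byte values of a few Scandinavian letters in two single-byte code pages. */
struct letter_bytes {
    unsigned int  code_point;  /* Unicode code point */
    unsigned char cp1252;      /* byte in Windows-1252 */
    unsigned char cp850;       /* byte in code page 850 */
};

static const struct letter_bytes LETTERS[] = {
    { 0x00C4, 0xC4, 0x8E },  /* Ä */
    { 0x00C5, 0xC5, 0x8F },  /* Å */
    { 0x00D6, 0xD6, 0x99 },  /* Ö */
    { 0x00E4, 0xE4, 0x84 },  /* ä */
    { 0x00E5, 0xE5, 0x86 },  /* å */
    { 0x00F6, 0xF6, 0x94 },  /* ö */
};

/* Hypothetical lookup: byte for code_point in code page 1252 or 850,
   or -1 if the letter is not in the small table above. */
int byte_in_code_page(unsigned int code_point, int code_page)
{
    size_t i;

    assert(code_page == 1252 || code_page == 850);
    for (i = 0; i < sizeof LETTERS / sizeof LETTERS[0]; ++i) {
        if (LETTERS[i].code_point == code_point)
            return code_page == 1252 ? LETTERS[i].cp1252 : LETTERS[i].cp850;
    }
    return -1;
}
```

A name converted for one code page and then interpreted under the other would therefore show a different character in place of each of these letters.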
I did not use all Fortran as I find playing around with C functions a lot easier to do in C.
From http://support.microsoft.com/kb/108450
"CP_ACP instructs the API to use the currently set default Windows ANSI codepage."
My current code page:
C:\Users\nak>chcp
Active code page: 437
I worked under the assumption that OPEN wants the file name encoded using the current Windows code page. It seems to work, even though I only tested with a few unproblematic cases and a few that did not work previously.
If the local Windows code page cannot represent a special character in the original string, that character is replaced with a "default" character, which makes the path different from the intended one. However, in my case it is enough that I can handle the paths on the user's file system, and I am assuming those paths contain only characters that can be represented in the local Windows code page. This seems like a reasonable assumption, although I'm not sure how Windows represents file names internally.
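The replacement behaviour described above can be illustrated with a small portable C sketch. This is not WideCharToMultiByte and not the code used in this thread; `utf8_to_latin_bytes` is a hypothetical, hand-rolled stand-in. It assumes valid UTF-8 input and a target encoding in which U+00A0 to U+00FF map to bytes 0xA0 to 0xFF (true for Windows-1252 and ISO-8859-1), and it replaces everything else outside ASCII with `?`. The return value counts the replacements, so the caller can tell when the converted path no longer matches the original.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: convert valid UTF-8 to a Latin-1-compatible
   single-byte encoding. `out` must have room for strlen(in) + 1 bytes.
   Returns the number of characters that had to be replaced by '?'. */
size_t utf8_to_latin_bytes(const char *in, char *out)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t replaced = 0;

    assert(in != NULL && out != NULL);
    while (*p != 0) {
        unsigned int cp;

        if (*p < 0x80) {
            /* 1-byte sequence: plain ASCII */
            cp = *p++;
        } else if ((*p & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
            /* 2-byte sequence: 110xxxxx 10xxxxxx */
            cp = ((unsigned int)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
            p += 2;
        } else {
            /* longer (or malformed) sequence: skip it as one character */
            ++p;
            while ((*p & 0xC0) == 0x80)
                ++p;
            cp = 0xFFFFFFFFu;  /* marks "cannot be represented" */
        }

        if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF)) {
            *out++ = (char)cp;
        } else {
            *out++ = '?';      /* the "default" character */
            ++replaced;
        }
    }
    *out = '\0';
    return replaced;
}
```

With the UTF-8 bytes of `Inställningar` the function returns 0 and writes the single byte 0xE4 for ä. With a name containing the euro sign (U+20AC, three bytes in UTF-8) it returns 1 and writes `?` in its place, which is the situation where the resulting path would differ from the intended one.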