Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
Announcements
FPGA community forums and blogs on community.intel.com are migrating to the new Altera Community and are read-only. For urgent support needs during this transition, please visit the FPGA Design Resources page or contact an Altera Authorized Distributor.

CSV delimiter detection

nvaneck
New Contributor I
6,784 Views

Is there anyway to accurately determine the delimiter used for CSV files in non-US countries?  There must be some way, as Excel knows what to do, but I can't find any way to determine in from NLSGETLOCALEINFO. Currently I try using the decimal delimiter from NLSGETLOCALEINFO, but I still need to try what comes back from NLSGETLOCALEINFO, and then try a ; if it fails. This is not a failsafe way to do it....

The problem arises when importing a csv file created by Excel and I need to know what to scan for.

 

Thanks,
Neal Van Eck

0 Kudos
1 Solution
Steven_L_Intel1
Employee
6,784 Views

Ok - the piece of information you want is LOCALE_SLIST from the current locale.

[fortran]
    program ListSep   
    use kernel32
    implicit none

    ! Variables
    character(10) ::ls
    integer :: lslen, i
   
    lslen = GetLocaleInfo (LOCALE_USER_DEFAULT, LOCALE_SLIST, ls, len(ls))
    ! Returns number of characters
   
    print '(*(A))','List separator is "',ls(1:lslen-1),'"'   
    end program ListSep
[/fortran]

This program might need tweaking if there is a multicharacter list separator (can be up to four characters!) Also, the function returns them as TCHARs and these might be double-byte. But I think this should work most of the time.

View solution in original post

0 Kudos
15 Replies
rase
New Contributor I
6,784 Views

In Excel the default local settings for the csv separator character can be overridden by the user. The csv file is a plain text file without information about the origin, local settings and separator character which could be interrogated by a function. The simple solution is to have a look at the file content and guess from the text which character is the separator.

0 Kudos
nvaneck
New Contributor I
6,784 Views

Yes, I know that about CSV files, but since the Excel file in question was just created by a call to Excel from my program, it must have some way to determine what separator to use, and so why can't I use the same method, whatever it is, to then interpret the just created csv file? At least that was what I hoped was possible. Other, non-Fortran, systems seem to be able to do this.

 

0 Kudos
andrew_4619
Honored Contributor III
6,784 Views

It is the default list separator set in the regional settings. In some countries it is set to ; (semi colon) you can change it to whatever you want. If you are reading the file only in excel put the text sep=, on line one and excel will read it OK so long as it is a .csv file type.

 

 

0 Kudos
nvaneck
New Contributor I
6,784 Views

That's encouraging, if there is a way to retrieve the regional default list setting.  Other suggestions don't help because of the application I need:

I let users click on a button for data entry with Excel, which opens a template with instruction for entering data field s characteristics and rows of records of data values. After Excel is closed with the file saved in CSV format, I open it and process the file. Thus there is no way to a priori  know what the separator is. What I need is a way to retrieve the regional list separator setting. 

Does anyone know how to retrieve this, or temporarily set it?  I'm searching on the internet, but nothing found so far outside of Excel, Java and VBA.

Thanks

0 Kudos
andrew_4619
Honored Contributor III
6,784 Views

Why don't you let Excel save with whatever separator it wants to and then just detect this character when you read the file? If you have some fixed text in A1 and A2 you just need to read line 1 as text and find the character that separates after removing any white space padding.

0 Kudos
andrew_4619
Honored Contributor III
6,784 Views

By the way I never found any way of determining the character by window locale enquiry, there doesn't seem to be part of the locale even though it is in the regional and language settings.

A similar problem hit me last year. I had a 'database' saved as CSV that my Fortran code read and used. A user could edit it externally using Excel. In some windows machines in some countries the CSV file did not read properly into Excel because it expected a semi-colons. My first fix was to add "sep=," to line one of the CSV file. This reads the file OK into Excel put fails when the user later saves the file as CSV as Excel fills it with semicolons not commas. In the end I chose to ditch CSV and do do all the reading/writing direct to the native xls or xlsx file as this is easier for the user. I could have opted for a solution like in my previous post #6 which would have been quicker and easier.

 

 

0 Kudos
nvaneck
New Contributor I
6,784 Views

I'm doing what app4619 suggests now but was hoping for an exact way  from the users settings.  This works, but another user option I have is to save an output report file as a csv file. At least in that case I can provide instructions when needed for viewing in Excel, or perhaps work something else out.

Thanks all

0 Kudos
Steven_L_Intel1
Employee
6,784 Views

I think you're looking at it the wrong way. The field delimiter Excel uses has nothing to do with locale, as far as I know. Even when you import a text file, you have to tell Excel what the delimiter is - typically comma or tab. Does it really behave differently in other locales? Excel typically will enclose values in quotes if any contain the delimiter.

0 Kudos
andrew_4619
Honored Contributor III
6,784 Views

@ Steve. Yes it does change with locale (see attached screen grab) it is the list separator. However even though Windows sets this as a location preference it does not appear to be settable or gettable using the locale tools.

This setting applies to the import and export of files with the .CSV extension where as far as excel goes the C in CSV does not mean comma it means the list separator from regional settings! 

http://office.microsoft.com/en-gb/excel-help/import-or-export-text-txt-or-csv-files-HP010342598.aspx

The above link mostly helps but a slightly missing some bits off relevant information.

0 Kudos
andrew_4619
Honored Contributor III
6,784 Views

As an addendum http://support.microsoft.com/kb/266200/en-us is also relevant,

0 Kudos
nvaneck
New Contributor I
6,784 Views

The best information I found on the internet over the years was  the statement:

    In language where the decimal separator is a ,(comma) the CSV separator is a ; (semicolon)

which is what I had been using since I can get the decimal separator, but recently this didn't work for a user in Switzerland, which is why I started this thread in hopes someone knew how to get the local list separator.  There are a lot of queries on the internet for how to do this, spanning many years, but no one seems to have a solution.

Scanning the known structure of the file created by my data entry with Excel function solves the problem for me, and for importing other csv files I allow the user to specify the delimiter. Saving a report file in CSV format still is an awkward problem without knowing the actual delimiter used locally (at least for Swiss users),

It just seems strange that there is no way to get the local list separator within Fortran programmatically.

 

0 Kudos
Steven_L_Intel1
Employee
6,785 Views

Ok - the piece of information you want is LOCALE_SLIST from the current locale.

[fortran]
    program ListSep   
    use kernel32
    implicit none

    ! Variables
    character(10) ::ls
    integer :: lslen, i
   
    lslen = GetLocaleInfo (LOCALE_USER_DEFAULT, LOCALE_SLIST, ls, len(ls))
    ! Returns number of characters
   
    print '(*(A))','List separator is "',ls(1:lslen-1),'"'   
    end program ListSep
[/fortran]

This program might need tweaking if there is a multicharacter list separator (can be up to four characters!) Also, the function returns them as TCHARs and these might be double-byte. But I think this should work most of the time.

0 Kudos
nvaneck
New Contributor I
6,784 Views

Works great Steve,

I only knew about NLSGETLOCALEINFO, which doesn't seem to be able to retrieve the list separator.

Thanks!

0 Kudos
andrew_4619
Honored Contributor III
6,784 Views

@Steve. That is quite useful thanks, I must have missed that when I scanned through the MSDN pages, there is a lot of info there! 

@nvaneck,  I have found one or two things lacking in NLS in ifort which is only a partial implementation of all the options available. Also if you use the sdk direct not all the routines have interfaces under module KERNEL32 or the superset  IFWIN . There were newer things added for Vista onwards.

0 Kudos
Steven_L_Intel1
Employee
6,784 Views

Oh, ignore the declaration of i in that code - it's from an intermediate version I was playing with.

0 Kudos
Reply