Format change for unformatted data files?

ferrad01 · ‎12-17-2010

We're having a problem reading a file, MENU.FLB. It reads fine under Digital Fortran, but the read fails under Intel. Maybe Intel has changed the format of unformatted files? Here is the file plus my test program (menu.for), which also fails reading it.

Looking at http://www.erdc.hpc.mil/documentation/Tips_Tricks/unformattedFiles , this file (menu.flb) does not follow the format they describe. I created a small program (menu1.for) to create an FLB file with similar data, and it creates it in the format described on the website.

Could the format of unformatted files (this sounds like an oxymoron) have changed between Dec and Intel?

If so how do I read a (Dec) unformatted file with Intel 11.1?

Steven_L_Intel1 · ‎12-17-2010

The format has not changed. Are you sure your program is correct in declaring NVAR to be INTEGER*1? The data file has a 4-byte integer there. Also, the second record is just the 8-character string, but your program wants to read NVAR again.

mecej4 · ‎12-17-2010

The hpc.mil link that you listed talks about using 2-byte record length prefixes and suffixes. The majority of compilers in use today use 4-byte length indicators.

Be that as it may, in the Fortran context "unformatted" means "written without a FORMAT statement" rather than "completely devoid of format".

The intended usage of unformatted files is to write and read them using the same system and using matching READ/WRITE statement pairs. Since different I/O statements may involve record lengths that vary considerably, it is nearly impossible to read an unformatted file if the contents of the file -- what types of variables, and how many of them were written -- are not known.
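The point about length indicators can be checked directly. The following is a minimal sketch, not taken from the thread: it writes one sequential unformatted record and then reopens the file with Fortran 2003 stream access to read the bytes on either side of the data. It assumes the compiler uses 4-byte record markers and that default INTEGER is 4 bytes and REAL(8) is 8 bytes; the file name demo.unf and the program name are hypothetical.

```fortran
program show_markers
  implicit none
  ! Assumption: record markers are 4-byte integers (common default).
  integer, parameter :: i4 = selected_int_kind(9)
  integer(i4)  :: head, tail
  integer      :: it
  real(8)      :: rt
  character(8) :: ct

  it = 42
  rt = 3.5d0
  ct = 'PNLNAME1'

  ! Write one sequential unformatted record.
  open(unit=10, file='demo.unf', form='unformatted', status='replace')
  write(10) it, rt, ct
  close(10)

  ! Reopen the same file as a raw byte stream.
  open(unit=10, file='demo.unf', form='unformatted', access='stream', &
       status='old')
  read(10) head                      ! first 4 bytes: leading marker
  read(10, pos=4 + head + 1) tail    ! bytes just after the payload
  close(10)

  print *, 'leading marker: ', head
  print *, 'trailing marker:', tail
end program show_markers
```

Under the stated assumptions the payload is 4 + 8 + 8 = 20 bytes, so both markers would be expected to hold 20. A different value suggests a different marker size or record layout.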

jimdempseyatthecove · ‎12-18-2010

You may need to open the file in BINARY mode, perform a few test reads to identify embedded record length/boundary identifiers (if any), and discover endianness of variables as well as a potential for incompatible floating point formats (e.g. PDP-10 may have used 36-bit floats).

Start by reading into a binary buffer that is several times larger than what you assume is the record length. This will make it easier to identify the record markers (usually a 1-, 2-, or 4-byte length value, but they could take another form, such as paired forward and backward record-length markers).

Once you have identified the data structure, writing a conversion routine should be relatively easy.
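A minimal sketch of such a test read follows. It uses Fortran 2003 stream access in place of the vendor-specific BINARY form, assuming the compiler supports it, and prints the first bytes of the file in hexadecimal so that length bytes and padding can be spotted by eye. The buffer size of 64 bytes and the program name are arbitrary choices for illustration.

```fortran
program dump_head
  implicit none
  integer, parameter :: nmax = 64      ! number of bytes to inspect
  integer(1) :: buf(nmax)
  integer    :: fsize, n, i, ios

  ! Open the file as a raw byte stream, with no record structure assumed.
  open(unit=31, file='MENU.FLB', form='unformatted', access='stream', &
       status='old', action='read', iostat=ios)
  if (ios /= 0) stop 'cannot open file'

  ! Do not read past the end of a short file.
  inquire(unit=31, size=fsize)
  n = min(nmax, fsize)

  read(31, iostat=ios) buf(1:n)
  close(31)
  if (ios /= 0) stop 'read error'

  ! Print 16 bytes per line as two-digit hex values.
  write(*, '(16(z2.2,1x))') (iand(int(buf(i)), 255), i = 1, n)
end program dump_head
```

Comparing the dump of a file that reads correctly with one that does not is usually enough to see whether the record markers differ.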

Jim Dempsey

ferrad01 · ‎12-19-2010

Sorry Steve, the file I attached was one in transition; here is a better menu.for. It still fails on the read.

ferrad01 · ‎12-19-2010

Thanks Jim. This data file is read with the code in the modified menu.for (attached in later post) compiled under CVF 6.6, however it fails when compiled with Intel 11.1.

jimdempseyatthecove · ‎12-20-2010

ferrad01,

I am unable to download the menu.zip file (I assume menu.flb is in there); it may have been deleted from the ISN site. I do have your menu.for file.

What are the characteristics of the failure with 11.1?

a) file open error
b) read error
c) NVAR incorrect, PNLNAM correct
d) NVAR correct, PNLNAM has junk as first bytes
e) NVAR incorrect, PNLNAM has junk as first bytes
f) NVAR correct, PNLNAM blank
g) NVAR incorrect, PNLNAM missing some characters
h) other...

c ?= endianness incorrect
d ?= size of the integer for NVAR incorrect
e ?= endianness incorrect + size of the integer for NVAR incorrect
f ?= your data file may have a record break between NVAR and PNLNAM (or record size on 1st record of size of NVAR)
g ?= integer size of NVAR too large

Just guessing here

Jim Dempsey

ferrad01 · ‎12-20-2010

Jim,

Here is the MENU.FLB file. It fails on the read:

1 READ (31, ERR = 120) NVAR,PNLNAM

ie.

D:\test\menu>menu
read error

Adrian

ferrad01 · ‎12-20-2010

Steve / Jim,

We have found the problem. Turns out these FLB files were created with Fortran Powerstation, and the structure of unformatted files has changed since then (see CVF Help /:ioformat).

Another foible is that CVF switches this backwards compatibility on from the command line but not from the GUI environment, so it fails in DevStudio but not with fl32 at the command line.

Intel 11.1 fails in both command line and DevStudio as this option is not set for either. If I turn it on it works.

So my question is: is there a utility I can use to convert my PowerStation unformatted data files to Intel format?

Adrian

Steven_L_Intel1 · ‎12-20-2010

The file didn't look like a FPS file to me, but...

It is not true that CVF used this by default, but if you used /fpscomp:ioformat on the command line then you would get it.

There isn't a conversion utility I know of, but it could be written. Let me comment that this is not an "Intel" format - it is the most common layout used on Windows and UNIX/Linux (and Mac OS) for Fortran compilers from most vendors.

jimdempseyatthecove · ‎12-20-2010

Adrian,

In examining the menu.flb file, it appears to be a database or help file for ScaleChem OLI (www.olisystems.com) +1-(201)539-4996. You might want to contact them for file format information and to see if you are within your/their licensing agreement.

The simplified format you are using in your menu.for program does not conform to the internal data format I can observe from the menu.flb file. I have not performed a complete analysis of this file. As far as the text strings go it appears to be:

header record
integer(1) :: lengthHeader
integer(1) :: headerPadd
char(len=headerPadd) :: blanks
char(len=8) :: name ! included in lengthHeader
integer(1), dimension(lengthHeader-8) :: probablyVersionEtc...InBinary

Next Record
integer(1) :: length
character(5) :: otherStuffUsuallyBlanksSometimesLastCharIsCode
char(len=length) :: text
character(len=11) :: otherStuff

Next Record
integer(1) :: length
character(5) :: otherStuffUsuallyBlanksSometimesLastCharIsCode
char(len=length) :: text
character(len=11) :: otherStuff
...

As stated earlier, please consult with OLI.

Jim Dempsey

ferrad01 · ‎12-29-2010

Jim,

I work for OLI Systems... We are upgrading our products (including Scalechem) from CVF 6.6 to Intel 11.1. Things are progressing fairly smoothly, but we are now having problems with the structure of unformatted files. It is apparent that the format has changed between these compilers, hence Intel 11 cannot read unformatted files generated under CVF 6.6 unless /fpscomp:ioformat is used. I'd rather not keep this old feature in our current projects going forward, so I'd ideally like to have some sort of conversion program which converts the old format to the new.

Adrian

Steven_L_Intel1 · ‎12-29-2010

Adrian,

It is not the case that the formats changed - they did not. If you have examples of where you think the format changed, please provide details so we can investigate.

jimdempseyatthecove · ‎12-29-2010

Adrian,

The data present in your supplied unformatted file (presumably written by the older version software) did not have a layout as described by your new program's read statements. The sizes of your INTEGERs were wrong, and the header "record" is of a different size than the remaining records. Yet your code appeared to read same-sized records (with incorrect variable sizes).

As to why this code may have worked before, one possibility is that someone copied text from an old program and pasted it into a new program without regard to the actual data format.

An alternative situation is potentially:

Your old program had COMMONs that multiply mapped a named common in different ways. Someone updated the program to use MODULES and picked only one of the COMMON records to represent all the data records in your database. (The other record layouts may be in your MODULES as separate entities).

This is not a case of a conversion from CVF 6.6 to Intel 11.1 (source code had to change).

The record length indicators are 1 byte long (INTEGER(1)) usually followed by the ASCII character for Space. You tried reading the record size using INTEGER(larger than 1). The ASCII Space character is 0x20 not 0x00. So your read record size will be bunged up.

This also can be the case of:

Using CVF 6.6, someone was in the process of upgrading the internal data format for your database (IOW a work in progress). That someone left (IOW left you with a can-o-worms). You took their most recent (unfinished) work, compiled it with 11.1, and it doesn't work.

There is one additional potential for this muckity muck.

The database was originally stored on mag tape (reel tape). A utility was used to copy the tape data to a disk file.
The mag tape format generally has a label (small-ish file name), a sequence number (in the event of a multi-reel file), maybe other header stuff, followed by records. The utility used to copy this data inserted a byte count into the data stream. Mag tape records have a size that is not part of the data stream (records are separated by a record gap).

So.... if the CVF program was reading this file from mag tape and your IVF file is reading a copy of this file from disk then the on-disk format is different from the on-tape format.

Does any of this apply to your situation???

Jim Dempsey

ferrad01 · ‎12-29-2010

Steve,

Here are 2 files: a user case INJWAT_DIG.DPT saved under CVF 6.6. Intel 11.1 cannot read this file. We ran our program under Intel to create INJWAT.DPT, which Intel 11.1 can read.

Adrian

ferrad01 · ‎12-29-2010

Jim, thanks for this... I'll have to read through all this tomorrow. In the meantime, I have attached the 2 files to Steve's response.
Adrian

Steven_L_Intel1 · ‎12-29-2010

Please also supply the program that wrote these files. If you want, create a small test program that demonstrates the problem. I can't investigate based on the data files alone.

ferrad01 · ‎12-30-2010

Steve,

I have attached a zip file containing 2 identical fortran files which write 3 values to an unformatted file, one compiled with CVF 6.6, the other with Intel 11.1. The generated unformatted output files are also attached which show the different structures.

test_dig.for compiled with:

c:\progra~1"microsoft visual studio"\df98\bin\dfvars
fl32 test_dig.for

test_intel.for compiled with:

"C:\Program Files\Intel\Compiler\11.1\060\Bin\ifortvars.bat" ia32
ifort test_intel.for

Adrian

mecej4 · ‎12-30-2010

What is the reason to use fl32 rather than df to do the first compilation?

If you have the source code that can produce the unformatted files in your applications, just compile them using df, or, for that matter, ifort.

If not, here is one way to construct a conversion utility which does not require knowing the structure of the unformatted file record control markers. Take the sources for the consumer program, i.e., the one that has to read the unformatted files. Strip out all code except those parts that open, read and close the file. Construct a mirror image of these sources, in which each READ statement in the original is replaced by a corresponding WRITE statement. Construct an output subroutine with the WRITE statements, and call the output subroutine from the main program after all the unformatted files have been read. Arrange to have the variables in the I/O statements passed as arguments or in COMMON. Compile the input part of the program using fl32 and compile the output part using df. Link the two parts to obtain the converter.

Here, for illustration, is such a converter for your test data.

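[fxfortran]
      program Cnvrt

      implicit none
      integer      :: it
      real*8       :: rt
      character(8) :: ct

!     Read one record in the old layout (this unit is built with
!     fl32 or ifort /fpscomp).
      open(unit=2, file='test_dig.txt', form='unformatted')
      read(2) it, rt, ct
      close(2)

!     Hand the values to the separately compiled writer.
      call wrsub(it,rt,ct)

      end
[/fxfortran]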

Compile this using fl32 (or ifort /fpscomp).

Here is the output part:

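[fxfortran]
      subroutine wrsub(it,rt,ct)
      implicit none
      integer      :: it
      real*8       :: rt
      character(8) :: ct

!     Write the same values back out; this unit is built with the
!     default options, so the record takes the current layout.
      open(unit=3, file='test_dig.cnv', form='unformatted',status='new')

      write(3)it,rt,ct
      close(3)

      return
      end
[/fxfortran]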

Build and run the converter with the commands

[bash]S:> fl32 -c cnvrt.for                            OR     S:> ifort /fpscomp /c cnvrt.for
S:> df cnvrt.obj wrsub.for /Fecnvrt    OR     S:> ifort cnvrt.obj wrsub.for /Fecnvrt
S:> cnvrt
[/bash]

The converted file test_dig.cnv will have been produced.
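For a file holding many records of the same layout, the same idea can be extended with a loop that copies one record at a time, so the whole file never has to be held in memory. The following is only a sketch under assumptions: every record is taken to have the layout of the test data (one integer, one real*8, one 8-character string), and the routine names rdopen, rdnext, rdclose, wropen, wrnext and wrclose are hypothetical. It is written in free-form source and shown as one listing for brevity.

```fortran
! --- reader part: would be compiled with the legacy I/O option ---
subroutine rdopen(ios)
  implicit none
  integer, intent(out) :: ios
  open(unit=2, file='test_dig.txt', form='unformatted', status='old', &
       iostat=ios)
end subroutine rdopen

subroutine rdnext(it, rt, ct, ios)
  implicit none
  integer,      intent(out) :: it
  real(8),      intent(out) :: rt
  character(8), intent(out) :: ct
  integer,      intent(out) :: ios
  read(2, iostat=ios) it, rt, ct   ! ios /= 0 at end of file or on error
end subroutine rdnext

subroutine rdclose()
  implicit none
  close(2)
end subroutine rdclose

! --- writer part: would be compiled with the default options ---
subroutine wropen(ios)
  implicit none
  integer, intent(out) :: ios
  open(unit=3, file='test_dig.cnv', form='unformatted', status='replace', &
       iostat=ios)
end subroutine wropen

subroutine wrnext(it, rt, ct)
  implicit none
  integer,      intent(in) :: it
  real(8),      intent(in) :: rt
  character(8), intent(in) :: ct
  write(3) it, rt, ct
end subroutine wrnext

subroutine wrclose()
  implicit none
  close(3)
end subroutine wrclose

! --- driver: copies records until the reader reports end of file ---
program cnvloop
  implicit none
  integer      :: it, ios, nrec
  real(8)      :: rt
  character(8) :: ct

  call rdopen(ios)
  if (ios /= 0) stop 'cannot open input'
  call wropen(ios)
  if (ios /= 0) stop 'cannot open output'

  nrec = 0
  do
    call rdnext(it, rt, ct, ios)
    if (ios /= 0) exit
    call wrnext(it, rt, ct)
    nrec = nrec + 1
  end do

  call rdclose()
  call wrclose()
  print *, 'records converted:', nrec
end program cnvloop
```

To mix the two formats, the reader routines would go in one source file built with the legacy option and the writer routines and driver in another built with the defaults, then linked together as in the commands above. Built as a single file with one set of options, the program simply copies the records unchanged.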

ferrad01 · ‎12-30-2010

fl32: no idea, this was how the products have been built over the last 15 years (before my time). I'm just reproducing the behavior for Steve so he can investigate the format change.

With regard to your converter, this is feasible for simple data files, however I fear it is a massive task for our data files which can get enormous (500 MB).

I'd rather not be in this situation in the first place, so I'd like to see Steve's comments first before I decide what to do next.

Steven_L_Intel1 · ‎12-30-2010

You have now identified the problem: your use of "fl32" is the cause. This is the Microsoft Fortran PowerStation compatibility command that CVF supported, which gives you all the PowerStation compatibility options, including /fpscomp:ioformat. If you had used any other command, such as df, then you would not get this option on by default. Evidently your application was originally built with PowerStation and you continued to use fl32 with CVF, not realizing that it changed CVF's defaults.

You can, of course, continue to use /fpscomp:ioformat, but we did away with the fl32 command.