
Umut_Tabak · ‎08-22-2023

Dear all,

To start with, this will unfortunately be a long and detailed post.

I have some research code that I use to extract information from the binary output files of the commercial FE code ANSYS. Apparently, ANSYS is compiled with the Intel compiler. Below is the related information, taken from one of the ANSYS log files:

Compiler: Intel(R) Fortran Compiler Version 19.0.5 (Build: 20190815)
Intel(R) C/C++ Compiler Version 19.0.5 (Build: 20190815)
Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191125
BLAS Library supplied by Intel(R) MKL

I was using my research code (written in MATLAB) up to ANSYS 18 without issues. For later versions, my code cannot read the binary files written by ANSYS (one example is the latest ANSYS version that we use at the university, which was compiled with the above Fortran compiler).

I know that normal Fortran unformatted files have record lengths before and after the data, and the files written by ANSYS follow this scheme as far as integer records are concerned. If I would like to read an integer record from the binary files, there is no problem. By following the record structure below, I can read the data correctly:

1. first, the record length information is there, call this recLen

2. and then there is a dummy integer

3. third, there is an integer array of length recLen, which is the data I would like to read.

4. last is the dummy integer.
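The four steps above can be sketched in Python as follows. This is only an illustration: the function name `read_int_record`, the `has_trailer` switch, and the synthetic bytes are assumptions made for the example, and little-endian 4-byte integers are assumed to match the 'l' option of the MATLAB reads.

```python
import io
import struct

def read_int_record(stream, has_trailer=True):
    """Read one integer record laid out as [recLen, dummy, data..., dummy]."""
    head = stream.read(8)
    if len(head) < 8:
        raise EOFError("incomplete record header")
    rec_len, _dummy = struct.unpack("<2i", head)           # steps 1 and 2
    data = struct.unpack(f"<{rec_len}i", stream.read(4 * rec_len))  # step 3
    if has_trailer:
        stream.read(4)                                      # step 4, value ignored
    return list(data)

# Synthetic bytes for illustration only, not taken from a real ANSYS file.
raw = struct.pack("<2i3ii", 3, 0, 10, 20, 30, 3)
print(read_int_record(io.BytesIO(raw)))  # [10, 20, 30]
```

Passing `has_trailer=False` stops after step 3 and leaves the stream positioned on the trailing word.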

The interesting thing is that there are also occasions where the above structure changes slightly, even for integer records. For instance, if I would like to read the sparse matrix row/col indices from the binary output files where the matrices are stored, I have to reduce the above integer record structure to the first 3 steps, so that the read of the last dummy integer is skipped. The format of the binary file for the sparse matrix is given in the attached screenshot, intel_0.png (copied directly from the programmer's reference manual). In that screenshot, i corresponds to the integer records, 1 shows that there is one record, and 'varies' shows that the number of items in the record varies, as expected. dp/cmp refers to double precision records. So the data is organized in the usual record format, with certain exceptions that I cannot understand. One, mentioned above, concerns reading the integer records. I am just assuming that these records are written with the Fortran write statement with the unformatted option.

As an example, reading the first row of a sparse matrix out of this binary file starts with the row index information of the sparse matrix.

A more interesting manipulation is needed to read the double precision data that follows, the sparse matrix values. For this one, I have to use the MATLAB code below:

recLenVals = fread(fid,1,format1,0,'l');

dum = fread(fid,1,format2,0,'l');

colVals = fread(fid, recLenVals, format2,0,'l');

dum = fread(fid,1,format1,0,'l');

The format is still the same as above for this binary file and for this record, but I have to read the first dummy as a double, 8 bytes (format1 = 'int', format2 = 'double' in MATLAB). If I use the above code and logic, I can read the matrix entries successfully and assemble the matrices correctly for use in my own research. It is very strange that I have to perform this manipulation on the first dummy of this record.
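For readers without MATLAB, the four fread calls can be mirrored in Python to make the byte counts explicit. This sketch only reproduces what the posted reads consume (4 + 8 + 8*N + 4 bytes); it does not explain why the 8-byte dummy is needed. The function name and the synthetic bytes are assumptions for the example.

```python
import io
import struct

def read_double_record(stream):
    """Mirror of the four fread calls: int32, float64, N float64, int32."""
    (rec_len_vals,) = struct.unpack("<i", stream.read(4))   # format1 = 'int'
    stream.read(8)                                          # dummy read as 'double'
    vals = struct.unpack(f"<{rec_len_vals}d", stream.read(8 * rec_len_vals))
    stream.read(4)                                          # trailing dummy, 'int'
    return list(vals)

# Synthetic bytes for illustration only, not taken from a real ANSYS file.
raw = (struct.pack("<i", 2) + bytes(8)
       + struct.pack("<2d", 1.5, -2.0) + struct.pack("<i", 0))
stream = io.BytesIO(raw)
print(read_double_record(stream), stream.tell())  # [1.5, -2.0] 32
```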

How can I solve this record structure challenge and understand it completely?

I am following the below link to understand the record structure a bit more,

https://www.intel.com/content/www/us/en/docs/fortran-compiler/developer-guide-reference/2023-1/record-types.html

and I am wondering whether there is a way to find out, just by investigating the binary output files, which of these formats might have been used during a write operation in Fortran.

Any pointers and ideas are appreciated deeply.

BR,

Umut

andrew_4619 · ‎08-22-2023

An unformatted sequential file has data at the start and end of each record, which is compiler specific but should give a record length. Beyond that, unless you have an ANSYS technical document that describes the file content, you are into reverse engineering and guesswork as to the record contents and meaning, as it could be any mix of data types. If it was written with stream format, there are no record markers at all.

Umut_Tabak · ‎08-22-2023

Hi Andrew,

Thank you for your message.

Indeed, I am digging into the documentation in the meantime. The only information that I could find in the ANSYS programmer's reference manual is that the binary file is a direct-access, unformatted file, so it is not a file with stream format.

In general, the data is described in the format that I attached in the previous screenshot. Only the general structure is mentioned; other details are missing, such as how the data is written and what kind of record overhead there is. What I was after is to find out this information, since a specific compiler and version were used.

andrew_4619 · ‎08-22-2023

Direct access files have records all of the same length, which may include padding if required.

Umut_Tabak · ‎08-22-2023

Well, in my understanding, direct access is used not for fixed record lengths (because there are definitely records of different lengths) but rather for records that you can access in a relative manner by using some pointers inside the file. Am I mistaken in this interpretation?

jimdempseyatthecove · ‎08-22-2023

>>I was using my research code(written in MATLAB) until ANSYS 18 without issues. For later versions, my code can not read the binary files written by ANSYS

This would indicate that the ANSYS data file format has changed. @Umut_Tabak you will have to locate the ANSYS 18 reference manual as @andrew_4619 stated, note differences between what you were using and what you now have, and make adaptations to your program.

I would suggest you modify the low level read functions (e.g. BINRD8) to interface to the file using stream I/O, or to a buffer filled using stream I/O. This will require that you know the internal format of the data within the file.

Note: in your .jpg image, the number of row indices was to be read using BINRD8. I assume this returns an INTEGER(1) value. This restricts the number of row indices to no more than 127 (unless you make this unsigned, in which case no more than 255 rows). It may be that ANSYS 18 now permits more than 127 (or 255) rows. In this case, I would guess that BINRD16 (or other number bitness as described in the ANSYS 18 reference) would be used. This would shift the input stream by the additional byte(s).

Jim Dempsey

Umut_Tabak · ‎08-28-2023

Dear Jim,

Thank you for your kind reply. Indeed I also developed a C/C++ interface for this problem where I was using the binrd8 routines and calling them from C++ and also using the libbin.so.

But since debugging in MATLAB is easier, I preferred to use this interface.

I was a bit sidetracked and could not look into this issue. Yesterday, I had a bit more time and once more looked carefully at this problem. I want to give you an update on this issue.

I created a very simple test problem in ANSYS, a toy problem: a one-element shell model with 4 nodes in total. This results in a 24 by 24 matrix. I am trying to read the matrix row entries with my interface in MATLAB, but I have now noticed an even more interesting problem. I can read the matrix row/col indices and values correctly with the format that I use up to row 22, but at row 22 the same code cannot read the record length information correctly.

Before that row, I can read the data correctly, and if the format I used up to that point were not correct, I would not be able to read the row indices and row values up to row 22 either (just to be completely sure, I printed the matrix in Matrix Market format and carefully checked all the indices and values; as mentioned, up to row 22 they are correct). I am completely baffled by this problem.

Even more interesting is the following: if I increase the element count to 2, I have 6 nodes over 2 elements and have to read the resulting 36x36 matrix. With the format I used above up to row 22, I can read this matrix correctly without any errors.

How can the file format change between two binary output files of the same program?

It might also be better to contact ANSYS forums on this problem once more.

Best regards,

Umut

andrew_4619 · ‎08-28-2023

Without seeing example files and further technical info, the answer to your question is blowing in the wind. As a wild guess: 21x24=504, which is less than 512, and 22x24=528, which is greater than 512. Record lengths that are powers of 2 would often be a logical choice...

Umut_Tabak · ‎08-28-2023

Dear Andrew,

Thank you for your message but records are of varying size. I am sure of this. So you are still considering records of fixed lengths as you mentioned in your first message above. But I guess this does not apply to these problems. In any case, thank you for the message anyway.

BR,

Umut

andrew_4619 · ‎08-28-2023

I have no idea whether the records are fixed or variable length; I have no data to look at. My earlier comment was that direct access files have a constant record length, as that is the only way you can compute an offset to directly access a specific record. With a variable record length you have to read all the preceding records to read a specific record, hence the term 'Sequential access'.
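The difference between the two access modes can be illustrated with a short Python sketch. The record layout used for the sequential case (a 4-byte little-endian word count followed by that many 4-byte words) is a hypothetical one chosen for the example, not the ANSYS or Intel Fortran layout.

```python
import struct

def direct_offset(record_number, record_length_bytes):
    """Byte offset of a 1-based record when every record has the same length."""
    return (record_number - 1) * record_length_bytes

def sequential_offset(buf, record_number):
    """Byte offset of a 1-based record when each record starts with its own
    length, so every preceding record has to be visited first."""
    pos = 0
    for _ in range(record_number - 1):
        (n_words,) = struct.unpack_from("<i", buf, pos)
        pos += 4 + 4 * n_words
    return pos

# Hypothetical variable-length records: [2, 7, 8], [1, 9], [0]
buf = struct.pack("<i2i", 2, 7, 8) + struct.pack("<i1i", 1, 9) + struct.pack("<i", 0)
print(direct_offset(3, 512))       # 1024
print(sequential_offset(buf, 3))   # 20
```

With fixed-length records the offset is pure arithmetic; with variable-length records it depends on the contents of the file.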

jimdempseyatthecove · ‎08-28-2023

This may be a case where the logical record produced on the data file creation system exceeded the (maximum) physical record length (on the OPEN statement). In this case, two physical records would have been written for your one logical record. IOW the logical record was split.

To compound this (IMHO), the system reading the file, possibly with different file format settings, is not accounting for the split records.

I suggest you use a binary/Hex editor or some such tool to decipher the data in the file. Then correct your read routines. As to if this can be fixed by modifying an OPEN statement .OR. you OPEN using stream format (and parsing out the record headers), this will be up to you to figure this out.
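If no hex editor is at hand, a few lines of Python give a comparable view of the raw words. This is a minimal sketch, assuming little-endian 4-byte words; the function name `dump_words` and the sample bytes are made up for the example. For a real file, the bytes could come from reading the file in binary mode.

```python
import struct

def dump_words(data, byte_offset, n_words):
    """Return (offset, hex, signed int32) for consecutive little-endian words."""
    rows = []
    for k in range(n_words):
        off = byte_offset + 4 * k
        chunk = data[off:off + 4]
        if len(chunk) < 4:          # ran past the end of the data
            break
        (value,) = struct.unpack("<i", chunk)
        rows.append((off, chunk.hex(), value))
    return rows

# Synthetic bytes for illustration only.
sample = struct.pack("<3i", 3, -2147483648, 10)
for off, hx, val in dump_words(sample, 0, 3):
    print(f"{off:6d}  {hx}  {val}")
```

Seeing the hex and the signed value side by side makes it easier to spot a flag bit that turns a plausible count into a large negative integer.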

Jim Dempsey

jimdempseyatthecove · ‎08-28-2023

>> My earlier comment was the direct access files have a constant record length

And if your direct access record length is shorter than your logical record length...

(document says an error should occur, who knows???)

.OR.

If the record length on the OPEN to read the file is different than the record length on the OPEN to write the file.

Can you post the data file (if it is manageable)?

Jim Dempsey

JohnNichols · ‎08-28-2023

PyDPF-Post - will this program not do what you want?

Umut_Tabak · ‎08-31-2023

Dear John,

Yes, I am aware of this interface and will definitely give it a try and investigate the legacy MAPDL reader there; that is one of the to-dos for me.

I wanted to update my current software library first, if possible.

Thank you for the pointer anyway.

Umut_Tabak · ‎08-31-2023

Dear Jim,

Thank you for the return. I attached two files. See the _1el and _2el files in the attachment.

As an example, in the _1el file, I have to seek to 1128 bytes from the start of the file to get to the record structure posted in my very first message. Then I read the data as follows, in the given order, with the given structure:

1. recLen, 1 integer, 4 bytes,

2. dummy, 1 integer, 4 bytes,

3. rowIndices, read recLen (given above) integers from the file (an additional point: I could not understand why there is no dummy trailer here, but that is not the most pressing problem for now),

4. recLenV, record length for the values in the row or col.

5. dummy, 1 integer

6. colVals, recLenV(integer, given above) double values, since Fortran is col-oriented, we read the cols but since the matrix is symmetric, it does not matter.

7. dummy, 1 integer

Then go back to step 1 to read the next record. The process goes on smoothly until row/col 22. At row/col 22, I get recLen as a large negative integer, and this is where the reading fails on the _1el file. This is what I mentioned in my latest technical post.

On the _2el file, the data starts 1136 bytes from the start of the file, and using the above 7 steps gives the correct matrices, no problems at all.

That is what I wanted to provide as a last step. I will turn to the Python interface from now on.

Thanks for the check in advance.

Best regards,

Umut

P.S. I could not attach the files but I shared a download link as follows : https://www.filemail.com/d/kbaqhxfikcwspyj

Edit: Extra information, below is the format of the data files from the ANSYS documentation:

*comdeck,fddesc jas

c The purpose of this file is to explain how ANSYS binary files are organized.
c ANSYS has internal routines that read and write records of information to
c files. Each record has 2 integer words added to the front and 1 integer word
c added to the back. The first word is an integer count that tells how long the
c record is in 4 byte integer words. The second word is the high order part
c of the first word, with the bit 31 set if the record contains integers, clear
c if the record contains double precision words.
c The last word is a repeat of the first 4 byte integer word
c on the front of the record. Most of the time, the programmer
c does not need to be concerned about the extra three words added to the
c records. see biocom.inc for a description of the read and write routines.

c All ANSYS files are written in standard IEEE format, with the integers being
c 4 bytes long and the double precision being 8 bytes long. This allows
c the ANSYS binary files to be moved freely amoung different types of
c computers. The system dependent conversion (if any) is done as the
c files are being written and read.

c
c The first record on all ANSYS binary files is a standard header record that
c stores 100 integers. The details of what information is stored in this header
c record are given in routine BINHED8.

c As a variety of information gets written to the file, it becomes necessary to
c be able to "jump" through the file to a particular location of interest (e.g.
c to the record that stores the node list). To facilitate this functionality,
c most ANSYS binary files include a second header record at record number two.
c This header record is specific to each file, and usually contains scalar
c metrics necessary to retrieve the other information stored on the file (e.g.
c the number of nodes in the model might be stored in this second header,
c information that is necessary in order to retrieve the node list). In
c addition to these metrics, the second file header will typically store
c pointers to certain records that are known to be of interest. This allows the
c program to read these pointers, then advance the file pointer to that
c specified location to retrieve the data.

c The file description comdecks (fd____.inc) document the contents of these
c second headers, along with the contents of each record stored in the file.
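Read literally, the description above gives each record two header words, the data, and one trailer word. A Python sketch of that reading follows. Several points are assumptions rather than facts from the documentation: little-endian byte order, the function name `parse_record`, ignoring every bit of the second word other than bit 31, and taking the count of a double precision record in 4-byte words (so N doubles give a count of 2*N). It has not been checked against real ANSYS files.

```python
import struct

INT_FLAG = 0x80000000  # bit 31 of the second header word: record holds integers

def parse_record(buf, pos=0):
    """Split one record into (count, is_integer, payload_bytes, next_pos)."""
    count, word2 = struct.unpack_from("<iI", buf, pos)
    is_integer = bool(word2 & INT_FLAG)
    start = pos + 8
    end = start + 4 * count                 # count is in 4-byte words
    (trailer,) = struct.unpack_from("<i", buf, end)
    if trailer != count:                    # trailer should repeat the first word
        raise ValueError(f"trailer {trailer} does not repeat count {count}")
    return count, is_integer, buf[start:end], end + 4

# Synthetic records for illustration only: two integers, then one double.
buf = (struct.pack("<iI2ii", 2, INT_FLAG, 5, 6, 2)
       + struct.pack("<iIdi", 2, 0, 1.25, 2))
count, is_int, payload, nxt = parse_record(buf)
print(is_int, struct.unpack(f"<{count}i", payload))          # True (5, 6)
count2, is_int2, payload2, _ = parse_record(buf, nxt)
print(is_int2, struct.unpack(f"<{count2 // 2}d", payload2))  # False (1.25,)
```

The trailer check is a cheap way to notice when a reader has drifted out of step with the record boundaries.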

jimdempseyatthecove · ‎08-31-2023

Are you using the ANSYS provided software to access the data, or did you write your own code?

1) If you are using the ANSYS provided software, then this would indicate an issue with the Fortran I/O.

2) If you are using your own code, then this would indicate an issue with your code.

A potential area to be aware of is making a false assumption that all records in the database are the same size (2+100+1) as well as all records are of the same type (integer(4) as indicated by sign bit in word 2 of the header).

The structure of the file is binary, variable record size, with record 1 fixed (a standard header record that stores 100 integers), and record 2 (optionally) information to perform direct access to start of information of interest. (I think these are in units of 4 byte integers.)

Jim Dempsey

Umut_Tabak · ‎08-31-2023

Dear Jim,

To reply to your question: I am using my own code, and you are right in theory. However, there is something fishy here: if the code could not read the matrix at all, I would agree with you. But that is not the case. As mentioned above, I can read up to a point and then it explodes. If the format were not right, how could I read up to that point?

I am more on this point. It is a case for ANSYS support from now on.

Not all records are of the same size. For instance, after the 100-integer header block there is another 160-integer block (which mainly keeps the pointers to read data blocks), so the sizes vary.

Thanks for all your comments and effort to help me anyway.

Kind regards,

Umut

jimdempseyatthecove · ‎09-01-2023

Before you give up on your code, look at @andrew_4619 's code to see how he opens the file and reads the records.

You can adapt this code to suit your needs.

*** Take into consideration that his code sets aside a fixed size buffer. Your adapted code will need to do one of the following:

a) Use an allocatable buffer, perform an INQUIRE on the file to get the file size, allocate an appropriately sized buffer, read in the whole file, then peel records out of the buffer.

b) Specify a large buffer, known to be of size greater than the largest record, and read in 2+100+1+2 values. This is the header record's header info, data, trailer PLUS the header of the following record. All subsequent reads will use the previously read two values of the next record to get the count (of integer/raw) data, to read in count+1+2 values. IOW each record read (of integers) reads in the next record's data, plus its trailer, plus the following record's count and type.

c) Specify a large buffer, known to be of size greater than the largest record, and read the buffer full. Note that the end of file may be inside the buffer, or the end of the buffer may contain a partial record. Your read record routine must account for a partial record (either copy the partial record out of the buffer, or copy the partial record, including header, to the beginning of the buffer, then read the remaining (part of the) file into the buffer *** offset by the partial record).
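Option (a) might look roughly like this in Python, where reading the whole file takes the place of the INQUIRE and the allocatable buffer. It is a sketch under assumptions: each record is taken to be [count, flags, count 4-byte words of data, trailer] in little-endian order, and the name `peel_records` and the sample bytes are invented for the example.

```python
import struct

def peel_records(buf):
    """Yield (word_position, count, payload_bytes) for each record in buf."""
    pos = 0
    while pos + 8 <= len(buf):
        (count,) = struct.unpack_from("<i", buf, pos)
        end = pos + 8 + 4 * count
        if count < 1 or end + 4 > len(buf):
            break                       # padding, bad count, or truncated record
        yield pos // 4, count, buf[pos + 8:end]
        pos = end + 4                   # skip the trailer word

# Synthetic two-record buffer for illustration only. For a real file the
# buffer would come from reading the whole file in binary mode.
buf = (struct.pack("<2i2ii", 2, 0, 11, 12, 2)
       + struct.pack("<2i1ii", 1, 0, 13, 1))
for word_pos, count, payload in peel_records(buf):
    print(word_pos, count, struct.unpack(f"<{count}i", payload))
```

Holding the whole file in memory avoids the partial-record bookkeeping of option (c), at the cost of memory for large result files.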

Jim Dempsey

andrew_4619 · ‎09-01-2023

Out of interest I wrote a Mickey Mouse program to interrogate the 2 binary files you posted. I was working in units of 4 byte words. The output is word position in file, record length at start of record, record length at end of record, next record length.

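!  ansysfiletest.f90
!  Walk the record structure of an ANSYS binary file in units of 4-byte words.
program ansysfiletest
    implicit none
    integer :: iun1, istat, ipos, ipos2, ilen
    integer :: iv(100000)       ! fixed-size buffer of 4-byte words
    logical :: lext
    character(128) :: giomes, gfile
    gfile = 'C:\Users\cadfi\OneDrive\Desktop\ansysfiles\shell_2el.rst'
    inquire( file=gfile, exist = lext )
    if ( .not.lext ) goto 999
    open(newunit=iun1, file=gfile, action='read', form='unformatted', access = 'stream', status='old', iostat = istat )
    if ( istat /= 0 ) goto 999
    iv = 0    ! clear the buffer so words not filled by the read are zero
    read(iun1, iostat = istat, iomsg=giomes) iv   ! an end-of-file status is expected for small files
    inquire( iun1, pos = ipos2 )  ! get number of bytes read; file is probably padded to 64k chunks
    close(iun1)
    ipos = 1
    do
        ilen = iv(ipos)     ! record length at start of record
        if(ilen < 1 ) exit
        if( ipos+ilen+3 > size(iv) ) exit   ! next record header would lie past the buffer
        ! word position, length at start, length at end, next record length
        print *, ipos,ilen, iv(ipos+ilen+2),iv(ipos+ilen+3)
        ipos = ipos + ilen + 3
        if( ipos*4 >= ipos2 ) exit
    enddo
    999 continue
end program ansysfiletest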

The screen grab below shows the side by side comparison of the record structure for the two files. The two things I will note are that the records are clearly variable length and that the record structure of the two files is subtly different. If you read the ANSYS documents (I have an old ANSYS install, now unlicensed and not working), there is a companion DLL with each version and an API (and API manual) for making programs to interrogate output files. The two header records have index information for interpreting the file, which the API will understand. Even if you did manage to reverse engineer the file, that is hard work and can break at every new version; indeed, different code paths in ANSYS in the same version might decide to write one way or another based on 'efficiency' or other criteria. Using the API is the way to go, or using a third party tool that uses the API.

Binary file read MATLAB, files written by a program(ANSYS) compiled with Intel Fortran Compiler
