Solved: Re: Index function return 0 for big values.

roy437 · ‎06-26-2022

Hi,

I have a program that turns a very large stl file (4.11GB) into a pts file "stl_to_pts.f90". An example stl file:

solid ASCII STL file generated with VxScan by Creaform.
facet normal 0.284589 0.628278 -0.724069
outer loop
vertex -1166.990479 -277.309601 656.863098
vertex -1166.887573 -276.861084 657.292725
vertex -1166.407104 -277.144348 657.235779
endloop
endfacet
facet normal 0.287695 0.632549 -0.719106
outer loop
vertex -1166.887573 -276.861084 657.292725
vertex -1166.523071 -276.255005 657.971680
vertex -1166.407104 -277.144348 657.235779
endloop
endfacet
facet normal 0.408622 0.642011 -0.648730
outer loop
vertex -1166.527466 -276.258850 657.991821
vertex -1166.468018 -276.235260 658.052612
vertex -1166.257812 -276.307129 658.113892
endloop
endfacet
facet normal 0.082224 0.975477 0.204167
outer loop
vertex -1166.523071 -276.255005 657.971680
vertex -1166.527466 -276.258850 657.991821
vertex -1166.257812 -276.307129 658.113892
endloop
endfacet
facet normal 0.466552 0.599195 -0.650611
outer loop
vertex -1166.523071 -276.255005 657.971680
vertex -1166.257812 -276.307129 658.113892
vertex -1166.407104 -277.144348 657.235779
endloop
endfacet
facet normal 0.302361 0.775017 -0.554911
outer loop
vertex -1166.468018 -276.235260 658.052612
vertex -1166.377808 -276.043121 658.370117
vertex -1166.155640 -276.129272 658.370850
endloop
endfacet
facet normal 0.409470 0.665844 -0.623687
outer loop
vertex -1166.468018 -276.235260 658.052612
vertex -1166.155640 -276.129272 658.370850
vertex -1166.257812 -276.307129 658.113892
endloop
endfacet
facet normal 0.316780 0.659006 -0.682174
outer loop
vertex -1166.407104 -277.144348 657.235779
vertex -1166.257812 -276.307129 658.113892
vertex -1165.399780 -277.246002 657.605347
endloop
endfacet
endsolid

Source code:

    module read_fis
       character *260 fisi
       character(:), allocatable :: str
    end module read_fis

    subroutine read_file_to_string
    use ifport
    use read_fis
    implicit none
    integer *4 :: iunit, istat
    integer *8 :: file_size
    character(len=1) :: c
    open(newunit = iunit, file = fisi, status = 'OLD', &
            form = 'UNFORMATTED', access = 'STREAM', iostat = istat)
    if (istat==0) then
        inquire(file=fisi, size=file_size )
        if (file_size > 0) then
            allocate(character(len=file_size ) :: str)
            read(iunit, pos=1, iostat = istat) str
            if (istat==0) then
                !make sure it was all read by trying to read more:
                read(iunit, pos = file_size+1, iostat = istat) c
                if (.not. IS_IOSTAT_END(istat)) then
                    write(*,*) 'Error: file was not completely read: ', trim(fisi)
                end if
            else
                write(*,*) 'Error reading file: ', trim(fisi)
            end if
            close(iunit, iostat=istat)
        else
            write(*,*) 'Error getting file size: ',  trim(fisi)
        end if
    else
        write(*,*) 'Error opening file: ', trim(fisi)
    end if
    return
    end

!   Main program
!===================================================================================

    use read_fis

    implicit none
    character(:), allocatable :: a
    character *260 fise
    character  *10 ext
    character   *1 cr
    character   *2 crlf

    integer *8 i, j, m, i1, i2, k
    integer *4 np, kp
    integer *4 :: iunit, istat

    cr = char(13)
    crlf = cr//char(10)

    call GETARG(1, fisi)
    call GETARG(2, fise)

    kp = index(fise, '.', back = .true.) + 1
    ext = fise(kp:)

    call read_file_to_string ! str = fisi
    m = len(str) ! /2
    allocate(character(len=m) :: a)  !   a = fise

    open(newunit = iunit, file = fise, status = 'NEW', &
            form = 'UNFORMATTED', access = 'STREAM', iostat = istat)
    if(istat == 0) then
        continue
    else
        write(*,*) 'Error opening file: ', trim(fise)
        stop
    end if

    i = 1
    i1 = 1

    if(trim(ext) == 'pts' .or. trim(ext) == 'PTS') then
      np = 0
      do
         k = index(str(i:), 'vertex', kind = 8)
         if(k > 0) then
             np = np + 1
             i = i + k + int8(6)
             j = i + index(str(i:), cr, kind = 8) - int8(2)
             i2 = i1 + j - i + int8(4)
             a(i1:i2) = str(i:j)//' 9'//crlf
             i1 = i2 + int8(1)
         else
             write(99,'(i0,t11,i0)') k, i
             write(iunit, pos = 1, iostat = istat) a(:i2-2)
             exit
         end if
      end do
    else
      do
         k = index(str(i:), 'vertex')
         if(k > 0) then
             i = i + k + 6
             j = i + index(str(i:), cr)
             i2 = i1 + j - i
             a(i1:i2) = str(i:j)
             i1 = i2 + 1
         else
             write(iunit, pos = 1, iostat = istat) a(:i2-2)
             exit
         end if
      end do
    end if

    deallocate(str)
    close(iunit)
    deallocate(a)

    stop
    end

I checked the "str" variable and it stores the entire stl file.

The program stops when k = 0, that is, when index no longer finds the word "vertex". The last index found by the program is i = 121,513,698, although the last index in the stl file is 4,416,480,763. The question is: why does the index function return 0 after i = 121,513,698?

In "Project properties" it is set to "Default Integer kind": 8 (/ integer-size: 64), and in "Configuration Manager": x64, Release

I mention that I have:
I7-3770 processor, 16GB RAM, WINDOWS 10 Home / 64bit,
Intel Parallel Studio XE 2020 Update 4 Windows x64.

Thanks.

roy437 · ‎07-24-2022

Hooray,

I solved the problem: I removed kind=8 and replaced index(str(i:) with index(str(i:i200), where i200=200. Execution time 11.70 sec. Thanks to everyone for the advice.
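For anyone who wants to try the same idea, here is a minimal sketch of a bounded search, assuming the upper bound is taken relative to i (as corrected later in the thread). The names find_in_window and window are hypothetical and not part of the original program. The sketch only avoids handing a multi-gigabyte slice to INDEX; it does not explain why the unbounded call returned 0.

```fortran
module window_search
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! Absolute position in str of the first occurrence of pat that lies
  ! entirely inside str(i : i+window-1), or 0 if there is none.
  ! (Hypothetical helper, not from the original program.)
  function find_in_window(str, pat, i, window) result(pos)
    character(len=*), intent(in) :: str, pat
    integer(int64),   intent(in) :: i        ! 1-based start of the search
    integer(int64),   intent(in) :: window   ! slice length, e.g. 200
    integer(int64) :: pos, last, k

    pos = 0
    if (i < 1 .or. i > len(str, kind=int64)) return
    ! Clamp the upper bound so the slice never runs past the end of str.
    last = min(i + window - 1, len(str, kind=int64))
    k = index(str(i:last), pat)   ! short slice, so the default kind suffices
    if (k > 0) pos = i + k - 1
  end function find_in_window
end module window_search
```

In the loop from the first post, k = index(str(i:), 'vertex', kind = 8) would become something like p = find_in_window(str, 'vertex', i, 200_int64), where p is an absolute position rather than an offset from i. A result of 0 then only means "not inside this window", so the window has to be longer than the longest gap between two consecutive vertex lines.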

I7-3770, 16GB RAM, W10/64bit and ssd.

J. Roy

Barbara_P_Intel · ‎07-05-2022

Can you please try the current release of ifort, 2021.6.0? It's part of the oneAPI HPC Toolkit 2022.2. Download the toolkit here.

Or use this tip that Ron wrote for just installing the Fortran compiler.

John_Campbell · ‎07-14-2022

I am not sure if this problem has already been resolved, but a few suggestions to check.

It might help if your use of INDEX was consistent with use of ", kind=8" when referring to "str".

It looks like there could be a problem with character string str larger than x gbytes, although this is not conclusive without a sample file to test.

Check line 87, although this may simply be just corruption of "8)" as smiley face !!

You could duplicate the use of INDEX with your own function find_in_string, to identify where/if INDEX is failing, eg

    integer*8 function find_in_string ( string, substring )
      character *(*) string, substring
      integer*8 :: len_str, len_sub, k, j

      len_str = len(string)
      len_sub = len(substring)
      do k = 0,len_str-len_sub
        do j = 1,len_sub
          if ( string(k+j:k+j) /= substring(j:j) ) exit
        end do
        if ( j <= len_sub ) cycle
        find_in_string = k+1      ! found starting at k+1
        return
      end do
      find_in_string = 0          ! not found

    end function find_in_string

It could also be useful to test the value of len_str = len(string), by providing this value as an input to the function for checking.

I hope these ideas might help.

John_Campbell · ‎07-15-2022

I am not sure if this problem has already been resolved, but a few suggestions to check.

It might help if your use of INDEX was consistent with use of ", kind=8" when referring to "str".

It looks like there could be a problem with character string "str" larger than x gbytes, although this is not conclusive without a sample file to test.

Check line 87, although this may simply be just corruption of "8)" as smiley face !!

You could duplicate the use of INDEX with your own function find_in_string, to identify where/if INDEX is failing
It could also be useful to test the value of len_str = len(string), by providing this value as an input to the function for checking, as shown in this untested example code.

    integer*8 function find_in_string ( string, substring, len_string )
      character *(*) :: string, substring
      integer*8 :: len_string
      integer*8 :: len_str, len_sub, k, j

      len_str = len(string)
      if ( len_str /= len_string ) then
        write (*,*) 'inconsistent string lengths for string',len_string,len_str
        find_in_string = -1
        return
      end if

      len_sub = len(substring)
      do k = 0,len_str-len_sub
        do j = 1,len_sub
          if ( string(k+j:k+j) /= substring(j:j) ) exit
        end do
        if ( j <= len_sub ) cycle
        find_in_string = k+1      ! found starting at k+1
        return
      end do
      find_in_string = 0          ! not found

    end function find_in_string

You could further modify this code so that the source of the error can be identified.
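One possible way to do that, as an untested-on-large-files sketch: walk through the string match by match and compare INDEX against a hand-written scan, reporting the first start position where the two disagree. The names scan_from and first_disagreement are hypothetical, and the scan uses the same convention as INDEX on a slice (position relative to the start of the slice, 0 if absent).

```fortran
module index_check
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! Reference search: position of sub in string(start:), counted from
  ! start (same convention as INDEX on a slice), or 0 if absent.
  function scan_from(string, sub, start) result(pos)
    character(len=*), intent(in) :: string, sub
    integer(int64),   intent(in) :: start
    integer(int64) :: pos, k, n, m

    n = len(string, kind=int64)
    m = len(sub, kind=int64)
    pos = 0
    do k = start, n - m + 1
      if (string(k:k+m-1) == sub) then
        pos = k - start + 1
        return
      end if
    end do
  end function scan_from

  ! First start position at which INDEX and the reference scan disagree,
  ! or 0 if they agree for every match in the string.
  function first_disagreement(string, sub) result(bad)
    character(len=*), intent(in) :: string, sub
    integer(int64) :: bad, i, k_ref, k_idx

    bad = 0
    i = 1
    do while (i <= len(string, kind=int64))
      k_ref = scan_from(string, sub, i)
      k_idx = index(string(i:), sub, kind=int64)
      if (k_ref /= k_idx) then
        bad = i
        return
      end if
      if (k_ref == 0) exit      ! no further matches, both searches agree
      i = i + k_ref             ! restart just past the start of this match
    end do
  end function first_disagreement
end module index_check
```

Calling first_disagreement(str, 'vertex') on the full string would return the value of i at which INDEX first departs from the manual scan, which could then be compared with the i = 121,513,698 reported in the first post.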

roy437 · ‎07-24-2022

The correct line 87 is: j = i + index(str(i:), cr, kind=8) - int8(2)

For test there is file : https://we.tl/t-PGVsUr5zzZ

Thx,

J. Roy

JohnNichols · ‎07-24-2022

He found the error, but it would have been a lot easier to read a line, do your stuff and throw it away. Why you want to load a 4.4 billion character file into memory is beyond me?

But we all do things differently.

GVautier · ‎07-25-2022

Hi

You changed two things (removed kind=8 and added i200). Does just one of the changes correct the problem or not? It would ease the job of developers to have the answer to that question.

To JohnNichols, I think it's also a matter of generation. We began programming in a time when resources (disk, memory, CPU...) were limited and it has influenced our approach to solving problems.

roy437 · ‎07-25-2022

Yes, just one change corrects the problem.

But correct i200=200 to i200=i+int8(200).

Perhaps others of the new generation would have a different approach. If there is one, I am happy to take advice.

JohnNichols · ‎07-25-2022

You appear to know what you are doing. But this is a good site to just ghost.

JohnNichols · ‎07-25-2022

Here I would disagree, my apologies for disagreeing, but the large str file has a good chance of having errors in it. You print out that much data from another program, you are now relying on the programmer having written entirely correct code. I have a problem with Rhino in moving structures into Finite Element Programs. Rhino has a nasty habit as the file gets very large of creating two plane shapes on top of each other. These are very hard to find and really hard to fix. The structural analysis package, Strand 7, has a heart attack, it cannot cope with this configuration. Narbonne Cathedral model in the South of France is my Waterloo. [Read Jenny Coglan's article in today's Guardian to see a similar example, the article is very funny and she is a good writer.].

https://www.theguardian.com/commentisfree/2022/jul/25/why-i-quit-gaelic-language-forefathers-vocabulary

It was easier to write an IFORT program to generate the mesh than to use Rhino, I tried very hard to get it to work, but it is quicker and easier in IFORT.

So the problem is to read one file and translate it to another. I still think that you do it one line at a time. Sooner or later you will run into a file you cannot load into memory. Memory has a finite size when compared to an SSD.
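A minimal sketch of that line-at-a-time approach, under some assumptions: the function name stl_to_pts is hypothetical, lines are assumed to be shorter than 512 characters, and the trailing " 9" simply mirrors the PTS branch of the program in the first post. Output records end with the platform's own line terminator rather than an explicit CRLF.

```fortran
module stl_stream
  implicit none
contains
  ! Copy the coordinates of every "vertex" line from unit uin to unit
  ! uout, appending " 9" to each. Returns the number of lines written.
  ! (Hypothetical helper; both units must already be open, formatted.)
  function stl_to_pts(uin, uout) result(np)
    integer, intent(in) :: uin, uout
    integer :: np, ios, p
    character(len=512) :: line     ! assumed upper bound on line length

    np = 0
    do
      read(uin, '(a)', iostat=ios) line
      if (ios /= 0) exit           ! end of file or read error
      p = index(line, 'vertex')
      if (p == 0) cycle            ! not a vertex line, throw it away
      ! Drop the keyword and the blanks after it.
      line = adjustl(line(p+6:))
      ! Blank out a carriage return left over from a CRLF file.
      p = index(line, char(13))
      if (p > 0) line(p:) = ' '
      write(uout, '(a)') trim(line)//' 9'
      np = np + 1
    end do
  end function stl_to_pts
end module stl_stream
```

Only one line is held in memory at a time, so the size of the input file is limited by the disk rather than by RAM. Whether this is faster than the in-memory version would have to be measured.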

If you look at the STR file, it has a lot of repeated information that is of absolutely no use to a machine, it is to make the file human readable, if that file is that big, even with UltraEdit you will not be able to look at it in a reasonable time. If VEDIT crashes on the file, it is big.

Fix the STR file so it is easily machine-readable, ie no blasted end of lines and use numeric codes instead of words for the descriptors or go the full something and be like a DXF file, one code, one number or string.

The file was so big, VS 2019 would not show all of STR, the last character it showed as ', which is not in the file.

The beauty of Fortran, we are all allowed our opinion. I understand your viewpoint, but the scars on my back from handling huge files are too deep to try it.

Final comment, I was randomly looking on the web, late at night, and I found some World Bank fiscal models. One of the comments in the article was on the number of languages the WB used for their models, R, EXCEL, Python, etc, at least 20 in the list. The end of the list reminded everyone of the halcyon Fortran days, but Fortran was not on the list.

You have that much money to spend, the models are important and you do them in EXCEL. You cannot trust an EXCEL model, they are too easy to "adjust" and the "adjustment" is too hard to find and easy to apologise for. One reason the good MBA schools use Quantrix.

You have that much money, then force everyone to work on a common set of models in a language where the code is readable, C or Fortran.

Someone just sent me a file to analyse, it took 30 minutes to download on high-speed internet, it is a CSV file and EXCEL said to go to some version of programming hell when I tried to open it, just to see if I could.

It is 104 degrees in Texas.

John_Campbell · ‎07-26-2022

Hi John,

It is difficult to work at 104 degrees !

I do share some of your views of using packages like "Rhino" when developing large FEA models, especially when the model grows.

My main model generator is also Fortran.

An advantage of this approach is you can define rules in the Fortran code for generating the data ( typically nodes and elements ).

You can run the code and then check the data, either visually as a model or by more Fortran code to process the model definition.

If you find an error, you can correct the errors and then re-run the Fortran code without these identified errors.

With FEA modeling using IDEs, the model is continually being modified and growing. It can be hard to restart the model generation process to eliminate errors in the generation process.

As models grow, it can be much easier to write code to systematically check for errors, such as duplicated nodes, planes or volumes, especially if the alternative is using visual methods.
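As a small illustration of such a check, here is a sketch that counts coincident nodes by brute force. The name count_duplicate_nodes, the xyz(:, n) storage layout and the per-coordinate tolerance are all assumptions for the example, not anything taken from a real model generator.

```fortran
module mesh_checks
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  ! Count pairs of nodes that lie within tol of each other in every
  ! coordinate. xyz(:, n) holds the coordinates of node n.
  ! (Hypothetical helper; brute force, O(n**2) comparisons.)
  function count_duplicate_nodes(xyz, tol) result(ndup)
    real(real64), intent(in) :: xyz(:, :)
    real(real64), intent(in) :: tol
    integer :: ndup, a, b

    ndup = 0
    do a = 1, size(xyz, 2) - 1
      do b = a + 1, size(xyz, 2)
        if (all(abs(xyz(:, a) - xyz(:, b)) <= tol)) ndup = ndup + 1
      end do
    end do
  end function count_duplicate_nodes
end module mesh_checks
```

The pairwise loop is fine for small models; for large ones the nodes would probably need to be sorted or binned first so that only near neighbours are compared.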

I find when interfacing different components or setting out components, the precision of using Fortran to define the geometry can be helpful. My modelling includes train lines on girders, so the precision of rail set-outs, connected to slabs, via fasteners then through to supporting girders, say on a transition curve, this systematic geometric precision can be more easily defined and revised via Fortran code.

Finding an error in a 4.4 billion character file would be "extremely" difficult !!

I am sure others may disagree, but the Fortran approach provides me with a reproducible audit path that leads to greater confidence in the final model.