I have been processing pairs of enormous text files, reading values as reals, manipulating those values, then outputting results to a third enormous text file. The data looks like this, with header info on the first 6 lines, then serious data to follow:
ncols 12095
nrows 17716
xllcorner 114.97464999692
yllcorner -35.064354662821
cellsize 0.000226
NODATA_value -9999
-9999 -9999 -9999 -9999 -9999 -9999 27.532 -26.49 -9999 -9999 10.6 -9999 .... etc for 12095 columns and 17716 rows.
I open the file with the following statement:
open(1,file=grid1,status='old',form='formatted',recordtype='stream_lf',recl=100000,iostat=ios,err=1000)
and after trivial reads of the header information read each record with:
read(1,'(A)',iostat=ios)line !Where line is character(100000)
Then I pick my way along the line looking for the space delimiters (these are not regularly placed) and read the values into an array of reals. The output method is an approximate inverse of the above.
It works, but it's very slow for files of this size. Is there a better way?
With many thanks in advance.
Mike
Tags: Intel® Fortran Compiler
Import your records into Excel, use a macro to convert text to cells using the selected delimiter, then do your sums by calling a Fortran DLL, then export your data to a file with tab or comma delimiters?
Have you considered multi-threading the scan for blanks and conversion from text to real?
Jim
real(4) mynumbers(24)
namelist /myrecord/ mynumbers
open(1,file="datafile.txt",form="formatted",status="unknown")
read(1,NML=myrecord)
will read the following file OK and convert apparent integers to real(4)
&myrecord
mynumbers=
-9999,-9999,-9999,-9999,-9999,-9999,27.532,-26.49,-9999,-9999,10.6,-9999,
-9999,-9999,-9999,-9999,-9999,-9999,27.532,-26.49,-9999,-9999,10.6,-9999
&end
NAMELIST will accept 6*-9999 to represent 6 consecutive -9999 values, so you can save even more storage space if you write your output data in this form ready for NAMELIST input, if that is one of your requirements (however, NAMELIST output will impose a fixed length for each output value, so it will not produce similar compact data).
What generates a 12095-column output?
Why not simply write the output with a special character at the end of each 'record' to mark the end of the record?
Then the input would not require the long read and breakdown.
If modifying the input stream is not possible, then I'd suggest looking at modifying HOW you 'pick along the line looking for space delimiters'.
brian
Thanks for the replies. I should have been a little clearer about the source data - it comes from a third party package (ArcGIS) so I am unable to change it.
I would like to know a little more about Jim Dempsey's suggestion on multi-threading - how could I apply that to this problem?
Many thanks
Mike
Since it's a text file, you could simply use the list-directed feature:
open(1,file=grid1,status='old',form='formatted',recordtype='stream_lf',recl=100000,iostat=ios,err=1000)
...
!read the first six lines of the file here
...
allocate (some_data(ncols, nrows))
read (1, *, iostat = ios) some_data
if (ios /= 0) ...
Try it, if only to see how fast it is compared to what you're using now.
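As a minimal sketch of the list-directed approach above, the module below reads the six header lines as keyword/value pairs and then fills the whole grid with a single list-directed read. The names ascii_grid_reader and read_ascii_grid are made up for illustration. The Intel-specific recordtype specifier and the recl value from the original open statement are left out here for portability, so they may need to be restored for lines of this length with the Intel compiler. The array is dimensioned (ncols, nrows) so that Fortran's column-major fill order matches the row-by-row order of the file.

```fortran
module ascii_grid_reader
  implicit none
  private
  public :: read_ascii_grid

contains

  ! Reads a six-line header (one keyword and one value per line),
  ! then nrows*ncols blank-separated values into grid(ncols, nrows).
  ! ios is zero on success and nonzero if any step failed.
  subroutine read_ascii_grid(filename, ncols, nrows, xll, yll, cellsize, &
                             nodata, grid, ios)
    character(len=*), intent(in)   :: filename
    integer, intent(out)           :: ncols, nrows, ios
    double precision, intent(out)  :: xll, yll, cellsize
    real, intent(out)              :: nodata
    real, allocatable, intent(out) :: grid(:,:)
    character(len=32) :: key   ! header keyword, read and discarded
    integer :: u

    open(newunit=u, file=filename, status='old', action='read', &
         form='formatted', iostat=ios)
    if (ios /= 0) return

    ! Header: list-directed input splits each line at the blank.
    read(u, *, iostat=ios) key, ncols
    if (ios == 0) read(u, *, iostat=ios) key, nrows
    if (ios == 0) read(u, *, iostat=ios) key, xll
    if (ios == 0) read(u, *, iostat=ios) key, yll
    if (ios == 0) read(u, *, iostat=ios) key, cellsize
    if (ios == 0) read(u, *, iostat=ios) key, nodata

    ! Data: one read fills the array, crossing line ends as needed.
    if (ios == 0) allocate(grid(ncols, nrows), stat=ios)
    if (ios == 0) read(u, *, iostat=ios) grid

    close(u)
  end subroutine read_ascii_grid

end module ascii_grid_reader
```

After a successful call, grid(j, i) holds column j of row i of the file. Whether this is faster than the hand-written scan depends on the compiler's runtime library, so it is worth timing both on a real file.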
Simple method (first attempt):
! requires: use omp_lib
type myType
  integer :: iCellCount
  real, allocatable :: cellData(:)
end type myType
type(myType), allocatable :: threadData(:)
iRow = 0
iMaxThreads = omp_get_max_threads()
allocate(threadData(iMaxThreads)) ! add error test
iMaxThreadCells = (ncols / iMaxThreads) * 2 ! meant to exceed the worst case per thread
do i = 1, iMaxThreads
  allocate(threadData(i)%cellData(iMaxThreadCells)) ! add error test
end do
! file read loop
do
  read(1,'(A)',iostat=ios) line ! where line is character(100000)
  if (ios /= 0) exit
  iRow = iRow + 1
  if (iRow > nrows) call PrintErrorAndAbort()
  iLastChar = len_trim(line)
  !$omp parallel private(iThread, iFill, n)
  iThread = omp_get_thread_num() + 1 ! use 1-based thread number
  threadData(iThread)%iCellCount = 0
  ! static schedule: each thread scans one contiguous piece of the line, in thread order
  !$omp do schedule(static)
  do i = 1, iLastChar
    ! a number starts at a non-blank that is first on the line or follows a blank
    if (line(i:i) == ' ') cycle
    if (i > 1) then
      if (line(i-1:i-1) /= ' ') cycle
    end if
    n = threadData(iThread)%iCellCount + 1
    if (n > iMaxThreadCells) call PrintErrorAndAbort()
    read(line(i:iLastChar), *) threadData(iThread)%cellData(n)
    threadData(iThread)%iCellCount = n
  end do
  !$omp end do
  ! the implied barrier above ensures every thread's count is final
  iFill = 0
  do i = 1, iThread - 1
    iFill = iFill + threadData(i)%iCellCount
  end do
  do i = 1, threadData(iThread)%iCellCount
    bigArray(iFill + i, iRow) = threadData(iThread)%cellData(i)
  end do
  !$omp end parallel
end do
Jim
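A variation on the same idea, offered only as a sketch: locate the numbers in a quick serial pass, then convert them in a parallel loop, so each value lands directly in its final slot and no per-thread buffers are needed. The names line_parser and parse_line are hypothetical. The sketch assumes the delimiters are blanks only (no tabs) and that the runtime's internal reads are safe to call from several threads, which should hold when the code is built with the compiler's OpenMP option. Without that option the directive is treated as a comment and the loop runs serially.

```fortran
module line_parser
  implicit none
  private
  public :: parse_line

contains

  ! Converts the blank-separated numbers in line to reals in values.
  ! nvals returns how many were found. ios is 0 on success, -1 if the
  ! line holds more numbers than values can take, and positive if a
  ! token could not be converted.
  subroutine parse_line(line, values, nvals, ios)
    character(len=*), intent(in) :: line
    real, intent(out)            :: values(:)
    integer, intent(out)         :: nvals, ios
    integer, allocatable :: first(:), last(:)
    integer :: i, n, ierr
    logical :: in_token

    n = len_trim(line)
    allocate(first(size(values)), last(size(values)))
    nvals = 0
    ios = 0
    in_token = .false.

    ! Pass 1 (serial): record where each token starts and ends.
    do i = 1, n
      if (line(i:i) /= ' ') then
        if (.not. in_token) then
          if (nvals == size(values)) then
            ios = -1
            return
          end if
          nvals = nvals + 1
          first(nvals) = i
          in_token = .true.
        end if
        last(nvals) = i
      else
        in_token = .false.
      end if
    end do

    ! Pass 2 (parallel when OpenMP is enabled): tokens are independent,
    ! so each iteration converts one token into its own element.
    !$omp parallel do private(ierr) reduction(max:ios)
    do i = 1, nvals
      read(line(first(i):last(i)), *, iostat=ierr) values(i)
      ios = max(ios, abs(ierr))
    end do
    !$omp end parallel do
  end subroutine parse_line

end module line_parser
```

Starting a parallel region for every line has a cost of its own, so this is only likely to pay off when lines are long, as they are here with 12095 values each. Timing it against the serial build is the only way to know.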
I have come across a problem immediately though, I've got ...
use omp_lib
.
.
i = omp_get_max_threads()
This compiles but won't link - "unresolved external symbol". What do I need to do?
Many thanks
Mike
In the VS Solution pane:
Right-click on your project
| Properties
| Fortran
| Language
| Process OpenMP directives
Or, from the command line, add /Qopenmp.
Then, on the link line, add the appropriate OpenMP library.
VS adjusts the link line automatically.
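One way to check that the setting took effect, sketched here with made-up names (omp_check, available_threads), is to guard the OpenMP calls with the "!$" conditional compilation sentinel. Lines that start with "!$" are compiled only when OpenMP processing is switched on and are plain comments otherwise, so the same source builds and links either way.

```fortran
module omp_check
  !$ use omp_lib
  implicit none
contains

  ! Returns the number of threads OpenMP would use, or 1 when the
  ! code was compiled without OpenMP processing enabled.
  integer function available_threads()
    available_threads = 1
    !$ available_threads = omp_get_max_threads()
  end function available_threads

end module omp_check
```

If available_threads() returns 1 on a multi-core machine, the likely explanations are that the OpenMP option is not set for that project or configuration, or that the thread count has been limited through the environment.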
The sketch code may contain typographical errors; the intent is to give you an overview of a simple parallel process.
After you get this working, you can decide whether you want to spend additional time on improving this section of your code. Effort spent on parallelizing other code in your application might be a better choice. The code sketch I provided does not overlap the reading of the line with the conversion from text to REAL. An improvement can be attained by overlapping I/O with conversion.
Jim
