We access the file from many different programs, so the format needs to stay the same. The file isn't written sequentially (although most of the data we write is sequential in spots). When we read it, we compute where in the file to read from, since we usually only want very small pieces of it. The size of the file varies because the number of records varies with each file. In the case I have, the file grows to about 900 MB and holds several thousand cases, each made up of a lot of records. It takes about 2 minutes to write each case out, so it takes about 12 hours to do all the writing in this case (the rest of the computations take about 2 days). Reducing this would be very helpful. I know writing it out sequentially takes very little time.
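For reference, a minimal sketch of that access pattern, assuming a Fortran direct-access unformatted file; the record layout, unit number, and position formula below are made-up placeholders, not the actual file format:

program direct_write_sketch
    implicit none
    real    :: record_data(32)          ! one fixed-size record (placeholder layout)
    integer :: reclen, rec_no, case_id, ios

    ! Ask the compiler for the RECL value matching this record, so the code works
    ! whether RECL is counted in bytes or in 4-byte words.
    inquire(iolength=reclen) record_data

    open(unit=10, file='results.dat', access='direct', form='unformatted', &
         recl=reclen, status='unknown', action='readwrite', iostat=ios)
    if (ios /= 0) stop 'open failed'

    ! Records can be written in any order; the position is computed, not appended.
    case_id = 1234
    rec_no  = 1 + (case_id - 1)*10      ! hypothetical record-number formula
    record_data = 0.0
    write(10, rec=rec_no, iostat=ios) record_data

    close(10)
end program direct_write_sketch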
Since we don't write it all sequentially, wouldn't that be difficult?
Have you run the program through a performance analyzer such as Intel VTune Amplifier XE to see where the time is being spent? If you're waiting for the OS to complete the write or the file positioning, there's not much you can do on the Fortran side.
On the plus side we are in the process of switching to a database so maybe that will be an improvement.
Consider writing your updates into a separate file (or files), then run a sequential merge into your 900MB file.
Merging the 900MB main file with one or more small files could run on the order of 100MB/sec.
Your mileage may vary. Can your main file be "off line" for under a minute?
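A simplified sketch of that idea, assuming the updates can be logged as (record number, data) pairs and replayed in a second pass; the unit numbers, file names, and record size are made up, and a full streaming merge into a fresh copy of the file would have the same overall shape:

program merge_updates_sketch
    implicit none
    real    :: record_data(32)          ! placeholder record layout
    integer :: reclen, rec_no, ios

    inquire(iolength=reclen) record_data

    ! Phase 1 (during the run) would simply append each update to a sequential log:
    !     write(20) rec_no, record_data
    ! Phase 2 (after the run): replay the log into the 900MB file in one pass.
    open(unit=20, file='updates.log', access='sequential', form='unformatted', &
         status='old', action='read', iostat=ios)
    if (ios /= 0) stop 'cannot open update log'
    open(unit=10, file='results.dat', access='direct', form='unformatted', &
         recl=reclen, status='old', action='readwrite', iostat=ios)
    if (ios /= 0) stop 'cannot open main file'

    do
        read(20, iostat=ios) rec_no, record_data
        if (ios /= 0) exit                    ! end of log
        write(10, rec=rec_no) record_data     ! drop the update into its computed slot
    end do

    close(20)
    close(10)
end program merge_updates_sketch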
Jim Dempsey
[SergeyK] Windows doesn't impose any I/O limitations on applications. It tries to allocate as much memory as possible for data caches, and data from these caches is then written to the HDD.

>>On the plus side we are in the process of switching to a database so maybe that will be an improvement.

[SergeyK] What database are you going to use?
A low-level API developed in C/C++ to store data from a Fortran application could be another option.
But since a database will be used anyway, it doesn't make sense to spend time on such an API.
I'm planning some performance tests / evaluations with a C/C++ API next week and could provide some numbers for cases with 1GB and 2GB data files ( in binary and txt formats ).
Best regards,
Sergey
>>it takes about 12 hours to do all the writing in this case
I checked the performance of a low-level ( C/C++ ) API for writing data into a file in text and binary formats.
In the case of the text format the content looked like:
ROW1=
ROW2=
...
ROWN=
In the case of the binary format the content looked like:
ROW1=
ROW2=
...
ROWN=
Here are the results ( numbers are relative ):

                          Text format   Binary format
Data file size 250MB      1x            ~5.0x faster
Data file size 500MB      1x            ~4.5x faster
Data file size   1GB      1x            ~4.0x faster
Don't forget the physical layer. As far as I know, data are written to the disk by cluster. So if you want to write 8x16 bytes, the cluster your data belongs in must first be read into memory from the disk, your data is inserted into it, and the cluster is written back to the disk. For a common cluster size of 4 KB, you read and write 4 KB (about 64 times the payload in disk traffic) each time you believe you are writing only 128 bytes.
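If the small pieces can be grouped in memory before they hit the disk, one way to avoid paying that read-modify-write penalty on every piece is to fill a cluster-sized buffer and flush it in one write. A minimal sketch, assuming a 4 KB cluster, 128-byte pieces, and purely sequential output (the file name, sizes, and loop are made up for illustration and ignore the computed-position access pattern discussed above):

program cluster_buffer_sketch
    implicit none
    integer, parameter :: cluster_bytes = 4096               ! assumed cluster size
    integer, parameter :: piece_bytes   = 128                ! one small 8x16-byte update
    integer, parameter :: pieces_per_cluster = cluster_bytes / piece_bytes
    real    :: piece(piece_bytes/4)                          ! 32 reals = 128 bytes
    real    :: buffer(cluster_bytes/4)                       ! one cluster's worth of data
    integer :: i, filled, ios

    open(unit=30, file='buffered.dat', access='stream', form='unformatted', &
         status='replace', action='write', iostat=ios)
    if (ios /= 0) stop 'open failed'

    filled = 0
    do i = 1, 1000                                           ! pretend we generate 1000 pieces
        piece = real(i)
        buffer(filled*size(piece)+1 : (filled+1)*size(piece)) = piece
        filled = filled + 1
        if (filled == pieces_per_cluster) then
            write(30) buffer                                 ! one 4 KB write instead of 32 small ones
            filled = 0
        end if
    end do
    if (filled > 0) write(30) buffer(1 : filled*size(piece)) ! flush the remainder

    close(30)
end program cluster_buffer_sketch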
Thanks for the note. I think nobody would argue with that. Here are some additional details. All tests wrote the contents of a matrix of single-precision values ( 'float' data type ):

Size of output txt-file: ~250MB - Matrix size:  5250 x  5250
Size of output txt-file: ~500MB - Matrix size:  8000 x  8000
Size of output txt-file: ~  1GB - Matrix size: 10750 x 10750

Note: To calculate the total length of a row, multiply the matrix dimension 'm' by 9. For example, 5250 x 9 = 47250 bytes per row, so the whole 5250 x 5250 matrix comes to about 5250 x 47250 bytes, roughly 248MB, consistent with the ~250MB file size above.
