Which is the better performer: random access to small files or arrays?

dadcat · 02-24-2012

I have an app that needs to randomly read the contents of a set of fairly small files (small enough to all fit in the Windows system file cache)...

Which would be the better performer:

1) read the files randomly, and rely on the Windows system file cache to keep the files in-memory.

-or-

2) read the file contents into arrays (or other Fortran structures) and then randomly access the data.

It appears that Option 1 (read the files randomly) is a poor performer even when my app is the only process running and there is plenty of memory (4 cores with 25% of 8GB available).

According to PerfMon the app never gets beyond 20% CPU utilization (90% of which is kernel time), and there's almost no physical I/O either, which tells me the files are indeed in the cache. But then why isn't the app "pegging" the processors?

I would expect that accessing a file that's resident in the Windows system file cache would be almost instantaneous, but instead my app spends a great deal of time "waiting" on something... but what???

Is the file cache manager, or something that it depends upon (e.g., cross-process communication), really this sluggish?

DadCat

Paul_Curtis · 02-26-2012

My vote would be to move the data into arrays and do your work there; that should be much faster.
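As a rough illustration of the array approach, here is a minimal sketch in standard Fortran (2003 stream access). It is not taken from the original app: the file name `demo_small_file.bin`, the 32-bit integer record layout, and the variable names are all assumptions made so the example is self-contained. It writes a small demo file, loads it into an array with one read, and then does all the random access in memory.

```fortran
program load_into_array
    use, intrinsic :: iso_fortran_env, only: int32
    implicit none
    ! Hypothetical file name and layout, chosen only for this sketch.
    character(len=*), parameter :: fname = 'demo_small_file.bin'
    integer, parameter :: nsample = 1000
    integer(int32) :: sample(nsample)
    integer(int32), allocatable :: cache(:)
    integer :: unit, i, idx, nbytes, nrec
    real :: r

    ! Create a small demo file so the sketch runs on its own.
    sample = [(int(i*i, int32), i = 1, nsample)]
    open(newunit=unit, file=fname, access='stream', form='unformatted', &
         status='replace', action='write')
    write(unit) sample
    close(unit)

    ! One-time load: size the array from the file length, then read
    ! the whole file in a single transfer.
    inquire(file=fname, size=nbytes)
    nrec = nbytes / (storage_size(0_int32) / 8)
    allocate(cache(nrec))
    open(newunit=unit, file=fname, access='stream', form='unformatted', &
         status='old', action='read')
    read(unit) cache
    close(unit)

    ! Random access now touches memory only; no I/O call per lookup.
    do i = 1, 100000
        call random_number(r)                 ! r is in [0, 1)
        idx = min(nrec, 1 + int(r * nrec))    ! index in 1..nrec
        if (cache(idx) /= idx*idx) error stop 'unexpected value in cache'
    end do

    print '(a,i0,a)', 'all random lookups matched (', nrec, ' records cached)'
end program load_into_array
```

The point of the design is that the per-access cost becomes an array index rather than a call into the I/O runtime and the kernel, which is where the high kernel time reported above would otherwise be spent.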

However, the file-based approach can also work well, especially if you abandon the native Fortran I/O and use the Windows API routines, where you can obtain a file's handle and jump around by setting the active location pointer (offset) before reading. This is really fast and works very well:

[bash]
rval = SetFilePointer (ihandl, offset, NULL, FILE_BEGIN)
IF (ReadFile (ihandl,           &   ! file handle
              loc_pointer,      &   ! address of data
              nbytes,           &   ! byte count to read
              LOC(nact),        &   ! actual bytes read
              NULL_OVERLAPPED) == 0) THEN
    ! deal with access error
END IF
[/bash]
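For readers without the Windows API interfaces at hand, the same seek-then-read pattern can be sketched with standard Fortran stream access, where the `POS=` specifier plays the role of the file pointer. This is a portable stand-in, not the Windows API approach itself: it still goes through the Fortran runtime, so it may not match the speed of the direct calls. The file name `demo_seek_file.bin`, the record layout, and the variable names are assumptions made for illustration.

```fortran
program seek_and_read
    use, intrinsic :: iso_fortran_env, only: int32
    implicit none
    ! Hypothetical file name and layout, chosen only for this sketch.
    character(len=*), parameter :: fname = 'demo_seek_file.bin'
    integer, parameter :: nvals = 256
    integer, parameter :: rec_bytes = storage_size(0_int32) / 8
    integer(int32) :: values(nvals), one_value
    integer :: unit, i, k, offset, ios

    ! Create a small demo file so the sketch runs on its own.
    values = [(int(10*i, int32), i = 1, nvals)]
    open(newunit=unit, file=fname, access='stream', form='unformatted', &
         status='replace', action='write')
    write(unit) values
    close(unit)

    ! Open once, then jump to arbitrary offsets before each read.
    open(newunit=unit, file=fname, access='stream', form='unformatted', &
         status='old', action='read')
    do i = 1, 5
        k = 1 + mod(37*i, nvals)          ! arbitrary record number in 1..nvals
        offset = (k - 1) * rec_bytes      ! zero-based byte offset, as with SetFilePointer
        ! POS= is one-based, hence offset + 1.
        read(unit, pos=offset + 1, iostat=ios) one_value
        if (ios /= 0) error stop 'read failed'
        if (one_value /= 10*k) error stop 'unexpected value'
        print '(a,i0,a,i0)', 'record ', k, ' -> ', one_value
    end do
    close(unit)
end program seek_and_read
```

Opening the file once and repositioning for each read avoids repeated open/close overhead, which is the same idea as holding on to the handle in the API version.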