Parallelising the reading of multiple files

Chris_G_2 · ‎12-20-2018

I am looking for advice please about using OpenMP (or any other parallelisation method) in Intel Fortran to read a large set of binary files.

I am working on a project where I typically need to read data from several hundred files and store it in a 2-dimensional array. The files are binary and typically contain 5000 (x, y) data points. Reading this data consumes 80% of the wall-clock time when running the associated software.

My question is this: is there any value in parallelising the loop that reads these files?

TimP · ‎12-20-2018

There should be some value in parallelising file reads, to the extent that it keeps multiple CPUs busy. This probably requires a parallel file system to be useful. Normally a parallel file system is a cluster computing facility, used only to the extent of each process accessing a separate file, not within an OpenMP process.

Steve_Lionel · ‎12-20-2018

I am going to guess that CPU cycles are not the big factor here and that parallelizing may not help. Consider instead using the Fortran support for asynchronous I/O. You can start several unformatted reads in parallel and wait for them to finish. It's also possible you are throttled by disk I/O rates, in which case any form of parallelization is likely to make things worse.

Look also at setting a larger BLOCKSIZE when you open the file - this may help a lot.
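A minimal sketch of both suggestions, not code from this thread. It assumes stream-access unformatted files, real(8) data, and made-up file names and sizes; the BLOCKSIZE value is an arbitrary illustration, and BLOCKSIZE= itself is an Intel Fortran extension that other compilers may reject. The program writes its own stand-in files so it can be run as is.

```fortran
program async_read_sketch
  implicit none
  integer, parameter :: nfiles = 4, npts = 5000   ! assumed sizes
  real(8), asynchronous :: xy(2, npts, nfiles)    ! target of the pending reads
  real(8), allocatable :: tmp(:,:)
  integer :: units(nfiles), ids(nfiles), i, u
  character(len=32) :: fname(nfiles)              ! hypothetical file names

  ! Write stand-in data files so the sketch is self-contained.
  allocate(tmp(2, npts))
  do i = 1, nfiles
    write(fname(i), '(a,i0,a)') 'async_demo_', i, '.bin'
    tmp = real(i, 8)
    open(newunit=u, file=fname(i), form='unformatted', access='stream', &
         status='replace')
    write(u) tmp
    close(u)
  end do

  ! Start one asynchronous read per file; each READ may return
  ! before its data has arrived.
  do i = 1, nfiles
    open(newunit=units(i), file=fname(i), form='unformatted', &
         access='stream', status='old', action='read', &
         asynchronous='yes', blocksize=262144)   ! BLOCKSIZE: Intel extension
    read(units(i), asynchronous='yes', id=ids(i)) xy(:, :, i)
  end do

  ! Wait for each pending read before touching its data.
  do i = 1, nfiles
    wait(units(i), id=ids(i))
    close(units(i), status='delete')
  end do

  ! Check that every file delivered the values written above.
  do i = 1, nfiles
    if (any(xy(:, :, i) /= real(i, 8))) error stop 'read mismatch'
  end do
  print *, 'all files read'
end program async_read_sketch
```

Whether the runtime actually overlaps the reads depends on the compiler and operating system, so it is worth timing this against a plain synchronous loop on the target machines.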

Chris_G_2 · ‎12-20-2018

Thanks, this all looks very useful.

jimdempseyatthecove · ‎12-21-2018

If the input files are located on a single rotating hard disk, each file can be read with one seek, one rotational latency, and one read of 20KB or 40KB (about 1 rotation), depending on real(4) or real(8). This assumes size(array) == 5000 and not an array dimensioned as (5000,5000). For a 7200 RPM HD the total is about (~7ms for seek + ~5ms rotational latency + ~9ms read) * #files. This assumes the files are contiguous, nothing else on the system is thrashing the disk, and your I/O buffer is large enough to perform the entire read.

If you were to read these files piecemeal (small buffer, and concurrently), say with a 1KB buffer, the estimate becomes:

(20KB / 1KB) * (~7ms for seek + ~5ms rotational latency + ~9ms read) * #files (or 40KB / 1KB for real(8))
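To put numbers on the two estimates, here is a small sketch that simply evaluates the formulas above. The timing figures are the ones quoted in this post; the file count of 300 is an assumption standing in for "several hundred".

```fortran
program hdd_estimate
  implicit none
  ! Figures quoted in the post, in milliseconds.
  real, parameter :: seek_ms = 7.0, latency_ms = 5.0, read_ms = 9.0
  integer, parameter :: nfiles = 300            ! assumption: "several hundred"
  integer, parameter :: file_kb = 20, buf_kb = 1
  real :: per_access_ms, whole_s, piecemeal_s

  ! Cost of one seek + rotational latency + read.
  per_access_ms = seek_ms + latency_ms + read_ms

  ! One access per file versus one access per 1KB chunk.
  whole_s     = per_access_ms * nfiles / 1000.0
  piecemeal_s = real(file_kb / buf_kb) * per_access_ms * nfiles / 1000.0

  print '(a,f8.1,a)', 'one read per file : ', whole_s, ' s'
  print '(a,f8.1,a)', '1KB reads         : ', piecemeal_s, ' s'
end program hdd_estimate
```

With these inputs the whole-file case comes to about 6.3 s and the piecemeal case to about 126 s, a factor of 20, which is why the buffer size matters far more than the thread count on a rotating disk.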

If your files are located on a solid state disk, then the seek and rotational latencies are eliminated and the read time may be reduced depending on the transfer rate of the SSD.

Try the BLOCKSIZE setting as Steve suggests first.

If not satisfactory, then with large block size, try parallel with 2, then 3, then 4, ... threads until you find the best performing number of threads. Note, for rotating HD, depending on where the files are located on the HD, there may be a preferential reading order.
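A rough sketch of that experiment, assuming stream-access unformatted files, real(8) data, and hypothetical file names and sizes. It needs OpenMP enabled at compile time (for example /Qopenmp or -qopenmp with Intel Fortran) and creates its own stand-in files.

```fortran
program omp_read_sketch
  use omp_lib
  implicit none
  integer, parameter :: nfiles = 16, npts = 5000   ! assumed sizes
  real(8), allocatable :: xy(:,:,:), tmp(:,:)
  character(len=32) :: fname(nfiles)               ! hypothetical file names
  integer :: i, u, nt
  double precision :: t0, t1

  allocate(xy(2, npts, nfiles), tmp(2, npts))

  ! Write stand-in data files so the sketch is self-contained.
  do i = 1, nfiles
    write(fname(i), '(a,i0,a)') 'omp_demo_', i, '.bin'
    tmp = real(i, 8)
    open(newunit=u, file=fname(i), form='unformatted', access='stream', &
         status='replace')
    write(u) tmp
    close(u)
  end do

  ! Time the same read loop with 1, 2, 3 and 4 threads.
  do nt = 1, 4
    xy = 0
    t0 = omp_get_wtime()
    !$omp parallel do num_threads(nt) private(u) schedule(dynamic)
    do i = 1, nfiles
      ! Each iteration uses its own unit and its own slice of xy.
      open(newunit=u, file=fname(i), form='unformatted', access='stream', &
           status='old', action='read')
      read(u) xy(:, :, i)
      close(u)
    end do
    !$omp end parallel do
    t1 = omp_get_wtime()

    do i = 1, nfiles
      if (any(xy(:, :, i) /= real(i, 8))) error stop 'read mismatch'
    end do
    print '(a,i0,a,f8.4,a)', 'threads=', nt, '  time=', t1 - t0, ' s'
  end do

  ! Remove the stand-in files.
  do i = 1, nfiles
    open(newunit=u, file=fname(i), status='old')
    close(u, status='delete')
  end do
end program omp_read_sketch
```

Timings from a loop like this are easily distorted by the operating system's file cache, since the files were just written and are re-read on every pass. A realistic measurement would use the real data set after a cold start.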

Jim Dempsey

JohnNichols · ‎12-26-2018

1. Go to Best Buy

2. Ask for an Evo SSD

3. Spend the money

4. Install on computer with Samsung HD copy

5. Will make it faster

6. Unless you already have an SSD, in which case: are you echoing any of the data to the screen? You have 7 million points by the sound of it, but they should read quickly. Reading 10 million integers from an SSD takes 7 seconds, but writing them takes 44 seconds (the numbers' average value is 5 million).

Chris_G_2 · ‎12-27-2018

The software I am developing here will end up as a commercial product run on computers ranging from state-of-the-art workstations to creaky XP machines in a cement factory, so I have to find a general solution to the issues.

Thank you all for your comments and have a good 2019.

Steve_Lionel · ‎12-27-2018

Be aware that having your application deployed on XP requires additional steps on your part. See https://software.intel.com/en-us/articles/linking-applications-using-visual-studio-2012-to-run-on-windows-xp

Chris_G_2 · ‎01-06-2019

Thanks Steve. At the moment I am ignoring XP and hoping that no one will need an XP version!

John_Campbell · ‎01-08-2019

You say that the files are binary and contain 5000 (x, y) data points.
If we assume that each data point uses about 16 bytes (2 x 8 bytes), 5000 data points could be 5,000 x 16 bytes = 80 kilobytes of info.
If there are 100 files, this would imply about 8.0 Megabytes of info.
This would take about 0.2 seconds to read plus the time to open 100 files, assuming they are stored on a local drive.

This is a very small amount of information, so it is surprising that a parallel solution would be required.

A better solution could be to merge the 100 files into 1 file so that the file opening process could be reduced, or move the files to a local faster drive if the remote drive is the problem.

Another possible issue could be how often the files are opened and read. Are you repeating the reading process?
If you initially read all the files and store this information in memory (or a single file), then process the point information from the in-memory data set, this could improve the performance.
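The merge idea can be sketched as follows. This is only an illustration under assumptions: stream-access unformatted files, real(8) data, every file holding the same number of points, and made-up file names. The program creates its own input files so it can be run as is.

```fortran
program merge_sketch
  implicit none
  integer, parameter :: nfiles = 8, npts = 5000   ! assumed sizes
  real(8), allocatable :: xy(:,:,:), one(:,:)
  character(len=32) :: fname                      ! hypothetical file names
  integer :: i, uin, uout

  allocate(xy(2, npts, nfiles), one(2, npts))

  ! Write stand-in input files so the sketch is self-contained.
  do i = 1, nfiles
    write(fname, '(a,i0,a)') 'part_', i, '.bin'
    one = real(i, 8)
    open(newunit=uin, file=fname, form='unformatted', access='stream', &
         status='replace')
    write(uin) one
    close(uin)
  end do

  ! One-off merge: append each file's data to a single stream file.
  open(newunit=uout, file='merged.bin', form='unformatted', &
       access='stream', status='replace')
  do i = 1, nfiles
    write(fname, '(a,i0,a)') 'part_', i, '.bin'
    open(newunit=uin, file=fname, form='unformatted', access='stream', &
         status='old', action='read')
    read(uin) one
    write(uout) one
    close(uin, status='delete')
  end do
  close(uout)

  ! Later runs: a single OPEN and a single READ fill the whole array.
  open(newunit=uout, file='merged.bin', form='unformatted', &
       access='stream', status='old')
  read(uout) xy
  close(uout, status='delete')

  do i = 1, nfiles
    if (any(xy(:, :, i) /= real(i, 8))) error stop 'merge mismatch'
  end do
  print *, 'merged file read in one statement'
end program merge_sketch
```

If the real files differ in length, the merged file would also need a small index of point counts, which this sketch leaves out.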

There are other possible sources of delay (such as "read (lu,*) ((array(j,i),i=1,2),j=1,num_points)" rather than "read (lu,*) ((array(i,j),i=1,2),j=1,num_points)"), but essentially the delay in reading the files does not appear to be the main reason for the poor performance.
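The index-order point can be illustrated with an unformatted analogue of the two statements quoted above (the quoted ones are list-directed). This is a sketch under assumptions: a stream-access file of real(8) pairs, a hypothetical file name, and array names chosen for the example. Fortran stores arrays column-major, so an array dimensioned (2, num_points) keeps each (x, y) pair contiguous and can be filled by one whole-array read, while a (num_points, 2) array forces the runtime to scatter each pair across two columns.

```fortran
program read_order_sketch
  implicit none
  integer, parameter :: num_points = 5000
  real(8) :: xy(2, num_points)     ! each (x, y) pair contiguous in memory
  real(8) :: cols(num_points, 2)   ! x column followed by y column
  integer :: lu, i, j

  ! Write a stand-in file of pairs: x = j, y = -j.
  do j = 1, num_points
    xy(1, j) = j
    xy(2, j) = -j
  end do
  open(newunit=lu, file='order_demo.bin', form='unformatted', &
       access='stream', status='replace')
  write(lu) xy
  close(lu)

  open(newunit=lu, file='order_demo.bin', form='unformatted', &
       access='stream', status='old', action='read')

  ! Memory order matches file order: one contiguous whole-array transfer.
  xy = 0
  read(lu) xy

  ! Same data, strided target: the implied DO scatters every pair.
  read(lu, pos=1) ((cols(j, i), i = 1, 2), j = 1, num_points)
  close(lu)

  ! Remove the stand-in file.
  open(newunit=lu, file='order_demo.bin', status='old')
  close(lu, status='delete')

  ! Both layouts must hold the same values.
  if (any(cols(:, 1) /= xy(1, :)) .or. any(cols(:, 2) /= xy(2, :))) &
    error stop 'layout mismatch'
  print *, 'both layouts hold the same data'
end program read_order_sketch
```

For 5000 points the difference is unlikely to be large on its own, which is consistent with the point made in this post that such effects are secondary.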

The slow performance problem as described does not appear to be due to file reading so you need to look at other possibilities.

Chris_G_2 · ‎10-01-2019

This is for Dr Fortran

Steve

In a posting on this topic you refer to https://software.intel.com/en-us/articles/linking-applications-using-vis...

I now need to consider an XP operating system (sigh!), and I can't find this reference. I would appreciate your help.

ChrisG