Intel® Fortran Compiler

Read from a single data file with multiple processors

mad-matts
Beginner

Hi

I would like to know if it is possible to read from a single data file from multiple processors simultaneously when using a parallel code.

Are the different ranks able to access the same file and read from this file independently and simultaneously ?

Thanks

TimP
Honored Contributor III
Quoting mad-matts

I would like to know if it is possible to read from a single data file from multiple processors simultaneously when using a parallel code.

Are the different ranks able to access the same file and read from this file independently and simultaneously ?

Probably, if all ranks open it read-only, e.g. with open(action='read', ...)
Steven_L_Intel1
Employee
Are you expecting all the threads to coordinate access to the file so that records are read in order, or should each thread be able to read the whole file without regard to the other threads? If the latter, then Tim's suggestion is appropriate. If the former, you'll need to add a synchronization primitive such as a mutex or critical section around the reads.
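
For the coordinated case, a minimal sketch of a critical section around the reads might look like the following. It assumes OpenMP threads sharing one unit; the file name records.txt, its contents, and the summation standing in for real work are all made up for illustration.

```fortran
program shared_unit_read
  ! Sketch: several OpenMP threads each pull the next unread record
  ! from ONE shared unit. File name and contents are illustrative only.
  implicit none
  integer :: u, ios, i, val, total
  logical :: eof, got

  ! Create a small formatted file so the example is self-contained.
  open(newunit=u, file='records.txt', status='replace', action='write')
  do i = 1, 100
     write(u, '(i0)') i
  end do
  close(u)

  open(newunit=u, file='records.txt', status='old', action='read')
  total = 0
  eof = .false.          ! shared flag, only touched inside the critical section

  !$omp parallel private(val, ios, got) reduction(+:total)
  do
     ! Only one thread at a time may use the unit, so each record
     ! is delivered to exactly one thread.
     !$omp critical (file_read)
     got = .false.
     if (.not. eof) then
        read(u, *, iostat=ios) val
        if (ios == 0) then
           got = .true.
        else
           eof = .true.
        end if
     end if
     !$omp end critical (file_read)
     if (.not. got) exit
     total = total + val  ! stand-in for real per-record work
  end do
  !$omp end parallel

  close(u)
  print *, 'sum of records =', total
end program shared_unit_read
```

Compiled without OpenMP the directives are ignored and the program runs serially; either way the records 1 to 100 are each consumed once, so the sum is 5050. Which thread gets which record is not deterministic.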
jimdempseyatthecove
Honored Contributor III

As Tim and Steve suggest, much depends on the behavior you require. As I see it, there are several different behaviors:

a) Multiple processors means multiple processes (programs) each with access to same file using one thread within process.

b) One process running multiple threads, each thread sequentially reading records 1, 2, 3, ...

c) One process running multiple threads, each thread reading the next unread record.

d) One process running multiple threads, each thread reading random records

and other variations on reads (e.g. ordered queue reading)

For a) the file has to be opened with an appropriate SHARE specifier.

For b) each thread must use a different UNIT, and the file has to be opened with an appropriate SHARE specifier.

For c) all threads must use the same UNIT, and the file has to be opened only once.

For d) use either the same unit or different units, depending on whether you want one shared file position pointer or a separate file position pointer per thread.

When you have a large file that you want to read sequentially and process in parallel, you generally get better performance by pipelining the process: use one thread to read the file into internal buffers, which are then processed in parallel (one thread per buffer).

Jim Dempsey

mad-matts
Beginner

I have one MPI parallel program running simultaneously on different processors (each of which has its own memory) and I want each of them to be able to read randomly from one single file independently and simultaneously without sharing pointers.

I think that's option a) of Jim's suggestions repeated below:

a) Multiple processors means multiple processes (programs) each with access to same file using one thread within process.

b) One process running multiple threads, each thread sequentially reading records 1, 2, 3, ...

c) One process running multiple threads, each thread reading the next unread record.

d) One process running multiple threads, each thread reading random records

I am dependent on open(.... access='stream' ...) since I cannot read the file in either sequential or direct access mode: the machine that wrote the file and the machine that should read it handle unformatted files differently. Further, I want to jump to certain positions in the file and not necessarily read whole records, but only parts of records.

Is it still possible to read from the same file simultaneously and independently from random positions ?

Thanks and have a nice weekend !

mad-matts
Beginner

And what happens if I have a machine architecture where 8 processors in one node share their memory ?

So if I use 16 processors, for example, the machine uses at least 2 nodes and I have both processors which share the same memory and processors which don't.

How can I address my problem (see above) on such an architecture ?

TimP
Honored Contributor III

The format of the file would make no difference; you should be able to open with both read (only) and stream options.
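
As a rough illustration of combining the two options, the sketch below opens a file for read-only stream access and jumps to a byte position with POS=. The file name data.bin, its layout (a 4-byte count followed by ten 8-byte reals), and the offsets are assumptions made up for this example; it also assumes the file storage unit is one byte, which is the usual case but is compiler dependent.

```fortran
program stream_random_read
  ! Sketch: read-only stream access with explicit byte positions.
  ! 'data.bin' and its layout are hypothetical, chosen for illustration.
  use, intrinsic :: iso_fortran_env, only: int32, real64
  implicit none
  integer, parameter :: hdr = 4, rsize = 8   ! sizes in bytes (assumed layout)
  integer :: u, i
  integer(int32) :: n
  real(real64) :: y

  ! Write a small test file: a 4-byte count followed by 10 doubles.
  open(newunit=u, file='data.bin', access='stream', form='unformatted', &
       status='replace', action='write')
  write(u) int(10, int32), [(real(i, real64), i = 1, 10)]
  close(u)

  ! Each MPI rank (or any other independent process) could run this part
  ! on its own; every process gets its own file position.
  open(newunit=u, file='data.bin', access='stream', form='unformatted', &
       status='old', action='read')
  read(u, pos=1) n                         ! POS is 1-based
  read(u, pos=hdr + (7 - 1)*rsize + 1) y   ! jump straight to the 7th value
  close(u)

  print *, n, y
end program stream_random_read
```

Stream files carry no record markers, so only the byte layout of the data itself has to match between the writing and the reading machine; differences such as byte order would still need separate handling.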

If you are using hybrid MPI/OpenMP, if more than one thread wants to read from the file, it might be done by opening with separate units, but not fully portably. It would likely be best to read the file to shared buffers in a serial region within each MPI process. Separate MPI processes will have no trouble keeping their buffers separate in read-only mode, but then there is no easy way to coordinate among processes or avoid extra latency due to each process reading the disk physically and forcing others to wait.

jimdempseyatthecove
Honored Contributor III

>>And what happens if I have a machine architecture where 8 processors in one node share their memory ?

If your application is MPI then you would be running 8 separate processes on the 8 different processors. There would be no functional difference.

If you re-worked your application to use OpenMP, with one process running 8 threads, then you could use 8 separate UNITs, one for each thread. However, you might eke out some additional performance by having each thread submit its read requests to an extra I/O thread. This I/O thread could then be optimized for your application, e.g. by ordering the reads in an elevator-seek pattern, or by anticipating a "peek, then gulp" pattern of data access.

Before you go through too much programming effort, it may be advisable to instrument the reads so you can record the patterns of activity on the file. You may discover that re-organizing the file, splitting it in two, or indexing it gives a significant performance improvement.
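
One simple way to record the read pattern is to route reads through a small wrapper that logs each request. The sketch below is only an assumption about how that could look: the module read_trace, the routine traced_read, and the file names are hypothetical, and it assumes stream access with one byte per file storage unit.

```fortran
module read_trace
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer :: trace_unit   ! unit of the log file, opened by the caller
contains
  subroutine traced_read(u, pos, buf)
    ! Read len(buf) bytes starting at byte POS and log the request.
    integer, intent(in) :: u
    integer(int64), intent(in) :: pos
    character(len=*), intent(out) :: buf
    read(u, pos=pos) buf
    write(trace_unit, '(i0,1x,i0)') pos, len(buf)   ! offset and byte count
  end subroutine traced_read
end module read_trace

program trace_demo
  use, intrinsic :: iso_fortran_env, only: int64
  use read_trace
  implicit none
  integer :: u
  character(len=4) :: chunk

  ! Small test file so the example is self-contained.
  open(newunit=u, file='trace_demo.bin', access='stream', form='unformatted', &
       status='replace', action='write')
  write(u) 'abcdefghijklmnopqrstuvwxyz'
  close(u)

  open(newunit=trace_unit, file='read_trace.log', status='replace', &
       action='write')
  open(newunit=u, file='trace_demo.bin', access='stream', form='unformatted', &
       status='old', action='read')
  call traced_read(u, 21_int64, chunk)   ! bytes 21..24
  call traced_read(u, 1_int64, chunk)    ! bytes 1..4
  close(u)
  close(trace_unit)
end program trace_demo
```

In an MPI run each rank would need its own log file (for example with the rank number in the name). Sorting or plotting the logged offsets afterwards shows whether the accesses are clustered, sequential, or scattered.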

Jim Dempsey

mad-matts
Beginner
OK. Thank you all for your advice !!!