Hi
I would like to know whether multiple processors can read from a single data file simultaneously in a parallel code.
Can the different ranks access the same file and read from it independently and simultaneously?
Thanks
As Tim and Steve suggest in their replies, much depends on the behavior you require. As I see it, there are several different behaviors:
a) "Multiple processors" means multiple processes (programs), each accessing the same file from one thread within the process.
b) One process running multiple threads, each thread sequentially reading records 1, 2, 3, ...
c) One process running multiple threads, each thread reading the next unread record.
d) One process running multiple threads, each thread reading random records.
There are other variations on reads as well (e.g. ordered queue reading).
For a), the file has to be opened with the appropriate SHARE setting.
For b), each thread must use a different UNIT, and the file has to be opened with the appropriate SHARE setting.
For c), every thread must use the same UNIT, and the file has to be opened only once.
For d), use either the same unit or different units, depending on whether you want one shared file position pointer or separate file position pointers.
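The difference between cases b) and c) comes down to whether readers share a file position. Here is a minimal sketch of that distinction, written in Python rather than Fortran purely for illustration: a Python file handle plays the role of a UNIT, the scratch file and the 4-byte record size are made up, and the reads run one after another instead of in real threads. With real threads sharing one handle, the reads would also need a lock around them.

```python
import os
import tempfile

# Build a small scratch file of four 4-byte records (hypothetical data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"AAAABBBBCCCCDDDD")
    path = tmp.name

RECORD = 4  # assumed fixed record length in bytes

# Case b): each reader opens the file itself, so each has its own position.
first = open(path, "rb")
second = open(path, "rb")
separate = [first.read(RECORD), second.read(RECORD)]  # both see record 1
first.close()
second.close()

# Case c): one handle shared by all readers, so each read gets the next record.
shared = open(path, "rb")
next_unread = [shared.read(RECORD), shared.read(RECORD)]
shared.close()

os.remove(path)
print(separate)     # [b'AAAA', b'AAAA']
print(next_unread)  # [b'AAAA', b'BBBB']
```

Case d) can go either way: separate handles if every reader seeks on its own, or one handle if the readers are meant to advance a common position.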
When you want to read a large file sequentially and process it in parallel, you generally get better performance by pipelining the work: one thread reads the file into internal buffers, which are then processed in parallel (one thread per buffer).
Jim Dempsey
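The pipelined layout described above might be sketched as follows. This is Python, not Fortran, and only shows the structure: `BUFFER_SIZE`, `process`, and `run_pipeline` are hypothetical names, the per-buffer work is a placeholder, and an in-memory stream stands in for the large file.

```python
import io
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

BUFFER_SIZE = 8  # hypothetical buffer size in bytes


def reader(stream, buffers):
    """Single I/O thread: read the file sequentially and hand off buffers."""
    while True:
        chunk = stream.read(BUFFER_SIZE)
        if not chunk:
            break
        buffers.put(chunk)
    buffers.put(None)  # sentinel: no more data


def process(chunk):
    """Placeholder for the real per-buffer work."""
    return sum(chunk)


def run_pipeline(stream, workers=4):
    buffers = queue.Queue(maxsize=2 * workers)  # bounded, so memory stays capped
    io_thread = threading.Thread(target=reader, args=(stream, buffers))
    io_thread.start()
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            chunk = buffers.get()
            if chunk is None:
                break
            futures.append(pool.submit(process, chunk))
    io_thread.join()
    return [f.result() for f in futures]  # results in file order


data = bytes(range(32))
results = run_pipeline(io.BytesIO(data))
print(results)  # [28, 92, 156, 220]
```

Only the reader thread touches the file, so the disk sees one sequential stream while the workers overlap their computation with the next read.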
I have one MPI parallel program running simultaneously on different processors (each with its own memory), and I want each process to read from random positions in one single file, independently and simultaneously, without sharing file pointers.
I think that's option a) of Jim's suggestions, repeated below:
a) "Multiple processors" means multiple processes (programs), each accessing the same file from one thread within the process.
b) One process running multiple threads, each thread sequentially reading records 1, 2, 3, ...
c) One process running multiple threads, each thread reading the next unread record.
d) One process running multiple threads, each thread reading random records.
I depend on open(.... access='stream' ...) because I cannot read the file in either sequential or direct access mode: the machine that wrote the file and the machine that should read it handle unformatted files differently. I also want to jump to certain positions in the file and read only parts of records, not necessarily whole records.
Is it still possible to read from the same file simultaneously and independently from random positions?
Thanks and have a nice weekend!
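The access pattern asked about here (jump to a byte position, read part of a record, no shared pointer) can be illustrated with a small sketch. It is Python rather than Fortran, threads stand in for the MPI ranks, and `read_at`, the scratch file, and the request list are all hypothetical. Note that Python offsets start at 0, whereas the POS= specifier of a Fortran stream file starts at 1.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_at(path, offset, length):
    """Open a private read-only handle, jump to a byte offset, read a slice."""
    with open(path, "rb") as handle:  # private handle, private position
        handle.seek(offset)
        return handle.read(length)


# Scratch file standing in for the shared data file (hypothetical contents).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(100)))
    path = tmp.name

# Each (offset, length) pair is the request of one simulated rank.
requests = [(90, 4), (0, 3), (42, 1), (10, 5)]
with ThreadPoolExecutor(max_workers=len(requests)) as pool:
    slices = list(pool.map(lambda r: read_at(path, *r), requests))

os.remove(path)
print([list(s) for s in slices])
```

Because every reader owns its handle and the file is only read, no reader can disturb another reader's position.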
And what happens if I have a machine architecture where the 8 processors in one node share their memory?
If I use 16 processors, for example, the machine uses at least 2 nodes, so some processors share the same memory and others don't.
How can I address my problem (see above) on such an architecture?
The format of the file would make no difference; you should be able to open it with both the read (only) and stream options.
If you are using hybrid MPI/OpenMP and more than one thread wants to read from the file, this might be done by opening the file on separate units, but not fully portably. It would likely be best to read the file into shared buffers in a serial region within each MPI process. Separate MPI processes will have no trouble keeping their buffers separate in read-only mode, but there is then no easy way to coordinate among processes, or to avoid the extra latency when each process physically reads the disk and forces the others to wait.
>>And what happens if I have a machine architecture where the 8 processors in one node share their memory?
If your application is MPI, then you would be running 8 separate processes on the 8 different processors. There would be no functional difference.
If you reworked your application to use OpenMP, with one process running 8 threads, then you could use 8 separate UNITs, one for each thread. However, you might eke out some additional performance by having each thread submit its read requests to an extra I/O thread. This I/O thread could then be optimized for your application, e.g. by ordering the reads in an elevator seeking pattern, or by anticipating a "peek, then gulp" pattern of data access.
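A dedicated I/O thread with elevator-style ordering could look roughly like this. It is a Python sketch of the idea, not Fortran/OpenMP code: `io_service` and `submit_read` are hypothetical names, an in-memory stream stands in for the file, and the "elevator" is simply one ascending sweep over whatever requests are pending.

```python
import io
import queue
import threading
from concurrent.futures import Future


def io_service(stream, requests):
    """Dedicated I/O thread: drain pending requests, serve them by offset."""
    while True:
        batch = [requests.get()]          # block until at least one request
        while True:                       # then grab whatever else is pending
            try:
                batch.append(requests.get_nowait())
            except queue.Empty:
                break
        stop = None in batch
        pending = [item for item in batch if item is not None]
        # Elevator-style ordering: one ascending sweep over the offsets.
        for offset, length, future in sorted(pending, key=lambda r: r[0]):
            stream.seek(offset)
            future.set_result(stream.read(length))
        if stop:
            return


def submit_read(requests, offset, length):
    """Called by worker threads; returns a Future for the requested bytes."""
    future = Future()
    requests.put((offset, length, future))
    return future


requests = queue.Queue()
stream = io.BytesIO(bytes(range(64)))
futures = [submit_read(requests, off, 2) for off in (40, 8, 24)]
service = threading.Thread(target=io_service, args=(stream, requests))
service.start()
results = [f.result() for f in futures]  # answers come back per request
requests.put(None)  # sentinel: shut the I/O thread down
service.join()
```

The requests arrive in the order 40, 8, 24 but are served as 8, 24, 40; each caller still receives the bytes it asked for through its own future.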
Before you invest too much programming effort, it may be advisable to instrument the reads so you can record the patterns of activity on the file. You may discover that reorganizing the file, splitting it in two, or indexing it gives a significant performance improvement.
Jim Dempsey
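Instrumenting the reads, as suggested above, can be as simple as logging every offset and length and then summarizing the log. The sketch below is in Python for illustration; `LoggedReader` and `sequential_fraction` are hypothetical helpers, and the "share of sequential reads" metric is just one assumed way to characterize the access pattern.

```python
import io


class LoggedReader:
    """Wrap a seekable binary stream and record (offset, length) per read."""

    def __init__(self, stream):
        self.stream = stream
        self.log = []

    def read_at(self, offset, length):
        self.stream.seek(offset)
        data = self.stream.read(length)
        self.log.append((offset, len(data)))
        return data


def sequential_fraction(log):
    """Share of reads that start exactly where the previous read ended."""
    if len(log) < 2:
        return 0.0
    hits = sum(1 for (o1, n1), (o2, _) in zip(log, log[1:]) if o1 + n1 == o2)
    return hits / (len(log) - 1)


reader = LoggedReader(io.BytesIO(bytes(256)))
for offset in (0, 16, 32, 200, 216):
    reader.read_at(offset, 16)
fraction = sequential_fraction(reader.log)
print(fraction)  # 0.75
```

A high fraction suggests the data is already laid out in access order; a low one hints that reorganizing or indexing the file may pay off.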