OPEN: SHARED Specifier

milad_m_ · ‎06-20-2017

Hi

I have a simple parallel code (attached below; VS 2013 and Intel XE 2017). When I run it with one rank (mpiexec -n 1 c.exe) there is no problem. But when I run it with 2 ranks (on one node), an error appears for the second rank about writing to a READONLY file (a text file). The text file's attributes are not read-only, though. In cmd I change to the Release folder that contains the .exe file. What's wrong with the second rank? And is this the right way to set the path for a parallel run?

Thanks

jimdempseyatthecove · ‎06-21-2017

Add SHARED to your OPEN statement.

OPEN: SHARED Specifier

The SHARED specifier indicates that the file is connected for shared access by more than one program executing simultaneously. It takes the following form:

SHARED

On Linux* and OS X* systems, shared access is the default for the Fortran I/O system. On Windows* systems, it is the default if SHARED or compiler option fpscomp general is specified.

Note that the default differs between Linux and Windows.
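
As a minimal sketch of where the specifier goes (this is not the poster's code, which is not shown in the thread): the file name results.txt, unit 10, and the program name are placeholders chosen for illustration. SHARED is an Intel Fortran extension, not standard Fortran.

```fortran
program shared_open_sketch
  use mpi
  implicit none
  integer :: ierr, rank, ios

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! SHARED (Intel Fortran extension) lets several processes connect to the
  ! same file at once. 'results.txt' and unit 10 are placeholders.
  open(unit=10, file='results.txt', status='unknown', position='append', &
       action='write', shared, iostat=ios)

  if (ios /= 0) then
     write(*,*) 'rank ', rank, ': OPEN failed, iostat = ', ios
  else
     write(10, '(a,i0)') 'hello from rank ', rank
     close(10)
  end if

  call MPI_Finalize(ierr)
end program shared_open_sketch
```

With several ranks appending to one file, the order of the lines in the file is not deterministic.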

Jim Dempsey


milad_m_ · ‎06-24-2017

Thanks Jim, it worked. Now I can run my main code in parallel on one PC. The main code is meant to run on more than one node. I copied the above code (the one I had attached) to another PC, at the same path on the C drive. The drives are shared on both computers. Two questions arose for me; I would be grateful if you could answer them.

1- When I run the code on 2 cores it takes more time than the serial run. For example, if my code counts from 1 to 9 million, the serial run takes less time than the same calculation in parallel. My RAM and CPU usage in the parallel run are about 50% and 60%, respectively. Is this normal?

2- The parts assigned to the second node finish the calculation much later than those on the main node. As I said, I shared the C drives and put both folders at the same location on both PCs. Is this the right way to share, or should I use some software or other method to accelerate data transfer between nodes?

Regards.

jimdempseyatthecove · ‎06-26-2017

Shared files for output have different I/O requirements that will add overhead. In particular:

A non-shared file for output can buffer the writes until the buffer is full, at which point the buffer is written. Whereas:
A shared file, for every write, must lock the file, read the part of the file adjacent to the record to be written (into a buffer), update the buffer, then immediately write the file, and then finally unlock the file.

Your program activity may be better served by having each rank write to different files, then on program exit, consolidate the rank files into single file.
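
A minimal sketch of the one-file-per-rank approach, under stated assumptions: the file names out_rankNNNN.txt and out_all.txt are hypothetical, the single line each rank writes stands in for real output, and rank 0 is assumed to see every rank's file (same directory on one machine, or a shared directory across nodes).

```fortran
program per_rank_files
  use mpi
  implicit none
  integer :: ierr, rank, nranks, r, ios, uin, uout
  character(len=32)  :: fname
  character(len=256) :: line

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)

  ! Each rank writes its own, non-shared file (hypothetical naming scheme).
  write(fname, '(a,i4.4,a)') 'out_rank', rank, '.txt'
  open(newunit=uout, file=trim(fname), status='replace', action='write')
  write(uout, '(a,i0)') 'result from rank ', rank
  close(uout)

  ! Wait until every rank has closed its file before merging.
  call MPI_Barrier(MPI_COMM_WORLD, ierr)

  ! Rank 0 consolidates the per-rank files into one file, in rank order.
  if (rank == 0) then
     open(newunit=uout, file='out_all.txt', status='replace', action='write')
     do r = 0, nranks - 1
        write(fname, '(a,i4.4,a)') 'out_rank', r, '.txt'
        open(newunit=uin, file=trim(fname), status='old', action='read')
        do
           read(uin, '(a)', iostat=ios) line
           if (ios /= 0) exit          ! end of file (or read error)
           write(uout, '(a)') trim(line)
        end do
        close(uin)
     end do
     close(uout)
  end if

  call MPI_Finalize(ierr)
end program per_rank_files
```

Because no file is opened by more than one process at a time, SHARED is not needed and the writes can be buffered normally. On a network share, the barrier alone may not guarantee that rank 0 already sees the other nodes' files.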

If the data is to be shared by the ranks during runtime, then remove the SHARED and ensure that the owning rank opens its file for read/write and denies writes by others. Then, after each rank performs a WRITE to its writeable file, insert a FLUSH. On the other ranks, where you formerly READ a single file, you may have to loop through reads of the list of files, one per rank (this may require a FLUSH on the READ should the data not be found, then a re-read). This procedure avoids the locks and the read of buffer(s) containing adjacent data. You still have the more frequent writes, though you could flush every n'th write or after a time interval.

An alternate method would be to use MPI messaging to have rank 0 perform all the reads and writes. Note, the node on which you currently run rank 0, can be specified to run two processes per node, and the other nodes one process per node. Thus adding an I/O rank to that system while keeping a node on that system for computation.
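
One way this could look, as a sketch only: here the "messaging" is a single MPI_Gather, each rank contributes one double precision value, and rank 0 alone writes the file. The value computed, the file name results.txt, and the program name are assumptions made for illustration.

```fortran
program rank0_writes
  use mpi
  implicit none
  integer :: ierr, rank, nranks, r, u
  double precision :: local_value
  double precision, allocatable :: all_values(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)

  ! Stand-in for the real per-rank computation.
  local_value = dble(rank) * 1.5d0

  ! Only the root's receive buffer is used, but every rank passes a valid array.
  if (rank == 0) then
     allocate(all_values(nranks))
  else
     allocate(all_values(1))
  end if

  ! Collect one value from every rank on rank 0.
  call MPI_Gather(local_value, 1, MPI_DOUBLE_PRECISION, &
                  all_values,  1, MPI_DOUBLE_PRECISION, &
                  0, MPI_COMM_WORLD, ierr)

  ! Rank 0 is the only process that touches the file.
  if (rank == 0) then
     open(newunit=u, file='results.txt', status='replace', action='write')
     do r = 1, nranks
        write(u, '(a,i0,a,es14.6)') 'rank ', r - 1, ': ', all_values(r)
     end do
     close(u)
  end if

  deallocate(all_values)
  call MPI_Finalize(ierr)
end program rank0_writes
```

In this sketch rank 0 also computes; a dedicated I/O rank as described above would skip the computation and only receive and write.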

Jim Dempsey

jimdempseyatthecove · ‎06-26-2017

From your description in 2) it sounds like you can have each rank write directly to its own file; then, at the point in your program where you interact with the data, you can send/receive a small MPI message indicating that you have finished writing your file (and performed a flush or close).
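
A minimal sketch of that handshake, assuming at least two ranks, that rank 1 is the writer and rank 0 the reader, and that both see the same path; the file name rank1_data.txt and the tag value are arbitrary choices for illustration.

```fortran
program signal_file_ready
  use mpi
  implicit none
  integer, parameter :: tag_ready = 100      ! arbitrary message tag
  integer :: ierr, rank, nranks, token, u
  integer :: status(MPI_STATUS_SIZE)
  character(len=64) :: line

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)

  if (nranks >= 2) then
     if (rank == 1) then
        ! Writer: finish and close the file first, then announce it.
        open(newunit=u, file='rank1_data.txt', status='replace', action='write')
        write(u, '(a)') 'data produced by rank 1'
        close(u)
        token = 1
        call MPI_Send(token, 1, MPI_INTEGER, 0, tag_ready, MPI_COMM_WORLD, ierr)
     else if (rank == 0) then
        ! Reader: block until the writer says the file is complete.
        call MPI_Recv(token, 1, MPI_INTEGER, 1, tag_ready, MPI_COMM_WORLD, &
                      status, ierr)
        open(newunit=u, file='rank1_data.txt', status='old', action='read')
        read(u, '(a)') line
        close(u)
        write(*, '(a)') trim(line)
     end if
  end if

  call MPI_Finalize(ierr)
end program signal_file_ready
```

The message carries no data of interest; its arrival is the signal that the file has been closed and can be read.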

Optimizing I/O or using other means in MPI is non-trivial and will take some thought (and experimentation).

Jim Dempsey

milad_m_ · ‎06-29-2017

Thanks for your complete reply.

milad_m_ · ‎06-30-2017

Another question came up about cluster processing. In the text host file I list just the IP addresses, and with this command I set the total number of cores (all PCs) and the number of cores on each node:

mpiexec -n 4 -ppn 2 -f host file myprog.exe

The format of the host file is:

192.168.1.1
192.168.1.2
. . .

But is there any way to specify the number of cores in the host file? Like this:

node 1: 4 cores
node 2: 6 cores
. . .

Thanks.

jimdempseyatthecove · ‎06-30-2017

You could consider running 1 rank per node using MPI and then number of cores using OpenMP. This works quite well.

If you must run 1 rank per core, and number of cores vary then use something like:

mpiexec -n 2 -ppn 4 -f host file4cores myprog.exe : -n 4 -ppn 6 -f host file6cores myprog.exe

Here file4cores lists two 4-core nodes, and file6cores lists four 6-core nodes.

Note the : separating the differing command lines for mpiexec.

Of course, you must determine the number of nodes with 4 cores, and enter those node names into file4cores, and number of nodes with 6 cores, ...

There may be an easier way of doing this (Tim P. may be of help here). I'd like to think that there is some syntax like "-ppn cores" (hypothetical). Look at the command line options.

Note, there is an I_MPI_PERHOST=allcores environment variable that can be set that may do this, but I have no experience using this.

Jim Dempsey
