I just want the rank 0 process to read the input namelist and print its contents to the screen. However, it seems Intel OneAPI on Windows cannot make this work. Is this a bug?
The minimal working example is extremely simple, as below:
program test
use mympi
implicit none
integer :: irn
real :: hbar
namelist /mylist/ irn, hbar
call init0 ! mpi initialization
if (myrank() .eq. 0) then ! only rank 0 cpu core do the read and write.
read(5, nml=mylist)
write(6, nml=mylist)
write(6,*) 'irn = ', irn
write(6,*) 'hbar = ', hbar
endif
call done
end program
I place the test.exe file and the test.nml file in one folder.
If I run the program like below,
mpiexec -n 6 test.exe < test.nml
the program just hangs.
Now, if I just do,
test.exe < test.nml
then it works and the correct output is,
&MYLIST
IRN = 88888888,
HBAR = 20.73500
/
irn = 88888888
hbar = 20.73500
Does anyone know why?
Thanks much in advance!
----------------------------
PS.
The namelist file test.nml is,
&mylist
irn = 88888888 ! random number seed
hbar = 20.735 ! h bar
/
The mpi module is,
module mympi
use mpi
implicit none
integer, private, parameter :: i4=selected_int_kind(9)
integer, private, parameter :: i8=selected_int_kind(15)
integer, private, parameter :: r8=selected_real_kind(15,9)
integer, private, save :: mpii4,mpii8,mpir8
integer(kind=i4), private, save :: irank,iproc
contains
subroutine init0 ! call this before anything else
integer :: ierror,isize,ir,ip
integer(kind=i4) :: itest4
integer(kind=i8) :: itest8
real(kind=r8) :: rtest8
call mpi_init(ierror)
call mpi_comm_rank(mpi_comm_world,ir,ierror)
irank=ir
call mpi_comm_size(mpi_comm_world,ip,ierror)
iproc=ip
call mpi_sizeof(itest4,isize,ierror)
call mpi_type_match_size(mpi_typeclass_integer,isize,mpii4,ierror)
call mpi_sizeof(itest8,isize,ierror)
call mpi_type_match_size(mpi_typeclass_integer,isize,mpii8,ierror)
call mpi_sizeof(rtest8,isize,ierror)
call mpi_type_match_size(mpi_typeclass_real,isize,mpir8,ierror)
return
end subroutine init0
subroutine done ! wrapper for finalize routine
integer :: ierror
call mpi_finalize(ierror)
return
end subroutine done
function myrank() ! which process am I?
integer(kind=i4) :: myrank
myrank=irank
return
end function myrank
end module mympi
For convenience, I uploaded the VS solution file, all the f90 files, and the test.nml file in the attachment.
Another relevant link is below, describing a similar problem: with MPI, the read just does not work correctly.
https://fortran-lang.discourse.group/t/intel-fortran-cannot-read-from-a-long-line-properly/2933/3
I posted here too,
What version of MPI and Fortran are you using?
I just tried your reproducer on Linux with ifort 2021.8.0 and Intel(R) MPI Library 2021.8 for Linux* . It ran just fine with 6 ranks.
A workaround for Windows would be "mpiexec -n 1 runme.bat" with "test.exe < test.nml" in runme.bat.
Thank you Barbara. Yes, the Linux version of Intel OneAPI and the previous Parallel Studio have no problem.
This issue only happens on Windows. I am using the most current version of OneAPI as of now, 2023.0 I think. But it seems all OneAPI versions have this issue on Windows.
About the workaround,
@Barbara_P_Intel wrote:
What version of MPI and Fortran are you using?
I just tried your reproducer on Linux with ifort 2021.8.0 and Intel(R) MPI Library 2021.8 for Linux* . It ran just fine with 6 ranks.
A workaround for Windows would be "mpiexec -n 1 runme.bat" with "test.exe < test.nml" in runme.bat.
Thank you again Barbara. Yes, mpiexec -n 1 works. But this seems to be the same as just running
`test.exe < test.nml` on the command line, without even typing `mpiexec -n 1` in front of it.
What about more than one core? For example,
"mpiexec -n 4 runme.bat" with "test.exe < test.nml" in runme.bat.
In this case it seems each of the 4 cores independently runs runme.bat. This is probably not exactly what I want.
Eventually what I want is for many cores to be involved, with only rank 0 reading the test.nml file first and then broadcasting the variables in test.nml to all the other cores. So we probably still need something like
mpiexec -n 6 test.exe < test.nml
I guess.
I'm on our MPI support team and investigating this. The workaround Barbara mentioned should help you get started. The issue appears to be between MPI and Windows, with the input redirection not happening correctly. If I can find another workaround, I will let you know.
Thank you James.
Yes, it happens with MPI on Windows only.
In fact it is not limited to reading a namelist. I have another post about the same issue (again, Intel OneAPI on Linux is fine).
That is, if the input file, say file.dat, has long comments, such as
9002785287 # * irn. boss random number seed
12 # imode. 11 mean V fixed. 12 means full mixing, both V and k are flexable. CHoose 12 for now.
1000 # * itermax. max iteration number. Usually with stopping criterion code should stop before itermax.
.true. # * stop_criterion_on. Enabling stop criterion or not. .true. means yes enable.
30 # * LL_n. The number of continuous iterations for averaged slope. and then calculating the averaged parameters. currently just use this one number.
0.0 # * crit_1. The value of slope for stop criterion. The iteration begin 'smoothing' stage after averaged slope smaller than this value. Set to zero.
1 # * kmix. The gaussian mixing number.
.true. # read Yji or generate Yji. Currently set it as .true.
1000 # * mgauss_all. number of gaussian samples in E step.
5000000 # * m_all. Metropolos samples in M step.
30 # * i_init. This is for initial condition. Currently it is from 1 to 50 which is the number of subjects. Select the subject from simpar to start initial condition
simdata.csv # data.csv name
simpar.csv # parcsvname
If my code just lets rank 0 read the variable values from the file and print the result, as below,
program main
use mympi
integer, parameter :: i4=selected_int_kind(9)
integer, parameter :: i8=selected_int_kind(15)
integer, parameter :: r8=selected_real_kind(15,9)
integer(kind=i8) :: irn,imode,itermax,kmix,mgauss_all,m_all,i_init
integer :: LL_n
logical :: stop_criterion_on,readYji
real(kind=r8) :: crit_1
character(len=50) :: csvname != 'simdata.csv'
character(len=50) :: parcsvname ! = 'simpar.csv'
call init0 !mpi initialization must be done before reading
if (myrank()==0) then
read (5,*) irn
read (5,*) imode
read (5,*) itermax
read (5,*) stop_criterion_on
read (5,*) LL_n
read (5,*) crit_1
read (5,*) kmix
read (5,*) readYji
read (5,*) mgauss_all
read (5,*) m_all
read (5,*) i_init
read (5,'(a50)') csvname
read (5,'(a50)') parcsvname
csvname = adjustl(csvname)
csvname = csvname(1:index(csvname,' ')-1) ! there should be space before # in the input file.
parcsvname = adjustl(parcsvname)
parcsvname = parcsvname(1:index(parcsvname,' ')-1)
write (6,'(''Boss random number seed ='',t30,i20)') irn
write (6,'(''imode ='',t30,i20)') imode
write (6,'(''iteration max # ='',t30,i20)') itermax
write (6,'(''Stop Criterion (SC) on ='',t40,l10)') stop_criterion_on
write (6,'(''SC # of averaged iterations ='',t30,i20)') LL_n
write (6,'(''SC stopping slope <'',t40,f10.5)') crit_1
write (6,'(''Mixing number ='',t30,i20)') kmix
write (6,'(''Read Yji ='',t40,l10)') readYji
write (6,'(''# Gauss samples for E step ='',t30,i20)') mgauss_all
write (6,'(''# Metropolis samples for M step ='',t30,i20)') m_all
write (6,'(''Initially from subject # '',t30,i20)') i_init
write (6,'(''data.csv file name ='',t30,a30)') csvname
write (6,'(''simpar.csv file name ='',t30,a30)') parcsvname
endif
call done ! finalize MPI before exiting
end program main
With MPI, again, if I do
mpiexec -n 6 test.exe < file.dat
it hangs.
Barbara's workaround is great for one core, but I am not sure whether it works for more than one core, as I mentioned in my reply above.
Another workaround is to open the file explicitly in the code, for example
open(unit=10, file='file.dat', action='read')
and read from that unit instead of unit 5. Then just do
mpiexec -n 6 test.exe
Then rank 0 can open file.dat and read it correctly, it seems.
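The "open the file in the code" workaround can be combined with the broadcast described earlier in the thread. The following is only a minimal sketch, not the poster's actual code: it assumes the namelist lives in a file named test.nml in the working directory, uses the plain `mpi` module instead of the `mympi` wrapper, and the program name `read_and_broadcast` is made up for illustration.

```fortran
program read_and_broadcast
   use mpi
   implicit none
   integer :: irn, rank, iunit, ios, ierror
   real :: hbar
   namelist /mylist/ irn, hbar

   call mpi_init(ierror)
   call mpi_comm_rank(mpi_comm_world, rank, ierror)

   ios = 0
   if (rank == 0) then
      ! Read from a named file (assumed: test.nml) rather than from
      ! redirected standard input, so mpiexec never has to forward it.
      open(newunit=iunit, file='test.nml', status='old', action='read', iostat=ios)
      if (ios == 0) then
         read(iunit, nml=mylist, iostat=ios)
         close(iunit)
      end if
   end if

   ! Share the read status so that all ranks stop together on failure.
   call mpi_bcast(ios, 1, mpi_integer, 0, mpi_comm_world, ierror)
   if (ios /= 0) then
      if (rank == 0) write(6,*) 'could not read test.nml, iostat = ', ios
      call mpi_finalize(ierror)
      stop 1
   end if

   ! Send the values read by rank 0 to every other rank.
   call mpi_bcast(irn, 1, mpi_integer, 0, mpi_comm_world, ierror)
   call mpi_bcast(hbar, 1, mpi_real, 0, mpi_comm_world, ierror)

   write(6,*) 'rank ', rank, ': irn = ', irn, ', hbar = ', hbar
   call mpi_finalize(ierror)
end program read_and_broadcast
```

Run it as `mpiexec -n 6 test.exe` with no redirect. Because no rank touches standard input, the hang described in this thread should not come into play, although that has only been reported here for the `open` approach in general, not for this exact sketch.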
Anyway, I am glad Intel has noticed this issue on Windows. I hope it can be solved in the next release.
Thank you very much indeed!
I tested the workaround with multiple ranks, and it does work as expected. I also have another workaround: add a newline to the end of your test.nml file.
To be clear, this is definitely not a Fortran issue. This is a matter of how Windows is handling redirects for standard input.
When you run mpiexec, it first parses through the arguments. In this parsing, it identifies key options for mpiexec and the application (plus arguments) you have specified. It uses the mpiexec options to define how to launch the application, and launches the application (along with any arguments). Normally, standard input is redirected to your application. If you need to type anything in, it will be passed to the application at that time.
In the Windows command prompt, redirecting standard input from a file with "< file" tells Windows that you want standard input to come from this file. This will only allow the specified file to act as standard input. That file will be the equivalent of you typing in the contents of the file. In this case, the last thing in your file is the "/" ending the namelist, but not a newline (equivalent to pressing enter) to complete that line of input.
This works outside of the MPI environment because the redirected input goes directly to the application. Under MPI, mpiexec receives the redirected input and has to pass it through to the application. In this passthrough, the implicit end of input from the end of file does not make it through, causing your application to hang.
The reason the batch file method works is that it embeds the redirect to be directly to the application, rather than to mpiexec. If you create a batch file with "echo %*", it will show you all of the arguments passed. Run that batch file with a redirect, and you'll notice that the redirect is not an argument. Windows parses it before the batch file is started, and the batch file never sees it.
It is possible that this could be fixed in mpiexec. However, since adding a newline at the end of your input file is a sufficient workaround, and this is likely a fairly complicated fix, I do not expect it will be addressed.
Please confirm that this workaround is functional for you.
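For anyone who wants to apply the newline workaround without opening an editor, here is a small hedged sketch of a helper that checks the last byte of the input file and appends a line terminator if it is missing. The file name test.nml, the program name, and the choice of CR+LF as the terminator are all assumptions made for illustration; it is a standalone serial program, not part of the MPI code.

```fortran
program ensure_trailing_newline
   implicit none
   character(len=*), parameter :: fname = 'test.nml'  ! assumed input file name
   character(len=1) :: last
   integer :: iunit, ios, nbytes

   ! Stream access lets us address individual bytes of the file.
   open(newunit=iunit, file=fname, access='stream', form='unformatted', &
        status='old', action='readwrite', iostat=ios)
   if (ios /= 0) then
      write(*,*) 'cannot open ', fname
      stop 1
   end if

   inquire(unit=iunit, size=nbytes)
   if (nbytes <= 0) then
      write(*,*) fname, ' is empty or its size is unknown; nothing done'
   else
      ! Look at the final byte of the file.
      read(iunit, pos=nbytes) last
      if (last == achar(10)) then
         write(*,*) fname, ' already ends with a newline'
      else
         ! Append CR+LF (assumed Windows convention) after the last byte.
         write(iunit, pos=nbytes + 1) achar(13)//achar(10)
         write(*,*) 'appended a newline to ', fname
      end if
   end if
   close(iunit)
end program ensure_trailing_newline
```

Run it once in the folder containing test.nml before calling `mpiexec -n 6 test.exe < test.nml`. It only guarantees that the final "/" line is terminated, which is the condition described above.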
Thank you James.
Yes, I confirm the "newline" trick works.
Thank you for reminding me of this "newline" trick.
I had actually figured it out before, and now with your explanation I know why it works. Again, thank you so much.
The thing is, even with this "newline" trick, sometimes it still does not work. You can see my reply in this post on
Since we have multiple workarounds and this is unlikely to receive a fix, I am closing this for Intel support. Any further replies will be considered community only.