I have a program, test.exe, that simply needs to read some variables from a file called nuclear.dat, like this:

```
mpiexec -n 6 test.exe < nuclear.dat
```

However, it just hangs. This issue seems to occur only with Intel oneAPI on Windows. The newest version, oneAPI 2023, still has this issue.
But if I just run

```
test.exe < nuclear.dat
```

it outputs the results correctly.
Below is the input file, nuclear.dat:
```
9002785287 # * irn. boss random number seed
12 # imode. 11 mean V fixed. 12 means full mixing, both V and k are flexable. CHoose 12 for now.
1000 # * itermax. max iteration number. Usually with stopping criterion code should stop before itermax.
.true. # * stop_criterion_on. Enabling stop criterion or not. .true. means yes enable.
30 # * LL_n. The number of continuous iterations for averaged slope. and then calculating the averaged parameters. currently just use this one number.
0.0 # * crit_1. The value of slope for stop criterion. The iteration begin 'smoothing' stage after averaged slope smaller than this value. Set to zero.
1 # * kmix. The gaussian mixing number.
.true. # read Yji or generate Yji. Currently set it as .true.
1000 # * mgauss_all. number of gaussian samples in E step.
5000000 # * m_all. Metropolos samples in M step.
30 # * i_init. This is for initial condition. Currently it is from 1 to 50 which is the number of subjects. Select the subject from simpar to start initial condition
simdata.csv # data.csv name
simpar.csv # parcsvname
```
The program is:

```fortran
program main
   use mympi
   implicit none
   integer, parameter :: i4=selected_int_kind(9)
   integer, parameter :: i8=selected_int_kind(15)
   integer, parameter :: r8=selected_real_kind(15,9)
   integer(kind=i8) :: irn,imode,itermax,kmix,mgauss_all,m_all,i_init
   integer :: LL_n
   logical :: stop_criterion_on,readYji
   real(kind=r8) :: crit_1
   character(len=50) :: csvname != 'simdata.csv'
   character(len=50) :: parcsvname ! = 'simpar.csv'
   call init0 !mpi initialization must be done before reading
   if (myrank()==0) then
      read (5,*) irn
      read (5,*) imode
      read (5,*) itermax
      read (5,*) stop_criterion_on
      read (5,*) LL_n
      read (5,*) crit_1
      read (5,*) kmix
      read (5,*) readYji
      read (5,*) mgauss_all
      read (5,*) m_all
      read (5,*) i_init
      read (5,'(a50)') csvname
      read (5,'(a50)') parcsvname
      csvname = adjustl(csvname)
      csvname = csvname(1:index(csvname,' ')-1) ! there should be space before # in the input file.
      parcsvname = adjustl(parcsvname)
      parcsvname = parcsvname(1:index(parcsvname,' ')-1)
      write (6,'(''Boss random number seed ='',t30,i20)') irn
      write (6,'(''imode ='',t30,i20)') imode
      write (6,'(''iteration max # ='',t30,i20)') itermax
      write (6,'(''Stop Criterion (SC) on ='',t40,l10)') stop_criterion_on
      write (6,'(''SC # of averaged iterations ='',t30,i20)') LL_n
      write (6,'(''SC stopping slope <'',t40,f10.5)') crit_1
      write (6,'(''Mixing number ='',t30,i20)') kmix
      write (6,'(''Read Yji ='',t40,l10)') readYji
      write (6,'(''# Gauss samples for E step ='',t30,i20)') mgauss_all
      write (6,'(''# Metropolis samples for M step ='',t30,i20)') m_all
      write (6,'(''Initially from subject # '',t30,i20)') i_init
      write (6,'(''data.csv file name ='',t30,a30)') csvname
      write (6,'(''simpar.csv file name ='',t30,a30)') parcsvname
   endif
   call done ! finalize MPI before the program exits
end program main
```
My MPI module is:

```fortran
module mympi
   use mpi
   implicit none
   integer, private, parameter :: i4=selected_int_kind(9)
   integer, private, parameter :: i8=selected_int_kind(15)
   integer, private, parameter :: r8=selected_real_kind(15,9)
   integer, private, save :: mpii4,mpii8,mpir8
   integer(kind=i4), private, save :: irank,iproc
contains
   subroutine init0 ! call this before anything else
      integer :: ierror,isize,ir,ip
      integer(kind=i4) :: itest4
      integer(kind=i8) :: itest8
      real(kind=r8) :: rtest8
      call mpi_init(ierror)
      call mpi_comm_rank(mpi_comm_world,ir,ierror)
      irank=ir
      call mpi_comm_size(mpi_comm_world,ip,ierror)
      iproc=ip
      call mpi_sizeof(itest4,isize,ierror)
      call mpi_type_match_size(mpi_typeclass_integer,isize,mpii4,ierror)
      call mpi_sizeof(itest8,isize,ierror)
      call mpi_type_match_size(mpi_typeclass_integer,isize,mpii8,ierror)
      call mpi_sizeof(rtest8,isize,ierror)
      call mpi_type_match_size(mpi_typeclass_real,isize,mpir8,ierror)
      return
   end subroutine init0
   subroutine done ! wrapper for finalize routine
      integer :: ierror
      call mpi_finalize(ierror)
      return
   end subroutine done
   function myrank() ! which process am I?
      integer(kind=i4) :: myrank
      myrank=irank
      return
   end function myrank
end module mympi
```
The whole solution file has been attached.
Could Intel please check this? Thanks much!
Let's review:
You run the code without MPI and it works. Doesn't this imply there is no problem with your code or the compiler?
With 6 MPI ranks it "hangs".
So what happens when you run a single rank with mpiexec -n 1? That should be the first test.
I suspect your "hang" is caused by 6 copies of your program exceeding the number of CPUs and/or the amount of RAM in your PC. Do you have 6 physical cores, which Windows will show as 12 "processors" due to hyperthreading?
How much memory does each instance of the program need? Start a memory monitor and then launch 1, 2, and 3 copies of your program. Are you exceeding your RAM?
Here is how you can check your PC's resource usage with Task Manager:
- Press CTRL + Shift + Esc to open Task Manager.
- Click the Performance tab. This tab displays your system's RAM, CPU, GPU, and disk usage, along with network info.
- To view RAM usage, select the Memory box. That box provides info for how much RAM is in use and how much remains available.
Hi Ron,
Thanks for the reply.
This minimal working example (MWE) is extremely simple. As you can see, rank 0 reads the variables from nuclear.dat and writes their values to the screen; that is all. I have also attached the whole Visual Studio .sln solution with everything included, so it should be easy for you to reproduce this issue.
What you suggested is a good place to start, and I appreciate it, but it is not the problem. With

```
mpiexec -n XXX test.exe < nuclear.dat
```

it just hangs, no matter whether XXX is 1 or more than 1.
It consumes only 1.4 MB on each CPU core, and my laptops all have at least 6 cores (so at least 12 threads).
However, without mpiexec, it works as expected:

```
test.exe < nuclear.dat
```
The output should be,
This strange behavior happens only on Windows, and I am using the latest version, oneAPI 2023.0. Older versions of oneAPI on Windows have the same problem.
Now, if you remove all the comments (those starting with #) from nuclear.dat, it works.
But I do not understand why it hangs when those comments are present in nuclear.dat. On Linux it works fine, as always.
In fact, it is exactly because of this issue that I decided to use a namelist file instead of nuclear.dat, as suggested by urbanjost:
https://fortran-lang.discourse.group/t/intel-fortran-cannot-read-from-a-long-line-properly/2933/5
However, namelist input had an issue too, which James and Barbara have since solved.
I may try putting the parameters in a namelist file instead of nuclear.dat.
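For reference, a namelist version of nuclear.dat could look roughly like the sketch below. It is an illustration under assumptions, not urbanjost's or the poster's code: the group name `nuclear`, the module `nuclear_nml`, the subroutine `read_params`, and the file name nuclear.nml are all hypothetical. Namelist input names each value explicitly and standard Fortran allows `!` comments inside it, so the `#` annotations become real comments. The defaults are the values from nuclear.dat, and the file is opened by name rather than read from redirected standard input.

```fortran
! Hypothetical namelist replacement for nuclear.dat, e.g. a file nuclear.nml:
!
!   &nuclear
!     irn = 9002785287   ! boss random number seed
!     imode = 12         ! 12 = full mixing
!     itermax = 1000
!     stop_criterion_on = .true.
!     LL_n = 30
!     crit_1 = 0.0
!     kmix = 1
!     readYji = .true.
!     mgauss_all = 1000
!     m_all = 5000000
!     i_init = 30
!     csvname = 'simdata.csv'
!     parcsvname = 'simpar.csv'
!   /
module nuclear_nml
   implicit none
   integer, parameter :: i8 = selected_int_kind(15)
   integer, parameter :: r8 = selected_real_kind(15, 9)
   ! Defaults copied from nuclear.dat; any entry omitted from the file keeps them.
   integer(kind=i8) :: irn = 9002785287_i8, imode = 12, itermax = 1000
   integer(kind=i8) :: kmix = 1, mgauss_all = 1000, m_all = 5000000, i_init = 30
   integer :: LL_n = 30
   logical :: stop_criterion_on = .true., readYji = .true.
   real(kind=r8) :: crit_1 = 0.0_r8
   character(len=50) :: csvname = 'simdata.csv', parcsvname = 'simpar.csv'
   namelist /nuclear/ irn, imode, itermax, stop_criterion_on, LL_n, crit_1, &
                      kmix, readYji, mgauss_all, m_all, i_init, csvname, parcsvname
contains
   ! Opens filename, reads the /nuclear/ group, and closes the file.
   ! ios = 0 on success; otherwise it holds the OPEN or READ status code.
   subroutine read_params(filename, ios)
      character(len=*), intent(in) :: filename
      integer, intent(out) :: ios
      integer :: u
      open (newunit=u, file=filename, status='old', action='read', iostat=ios)
      if (ios /= 0) return
      read (u, nml=nuclear, iostat=ios)
      close (u)
   end subroutine read_params
end module nuclear_nml
```

On rank 0, `call read_params('nuclear.nml', ios)` would then replace the sequence of `read (5,*)` statements, and the file names no longer need the `index(csvname,' ')` trimming because they are quoted strings.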
According to James, the batch file workaround behaves properly for this case. I am escalating this to our MPI team to work on.
Thank you, Barbara. Hopefully this issue on Windows can be solved.
Hi @CRquantum
Sorry for the long silence. Could you please double-check whether your files end with a newline? I can trigger this bug only when there is no newline at the end of the file, regardless of the length.
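One quick way to check this, sketched under the assumption that nuclear.dat is a plain byte-oriented text file: open it with stream access and test whether its last byte is a line feed (`achar(10)`). That covers both LF and CRLF line endings, since CRLF also ends in a line feed. The module and function names are hypothetical.

```fortran
module eol_check
   implicit none
   private
   public :: ends_with_newline
contains
   ! Returns .true. if the file exists, is non-empty, and its last byte is a
   ! line feed (achar(10)); CRLF-terminated files also end in achar(10).
   logical function ends_with_newline(filename)
      character(len=*), intent(in) :: filename
      integer :: u, ios, fsize
      character(len=1) :: last
      ends_with_newline = .false.
      open (newunit=u, file=filename, access='stream', form='unformatted', &
            status='old', action='read', iostat=ios)
      if (ios /= 0) return
      ! Size in file storage units, which are bytes on common compilers.
      inquire (unit=u, size=fsize)
      if (fsize > 0) then
         ! Stream positions start at 1, so the last unit is at pos=fsize.
         read (u, pos=fsize, iostat=ios) last
         if (ios == 0) ends_with_newline = (last == achar(10))
      end if
      close (u)
   end function ends_with_newline
end module eol_check
```

If `ends_with_newline('nuclear.dat')` returns `.false.`, adding a final line break after the last line of the file is the first thing to try.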
What happens if you trace the reads?
```fortran
if (myrank()==0) then
   print *,"read (5,*) irn"
   read (5,*) irn
   print *,"irn=",irn
   ...
```
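A slightly more defensive version of that trace is sketched below. It adds one thing that is my own assumption rather than part of the suggestion: a FLUSH after each trace message. When mpiexec forwards a rank's standard output through a pipe, that output may be buffered, so without a flush the last trace line might not appear before the hang. It also reports the IOMSG text if the READ fails instead of hanging. The names `traced_read` and `read_i8_traced` are hypothetical.

```fortran
module traced_read
   use, intrinsic :: iso_fortran_env, only: output_unit
   implicit none
   private
   public :: read_i8_traced
   integer, parameter, public :: i8 = selected_int_kind(15)
contains
   ! Reads one integer(kind=i8) with list-directed input from unit u.
   ! A trace line is written and flushed before the READ, so if the READ
   ! blocks, the last line on screen names the variable being read.
   subroutine read_i8_traced(u, name, val, ios)
      integer, intent(in) :: u
      character(len=*), intent(in) :: name
      integer(kind=i8), intent(out) :: val
      integer, intent(out) :: ios
      character(len=256) :: msg
      write (output_unit, '(3a)') 'reading ', name, ' ...'
      flush (output_unit)
      read (u, *, iostat=ios, iomsg=msg) val
      if (ios == 0) then
         write (output_unit, '(2a,i0)') name, ' = ', val
      else
         write (output_unit, '(4a)') 'read of ', name, ' failed: ', trim(msg)
      end if
      flush (output_unit)
   end subroutine read_i8_traced
end module traced_read
```

In the program above, `call read_i8_traced(5, 'irn', irn, ios)` would replace `read (5,*) irn`; the other integer fields follow the same pattern.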
As a workaround, read each line into a large character variable and search for "#"; if it is found, blank out everything from there to the end of the line, then use an internal read to convert the text to a number. (Include the trace of the activity.)
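A minimal sketch of that workaround follows. The module and routine names (`strip_comments`, `strip_hash_comment`, `read_i8_line`, `read_word_line`) and the 1024-character buffer are assumptions, not anything from the thread. Each line is read with an `(a)` format, everything from the first `#` onward is blanked, and the remainder is converted with an internal list-directed read (or, for file names, simply left-adjusted).

```fortran
module strip_comments
   implicit none
   private
   public :: strip_hash_comment, read_i8_line, read_word_line
   integer, parameter, public :: i8 = selected_int_kind(15)
contains
   ! Returns line with everything from the first '#' onward blanked out.
   pure function strip_hash_comment(line) result(s)
      character(len=*), intent(in) :: line
      character(len=len(line)) :: s
      integer :: k
      s = line
      k = index(s, '#')
      if (k > 0) s(k:) = ' '
   end function strip_hash_comment

   ! Reads one whole line from unit u, strips the comment, then converts
   ! the remaining text with an internal list-directed read.
   subroutine read_i8_line(u, val, ios)
      integer, intent(in) :: u
      integer(kind=i8), intent(out) :: val
      integer, intent(out) :: ios
      character(len=1024) :: buf
      read (u, '(a)', iostat=ios) buf
      if (ios /= 0) return
      buf = strip_hash_comment(buf)
      read (buf, *, iostat=ios) val
   end subroutine read_i8_line

   ! Reads one whole line from unit u and returns the text before '#',
   ! left-adjusted (e.g. a file name such as simdata.csv).
   subroutine read_word_line(u, word, ios)
      integer, intent(in) :: u
      character(len=*), intent(out) :: word
      integer, intent(out) :: ios
      character(len=1024) :: buf
      read (u, '(a)', iostat=ios) buf
      if (ios /= 0) return
      word = adjustl(strip_hash_comment(buf))
   end subroutine read_word_line
end module strip_comments
```

The logical, real, and default-integer fields of nuclear.dat would use the same pattern with a different type for the internal-read target.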
If that fails, then instead of redirecting with "< nuclear.dat", pass "nuclear.dat" as a command-line argument, use GET_COMMAND_ARGUMENT to fetch the file name, and then open the file for reading.
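A sketch of that approach, using a hypothetical helper `open_input_from_arg` (not from the thread): it queries the argument length first so the file name is never truncated, then opens the file with NEWUNIT= so no unit number has to be chosen by hand.

```fortran
module input_from_arg
   implicit none
   private
   public :: open_input_from_arg
contains
   ! Opens the file named by command-line argument number n (1 = first
   ! argument after the program name) for sequential formatted reading.
   ! On success ios = 0 and u is the new unit; otherwise ios /= 0.
   subroutine open_input_from_arg(n, u, ios)
      integer, intent(in) :: n
      integer, intent(out) :: u, ios
      character(len=:), allocatable :: fname
      integer :: flen, stat
      u = -1
      ! First call: only ask for the length (and whether the argument exists).
      call get_command_argument(n, length=flen, status=stat)
      if (stat /= 0 .or. flen == 0) then
         ios = 1   ! hypothetical convention: no usable argument
         return
      end if
      allocate (character(len=flen) :: fname)
      call get_command_argument(n, value=fname, status=stat)
      if (stat /= 0) then
         ios = 1
         return
      end if
      open (newunit=u, file=fname, status='old', action='read', iostat=ios)
   end subroutine open_input_from_arg
end module input_from_arg
```

With this, the run becomes `mpiexec -n 6 test.exe nuclear.dat`, and rank 0 reads from the returned unit (for example `read (iu,*) irn`) instead of unit 5, so standard input is not involved at all.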
Jim Dempsey