Hi,
A.
I am trying to run distributed computations on two nodes, both with MPI from Julia and with Julia's built-in distributed computing.
I can run calculations on all 24 workers on one node, in both interactive and batch modes, without any problems in pure Julia with
using Distributed
addprocs(24)
@everywhere ...
which is basic Julia functionality, as described at [https://docs.julialang.org/en/v1/manual/distributed-computing/].
I can also run a basic MPI program with the MPI.jl package [https://github.com/pressel/MPI.jl]:
using MPI
MPI.Init()
println("Hi from $(MPI.Comm_rank(MPI.COMM_WORLD))!")
flush(stdout)
launched with
mpirun -np 24 julia hello_world.jl
However, I am not able to correctly add all 24 / 48 workers on two nodes. In theory I should be able to:
- add workers with a machine file when starting Julia
julia --machine-file=$PBS_NODEFILE
or
- add workers with MPIClusterManagers.jl [https://github.com/JuliaParallel/MPIClusterManagers.jl]
using Distributed
using MPIClusterManagers
# specify the number of MPI workers
manager = MPIManager(np=48)
# start the MPI workers and add them as Julia workers too
addprocs(manager)
@everywhere import MPI
sleep(60.000) # give the workers time to start
# set up the worker environments
@everywhere using PackageName
# solve with
@mpi_do manager begin
using MPI
experiment = Examples.experiments["name_of_experiment"]
session = Session(experiment, dir="/home/uxxxxx/data/xxxxx/xxxxx.jl/mytrainings/sessions/name_of_experiment")
resume!(session)
end
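For the machine-file option, Julia's `--machine-file` reads lines of the form `[count*][user@]host[:port]`, where `count` is the number of workers to start on that host (default 1), and it starts the workers over SSH. A PBS node file may list each host only once, or once per slot, so passing it through unchanged may not give the intended worker count. A minimal shell sketch, assuming 24 workers per node are wanted and passwordless SSH between the nodes works (the function name `make_julia_machinefile` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: turn a PBS-style node file (one hostname per line,
# possibly repeated) into the "count*host" lines that julia --machine-file reads.
make_julia_machinefile() {
    nodefile=$1
    workers_per_node=$2
    # Keep the first occurrence of each host, in order, and prefix the count.
    awk -v n="$workers_per_node" 'NF && !seen[$1]++ { print n "*" $1 }' "$nodefile"
}

# Example with a stand-in node file; on a real job use "$PBS_NODEFILE".
tmp_nodes=$(mktemp)
printf 'node-a\nnode-a\nnode-b\n' > "$tmp_nodes"
make_julia_machinefile "$tmp_nodes" 24
rm -f "$tmp_nodes"
```

On a real job the output could be redirected to a file (for example `machines.txt`, a hypothetical name) and passed as `julia --machine-file=machines.txt`.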
I have tried many of the combinations described on various discussion lists, but I am still unable to launch workers on two nodes correctly. The most common errors I see are as follows:
ERROR: TaskFailedException
nested task error: Unable to read host:port string from worker. Launch command exited with error?
[...]
caused by: Unable to read host:port string from worker. Launch command exited with error?
and/or
sh: 7: /etc/profile.d/add-local-path.sh: Syntax error: redirection unexpected
sh: 7: /etc/profile.d/add-local-path.sh: Syntax error: redirection unexpected
or
[mpiexec@s001-n047] HYD_hostfile_parse (../../../../../src/pm/i_hydra/libhydra/hostfile/hydra_hostfile.c:69): unable to open host file: -n
[mpiexec@s001-n047] mfile_fn (../../../../../src/pm/i_hydra/mpiexec/mpiexec_params.c:489): error parsing hostfile
[mpiexec@s001-n047] match_arg (../../../../../src/pm/i_hydra/libhydra/arg/hydra_arg.c:83): match handler returned error
[mpiexec@s001-n047] HYD_arg_parse_array (../../../../../src/pm/i_hydra/libhydra/arg/hydra_arg.c:128): argument matching returned error
[mpiexec@s001-n047] mpiexec_get_parameters (../../../../../src/pm/i_hydra/mpiexec/mpiexec_params.c:1356): error parsing input array
[mpiexec@s001-n047] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:1749): error parsing parameters
or
IOError: could not spawn setenv(`/home/uxxxxx/packages/julias/julia-1.6.1/bin/julia -Cnative -J/home/u77446/packages/julias/julia-1.6.1/lib/julia/sys.so -g1 --bind-to 127.0.0.1 --worker`; dir="/home/uxxxxx/data/xxxxx/xxxxx/mytrainings"): resource temporarily unavailable (EAGAIN)
I would therefore like to ask whether you could offer any guidance on this topic. I would really appreciate any information.
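The Hydra message `unable to open host file: -n` is what `mpiexec` reports when the argument following its host-file option is `-n`. That is consistent with, though not proof of, an empty or unset `$PBS_NODEFILE` at launch time. A small guard such as the following, run before any launch command, would at least rule that out. This is only a sketch; the function name `require_nodefile` is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if the given path names a
# readable, non-empty node file; otherwise explain the problem on stderr.
require_nodefile() {
    path=$1
    if [ -z "$path" ]; then
        echo "node file variable is empty or unset" >&2
        return 1
    fi
    if [ ! -r "$path" ] || [ ! -s "$path" ]; then
        echo "node file '$path' is missing, unreadable, or empty" >&2
        return 1
    fi
    echo "node file ok: $(sort -u "$path" | wc -l | tr -d ' ') distinct host(s)"
}

# Demo with the real variable if present; never aborts the calling shell.
require_nodefile "${PBS_NODEFILE:-}" || true
```

Calling the guard as `require_nodefile "${PBS_NODEFILE:-}" || exit 1` at the top of a job script stops the job before `mpiexec` ever sees a missing host-file argument.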
B.
Also, taking the opportunity, I would like to ask a question about gnome-terminal on the renderkit machine. It used to work fine, but recently it no longer starts. I tracked down the error:
Error constructing proxy for org.gnome.Terminal:/org/gnome/Terminal/Factory0: Error calling StartServiceByName for org.gnome.Terminal: GDBus.Error:org.freedesktop.DBus.Error.Spawn.ChildExited: Process org.gnome.Terminal exited with status 8
I guess that the error is associated with a broken locale. The machine is also asking for a reboot.
To get the terminal working I have to execute LC_ALL=en_US.UTF-8 /usr/bin/dbus-launch gnome-terminal in XTerm.
Would it be possible to receive some guidance on this topic?
Best regards,
SZ
Hi,
Thanks for posting in Intel forums. We are checking on this issue from our side. We will get back to you.
Thanks
Rahul
Thanks Rahul. One piece of information that may be relevant: I understand that, unlike MPI, Julia is not natively supported here. However, what I am currently trying to do is adapt a fairly extensive machine / reinforcement learning model (package) so that it can use oneAPI and the corresponding Intel software and hardware technologies. As I understand it, this kind of activity is in line with Devcloud's policy, hence these questions. SZ
Hi,
We are working on it and will get back to you soon.
Thanks & Regards
Shivani
Hi!
Thanks. I would really appreciate any information on this topic, especially about distributed computing. Even a very general, preliminary indication of whether this is doable at all would be useful.
Regards,
SZ
Hello - thank you for your message. I will begin an investigation into this.
Hello. Thank you. Should you have any additional questions, or if I can be of any help, please let me know.
Hello - For part B of this issue it would be best to post a thread in the Rendering Toolkit forum at
https://community.intel.com/t5/Intel-oneAPI-Rendering-Toolkit/bd-p/oneapi-rendering-toolkit
For part A - compile one of the test programs in $I_MPI_ROOT/test, without Julia, and run it with the same job layout you use with Julia. If that fails, please repeat the run with I_MPI_DEBUG=16 and attach the log.
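The check suggested for part A could be scripted roughly as follows. This is a sketch under assumptions: the Intel MPI environment is already loaded so that `$I_MPI_ROOT`, `mpiicc` (or whichever MPI compiler wrapper is installed), and `mpirun` are available, the job runs under PBS with `$PBS_NODEFILE` set, and the layout is 48 ranks at 24 per node. The function names, the output name `mpi_test`, and the `DRY_RUN` switch are hypothetical; by default the script only prints the commands.

```shell
#!/bin/sh
# Hypothetical wrapper: with DRY_RUN=1 (the default) commands are only
# printed; set DRY_RUN=0 to actually execute them.
run_or_print() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

mpi_smoke_test() {
    ranks=$1
    per_node=$2
    hostfile=$3
    # Fall back to a literal placeholder when the variable is not set.
    root=${I_MPI_ROOT:-'$I_MPI_ROOT'}
    # Build the C test program shipped with Intel MPI.
    run_or_print mpiicc "$root/test/test.c" -o ./mpi_test
    # Run it with verbose Intel MPI debug output, same layout as the Julia job.
    run_or_print mpirun -genv I_MPI_DEBUG 16 -n "$ranks" -ppn "$per_node" \
        -f "$hostfile" ./mpi_test
}

mpi_smoke_test 48 24 "${PBS_NODEFILE:-nodefile-not-set}"
```

Running the real thing with `DRY_RUN=0` and redirecting the output to a file would produce the log requested above.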
Hello,
Re: part B:
Thank you. I will.
Re: part A:
Thank you. Sure, I will compile the test programs and then try the tests in Julia. Please be advised that I may spend less time on Devcloud than usual this week, but I will reply soon.
Best regards,
SZ