I am trying to get Abaqus running over an Omni-Path fabric.
Abaqus is version 6.14-1, using Intel MPI 5.1.2.
In my abaqus_v.env file I set mp_mpirun_options to -v -genv I_MPI_FABRICS shm:tmi
By the way, the -PSM2 argument is not accepted.
I cannot cut and paste the output here (Argggh!), so I have attached a rather long output file.
I do not know where the wheels are coming off here. Pun intended, as this is the e1 car crash simulation.
I get lots of messages about PMI buffer overruns, but I am not sure that is the root of the problem.
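For context, the relevant line in abaqus_v.env is roughly the following (a sketch; the exact quoting and surrounding settings may differ between Abaqus releases):

mp_mpirun_options = '-v -genv I_MPI_FABRICS shm:tmi'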
Hi,
The -PSM2 flag was introduced in Intel(R) MPI Library 5.1.3. We have now released 2017 Update 2.
Can you test the issue with the latest released version of the Intel(R) MPI Library?
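If moving to a newer library is not immediately possible, the same fabric selection can also be expressed through environment variables rather than the -PSM2 launcher flag; a rough equivalent, using the variables that appear later in this thread, is:

export I_MPI_FABRICS=shm:tmi
export I_MPI_TMI_PROVIDER=psm2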
--
Dmitry
Dmitry, thank you.
I had realised this, and Intel MPI has been updated to version 2017.1.1.
I am now getting the messages below. I can set I_MPI_DEBUG=2 if that is helpful.
Warning: string_to_uuid_array: wrong uuid format: 0)<8D>
Warning: string_to_uuid_array: correct uuid format is: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Warning: string_to_uuid_array: wrong uuid format: 0)<8D>
Warning: string_to_uuid_array: correct uuid format is: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
gm-hpc-40.4380Driver initialization failure on /dev/hfi1 (err=23)
gm-hpc-41.54627hfi_userinit: mmap of rcvhdrq at dabbad0004030000 failed: Resource temporarily unavailable
Warning: string_to_uuid_array: wrong uuid format: 0)<8D>
Intel MPI variables are:
Hydra internal environment:
---------------------------
MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1
GFORTRAN_UNBUFFERED_PRECONNECTED=y
I_MPI_HYDRA_UUID=e7c00000-6c6a-3da2-a64b-0500c927ac10
DAPL_NETWORK_PROCESS_NUM=80
User set environment:
---------------------
I_MPI_FABRICS_LIST=tmi,dapl,tcp,ofa
I_MPI_TMI_PROVIDER=psm2
I_MPI_DEBUG=0
Intel(R) MPI Library specific variables:
----------------------------------------
I_MPI_PERHOST=allcores
I_MPI_COMPATIBILITY=4
I_MPI_ROOT=/cm/shared/apps/intel/compilers_and_libraries/2017.1.132/mpi
I_MPI_HYDRA_UUID=e7c00000-6c6a-3da2-a64b-0500c927ac10
I_MPI_FABRICS_LIST=tmi,dapl,tcp,ofa
I_MPI_TMI_PROVIDER=psm2
I_MPI_DEBUG=0
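For reference, raising the debug level for the next run would just be a matter of something like the following, either exported in the job script or passed with -genv in mp_mpirun_options:

export I_MPI_DEBUG=2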
Please try to set
export I_MPI_HYDRA_UUID=00000000-0000-0000-0001-000000000001
manually.
Also, please check the required steps from the "Intel® Omni-Path Fabric Host Software User Guide", section 13.3.5, "Intel® Omni-Path HFI Initialization Failure":
$ lsmod | grep hfi
$ hfi1_control -iv
--
Dmitry
By the way, as this job runs under Slurm, I am aware of the need to set the max locked memory to unlimited,
and I have implemented the slurm.conf configuration recommended here:
https://bugs.schedmd.com/show_bug.cgi?id=3363
If I run a test job and look at the limits, as in that thread, the locked memory is indeed unlimited.
Could it be that the Abaqus launch script is not inheriting this limit somehow?
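For reference, the Slurm-side settings discussed in that bug report are, to my understanding, along these lines (an assumption to verify against the linked report rather than a quote from it):

# slurm.conf: do not propagate the submit host's (possibly low) memlock limit
PropagateResourceLimitsExcept=MEMLOCK
# and on the compute nodes, slurmd itself must run with an unlimited memlock
# limit, e.g. LimitMEMLOCK=infinity in the slurmd systemd unit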
Dmitry,
this is definitely looking like an issue with the locked memory limit. I am seeing the same thing as these people:
https://wiki.fysik.dtu.dk/niflheim/OmniPath#memory-limits
I am confused, though: I applied the recommended Slurm configuration for the memory limits.
I guess these limits are somehow not being inherited properly by the Abaqus launch script.
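A quick way to see where the limit is being lost (a sketch, not an Abaqus-specific recipe) is to print it at each level: in the batch script itself, inside an srun-launched task, and from a wrapper around the mpirun command that Abaqus invokes:

ulimit -l
srun --ntasks=1 bash -c 'ulimit -l'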
I should say, though, that I am NOT seeing the syslog messages which the Niflheim people say should be there.
So this could be a different issue, sorry.
About "Warning: string_to_uuid_array: wrong uuid format: 0)<8D>" I've filed the ticket to Intel MPI team.
--
Dmitry
Something else to report here - please read this fully. I have access to two clusters with Omni-Path.
I run a hello-world MPI program, using two hosts, one process per host.
On Cluster A it runs fine.
On Cluster B the error is printed (though the Hello World actually completes).
hfi devices are:
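For reference, the two-host hello-world test above is launched along these lines (hostnames and the binary name are placeholders, not the actual cluster names):

mpirun -n 2 -ppn 1 -hosts nodeA,nodeB ./mpi_hello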
FYI - this problem indicates that the wrong shared-object libraries are being loaded and that they do not match the version of Intel MPI you are using:
Warning: string_to_uuid_array: wrong uuid format:
Warning: string_to_uuid_array: correct uuid format is: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
I just went through this on an Omni-Path installation and had to remove libmpi.so.12 and libmpi_mt.so.12 from the Abaqus code/bin directories and force LD_LIBRARY_PATH to include the intel64/lib directory for the appropriate Intel MPI release.
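A sketch of that workaround, with placeholder paths (the actual Abaqus and Intel MPI install locations will differ):

# move the Intel MPI libraries bundled with Abaqus out of the way
cd /path/to/abaqus/code/bin
mv libmpi.so.12 libmpi.so.12.bak
mv libmpi_mt.so.12 libmpi_mt.so.12.bak
# and point the loader at the matching Intel MPI release instead
export LD_LIBRARY_PATH=/path/to/intel/compilers_and_libraries_2017/linux/mpi/intel64/lib:$LD_LIBRARY_PATH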
