I am trying to get Abaqus running over an Omni-Path fabric.
The Abaqus version is 6.14-1, using Intel MPI 5.1.2.
In my abaqus_v.env file I set mp_mpirun_options to "-v -genv I_MPI_FABRICS shm:tmi".
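For reference, one way to add that setting from the shell (the environment file itself uses Python syntax, so the whole assignment goes in as one line; the file location depends on your site setup):
$ echo 'mp_mpirun_options = "-v -genv I_MPI_FABRICS shm:tmi"' >> abaqus_v.env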
By the way, the -PSM2 argument is not accepted.
I cannot cut and paste the output here (argh!), so I have attached a rather long output file.
I do not know where the wheels are coming off here. Pun intended, as this is the e1 car crash simulation.
I get lots of messages about PMI buffer overruns, but I am not sure that is the root of the problem.
Hi,
The -PSM2 flag was introduced in Intel(R) MPI Library 5.1.3. We have now released 2017 Update 2.
Can you test the issue with the latest released version of Intel(R) MPI Library?
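After installing, you can confirm which version is actually picked up at run time (the mpivars.sh path below is an example for a default installation):
$ source /opt/intel/compilers_and_libraries_2017/linux/mpi/intel64/bin/mpivars.sh
$ mpirun -V   # prints the Intel(R) MPI Library version in use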
--
Dmitry
Dmitry, thank you.
I had realised this, and Intel MPI has been updated to version 2017.1.1.
I am getting these messages now. I can set I_MPI_DEBUG=2 if that is helpful.
Warning: string_to_uuid_array: wrong uuid format: 0)<8D>
Warning: string_to_uuid_array: correct uuid format is: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Warning: string_to_uuid_array: wrong uuid format: 0)<8D>
Warning: string_to_uuid_array: correct uuid format is: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
gm-hpc-40.4380Driver initialization failure on /dev/hfi1 (err=23)
gm-hpc-41.54627hfi_userinit: mmap of rcvhdrq at dabbad0004030000 failed: Resource temporarily unavailable
Warning: string_to_uuid_array: wrong uuid format: 0)<8D>
Intel MPI variables are:
Hydra internal environment:
---------------------------
MPIR_CVAR_NEMESIS_ENABLE_CKPOINT=1
GFORTRAN_UNBUFFERED_PRECONNECTED=y
I_MPI_HYDRA_UUID=e7c00000-6c6a-3da2-a64b-0500c927ac10
DAPL_NETWORK_PROCESS_NUM=80
User set environment:
---------------------
I_MPI_FABRICS_LIST=tmi,dapl,tcp,ofa
I_MPI_TMI_PROVIDER=psm2
I_MPI_DEBUG=0
Intel(R) MPI Library specific variables:
----------------------------------------
I_MPI_PERHOST=allcores
I_MPI_COMPATIBILITY=4
I_MPI_ROOT=/cm/shared/apps/intel/compilers_and_libraries/2017.1.132/mpi
I_MPI_HYDRA_UUID=e7c00000-6c6a-3da2-a64b-0500c927ac10
I_MPI_FABRICS_LIST=tmi,dapl,tcp,ofa
I_MPI_TMI_PROVIDER=psm2
I_MPI_DEBUG=0
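If it would help, I can raise the debug level for the next run; a sketch of the two ways I would do it (as I understand the docs, I_MPI_DEBUG=2 prints which fabric was actually selected):
$ export I_MPI_DEBUG=2
# or, equivalently, append to mp_mpirun_options in the environment file:
#   -genv I_MPI_DEBUG 2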
Please try to set
export I_MPI_HYDRA_UUID=00000000-0000-0000-0001-000000000001
manually.
Also, please check the steps in section 13.3.5, "Intel® Omni-Path HFI Initialization Failure", of the "Intel® Omni-Path Fabric Host Software User Guide":
$ lsmod | grep hfi
$ hfi1_control -iv
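Beyond those two commands, a couple of generic checks can also be useful here (these are not from the guide):
$ ls -l /dev/hfi1*      # confirm the device node exists and check its permissions
$ dmesg | grep -i hfi1  # look for driver messages around the initialization failure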
--
Dmitry
By the way, as this is a job running under Slurm, I am aware of the need to set the max locked memory to unlimited,
and I have implemented the slurm.conf configuration recommended here:
https://bugs.schedmd.com/show_bug.cgi?id=3363
If I run a test job and look at the limits, as in that thread, the locked memory is unlimited.
Could it be that the Abaqus launch script is not inheriting this limit somehow?
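For reference, the check I ran inside a one-node interactive test job was along these lines, and it reported unlimited:
$ srun -N1 --pty bash -c 'ulimit -l'
unlimited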
Dmitry,
this is definitely looking like an issue with the locked memory limit. I am seeing the same thing as described here:
https://wiki.fysik.dtu.dk/niflheim/OmniPath#memory-limits
I am confused, though: I applied the recommended Slurm configuration for the memory limits.
I guess that these limits are somehow not being inherited properly by the Abaqus launch script.
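One way I can think of to verify this is to launch a trivial command through mpirun directly, outside of Abaqus, and print the limit the ranks actually see (the hostnames below are placeholders):
$ mpirun -np 2 -ppn 1 -hosts node01,node02 bash -c 'ulimit -l'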
I should say, though, that I am NOT seeing the syslog messages which the Niflheim people say should be there.
So this could be another issue, sorry.
About "Warning: string_to_uuid_array: wrong uuid format: 0)<8D>" I've filed the ticket to Intel MPI team.
--
Dmitry
Something else to report here; please read this fully. I have access to two clusters with Omni-Path.
I ran a hello world MPI program, using two hosts, one process per host.
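The invocation was along these lines (the hostnames are examples):
$ mpirun -np 2 -ppn 1 -hosts nodeA,nodeB ./mpi_hello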
On Cluster A it runs fine.
On Cluster B the error is printed (though the hello world actually completes).
The hfi devices are:
FYI: this problem indicates that the wrong shared object libraries are being loaded, ones that do not match the version of Intel MPI you are using:
Warning: string_to_uuid_array: wrong uuid format:
Warning: string_to_uuid_array: correct uuid format is: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
I just went through this on an Omni-Path installation and had to remove libmpi.so.12 and libmpi_mt.so.12 from the Abaqus code/bin directories and force LD_LIBRARY_PATH to include the intel64/lib directory for the appropriate Intel MPI release.
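In shell terms the workaround looked roughly like this (the Abaqus path below is only an example; adjust to your installation, and renaming the libraries aside is safer than deleting them):
$ cd /opt/abaqus/6.14-1/code/bin          # example Abaqus install path
$ mv libmpi.so.12 libmpi.so.12.bak        # move the bundled MPI libraries aside
$ mv libmpi_mt.so.12 libmpi_mt.so.12.bak
$ export LD_LIBRARY_PATH=$I_MPI_ROOT/intel64/lib:$LD_LIBRARY_PATH   # I_MPI_ROOT as set by mpivars.sh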
