Intel® MPI Library

Using Intel MPI on Windows HPC cluster manager

gert_massa
Beginner
Hi All,

The Intel MPI manual says it is supported on Windows HPC, but I am unable to run my MPI application using the HPC Cluster Manager.

My observations:

- When I select the resources based on the number of nodes (equal to 1) and do not use MPI (i.e. I run my application sequentially), it works fine.
- When I select the resources based on cores and do not use MPI (i.e. I run my application sequentially), it only works when I specify a number of cores equal to the number of physical cores in my system. For example, on a quad-core machine it works when I specify 4 cores but not for 1, 2, or 3.
- When I want to use MPI and add mpiexec to my command, I get the following error:
[01:4792]....ERROR:unable to read the cmd header on the pmi context, Error = -1

Does anyone have experience with Intel MPI on Windows HPC?

Thanks in advance!
Dmitry_K_Intel2
Employee
Hi Gert,

I think that we need more information about your environment and your task.

First of all: what version of the Intel MPI Library do you use?
Have you tried to run the HelloWorld example from the test directory of the Intel MPI installation?

Does the smpd service work on your nodes? Please run 'smpd -status'.

What is your application and how do you run it?

Let's start from this information and I hope that we will be able to help you.

Regards!
Dmitry
gert_massa
Beginner
Hi Dmitry,

I've downloaded the latest MPI version, 4.0.0.012, but still have the same problem as I had with version 3.2.1.009. I am able to run a small test application I wrote myself, but my real application needs to be launched from a .bat file (which sets some environment variables), and that might be causing the problem. Doing the same thing with Microsoft MPI works fine, though. A contact at Microsoft helped me look for the cause of the problem and found that a crash is happening in impi.dll because of a division by zero.
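To narrow down whether the environment set by the .bat file plays a role, the launch step could be reproduced in a small script that prints the relevant variables before starting the application. This is only a sketch: the prefixes I_MPI_, PMI_ and CCP_ are assumptions about which variables might matter, and run_with_env and relevant_env are hypothetical helper names, not part of Intel MPI or Windows HPC.

```python
import os
import subprocess
import sys

# Assumed prefixes, for illustration only: Intel MPI settings (I_MPI_),
# process-manager variables (PMI_), Windows HPC scheduler variables (CCP_).
PREFIXES = ("I_MPI_", "PMI_", "CCP_")


def relevant_env(env, prefixes=PREFIXES):
    """Return the variables whose names start with one of the prefixes."""
    return {k: v for k, v in sorted(env.items()) if k.upper().startswith(prefixes)}


def run_with_env(command, extra_env):
    """Run command with extra_env layered over the current environment."""
    env = dict(os.environ)
    env.update(extra_env)
    # Print what the child process will see, so two launches can be compared.
    for name, value in relevant_env(env).items():
        print(f"{name}={value}")
    return subprocess.run(command, env=env).returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python launch.py my_app.exe arg1 arg2
    sys.exit(run_with_env(sys.argv[1:], {}))
```

Running the application once through this script inside an HPC job and once from a plain command prompt would show which of these variables differ between the two launches.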

Strangely, I do not need to run my application in parallel to get this error, i.e. it also happens without mpiexec.

The stack trace is below:

(a1c.6b4): Integer divide-by-zero - code c0000094 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
*** ERROR: Symbol file could not be found. Defaulted to export symbols for \\head08\sysnoise\SSN56_WIN64\5.6\bin\Windows_x86_64\impi.dll -
impi!PMPI_Initialized+0x32b1:
00000000`053f2e59 41f7f0 div eax,r8d
[HEAD08\Administrator (npipe dbg)] 0:000> kb
RetAddr : Args to Child : Call Site
00000000`053f3372 : 00000000`06221c80 00000000`0012ff38 00000000`000002d0 00000000`0012ff38 : impi!PMPI_Initialized+0x32b1
00000000`053f5a1b : 00000000`00000005 00000000`00008006 00000000`0012ff30 00000000`0012ff30 : impi!PMPI_Initialized+0x37ca
00000000`053efb9d : 00000000`0012fb00 00000000`00000000 00000000`0000000b 00000000`75d4bb5d : impi!PMPI_Init_thread+0x3ff
*** WARNING: Unable to verify checksum for image00000000`00400000
*** ERROR: Module load completed but symbols could not be loaded for image00000000`00400000
00000000`01a25822 : 00000000`00000000 00000000`012cb3c8 00000000`00130000 00000000`00130000 : impi!PMPI_Init+0x89
00000000`012c9ff0 : 00000200`002b0000 00000000`0114a5d7 00000000`00000000 00000000`00000000 : image00000000_00400000+0x1625822
00000000`0040107f : 00000000`01a25cb0 000007ff`fffdf000 00000000`00000000 00000000`0012ffd8 : image00000000_00400000+0xec9ff0
00000000`01a25b30 : 00000000`00000006 00000000`06422280 00000000`00000000 00000000`00000000 : image00000000_00400000+0x107f
00000000`77b1466d : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : image00000000_00400000+0x1625b30
00000000`77c48791 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd
00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d
[HEAD08\Administrator (npipe dbg)] 0:000> .frame 4
04 00000000`0012fb10 00000000`012c9ff0 image00000000_00400000+0x1625822
[HEAD08\Administrator (npipe dbg)] 0:000> .frame 3
03 00000000`0012fab0 00000000`01a25822 impi!PMPI_Init+0x89
[HEAD08\Administrator (npipe dbg)] 0:000> r
rax=0000000000000000 rbx=0000000000000001 rcx=0000000000000001
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000006271110
rip=00000000053f2e59 rsp=000000000012f900 rbp=0000000000000000
r8=0000000000000000 r9=0000000000000002 r10=0000000000000002
r11=0000000006271130 r12=00000000055a0780 r13=00000000055a0780
r14=0000000000000002 r15=0000000006271118
iopl=0 nv up ei pl zr na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010246
impi!PMPI_Initialized+0x32b1:
00000000`053f2e59 41f7f0 div eax,r8d
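
The register dump can be checked against the faulting instruction with a few lines of Python. This is a sketch, assuming the values printed by `r` are those at the faulting instruction (rip matches the address of the div). A 32-bit `div` divides edx:eax by its operand, here r8d, the low 32 bits of r8.

```python
import re

# Register lines copied from the debugger output above.
DUMP = """
rax=0000000000000000 rbx=0000000000000001 rcx=0000000000000001
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000006271110
rip=00000000053f2e59 rsp=000000000012f900 rbp=0000000000000000
r8=0000000000000000 r9=0000000000000002 r10=0000000000000002
"""


def parse_registers(text):
    """Parse 'name=hexvalue' pairs for registers whose names start with r."""
    pairs = re.findall(r"\b(r\w+)=([0-9a-fA-F]+)", text)
    return {name: int(value, 16) for name, value in pairs}


regs = parse_registers(DUMP)
mask32 = 0xFFFFFFFF
divisor = regs["r8"] & mask32  # r8d
dividend = ((regs["rdx"] & mask32) << 32) | (regs["rax"] & mask32)  # edx:eax

print(f"dividend edx:eax = {dividend:#x}, divisor r8d = {divisor:#x}")
if divisor == 0:
    print("divisor is zero, consistent with the c0000094 exception")
```

So the value that impi.dll divides by is zero at this point in MPI_Init. The trace alone does not show where that value comes from.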