Intel® MPI Library

Intel Cluster Checker errors

Chris_H_

Hi, I'm testing a small cluster of 8 nodes with Intel Cluster Checker, and I get failures whose output does not really explain the issue:

Basic network connectivity, (ping).....................................................................................................................Failed
    [010100] subtest 'ping request delay is less than 100 ms' passed
      node: hadoop5: 0.059 ms
      node: hadoop4: 0.070 ms
      node: hadoop3: 0.075 ms
      node: hadoop2: 0.082 ms
    [010000] subtest 'shall contain at least 4 compute nodes per group' passed
      node: hadoop2: 4 computes
    [010301] subtest 'shall contain at least a head node' failed
      node: hadoop2: 0 head
Node remote connectivity, (remote_login)..............................................................................................................Skipped
    failed dependencies: ping


My nodes list is the following:

hadoop1 # type: head
hadoop2
hadoop3
hadoop4
hadoop5
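
A quick way to sanity-check a node list like the one above is to count how many nodes of each type it declares. The Python sketch below is only an illustration, not how Cluster Checker itself reads the file. It assumes that a type is given by a trailing `# type: <name>` comment and that nodes without one count as compute nodes. The names `NODE_LIST` and `count_node_types` are hypothetical.

```python
import re
from collections import Counter

# Node list copied from the post; one hostname per line.
NODE_LIST = """\
hadoop1 # type: head
hadoop2
hadoop3
hadoop4
hadoop5
"""


def count_node_types(text, default="compute"):
    """Count nodes per type in a node list.

    Assumption: a type is declared with a trailing '# type: <name>'
    comment, and a node without one is treated as `default`.
    """
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and lines that are only a comment.
        if not line or line.startswith("#"):
            continue
        match = re.search(r"#\s*type:\s*(\w+)", line)
        counts[match.group(1) if match else default] += 1
    return counts


if __name__ == "__main__":
    print(dict(count_node_types(NODE_LIST)))
```

Under those assumptions the list declares one head node and four compute nodes, so the file itself looks consistent and the "0 head" result would have to come from somewhere else.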

I also get the following error from the mpi_local test:

Intel(R) MPI Library intranode runtime, (mpi_local)....................................................................................................Failed
    [160201] subtest 'mpi runtime' failed
      node: hadoop[2-5]: no mpirun output for device shm

I get the same error when I set the device to tcp in an XML file.

Any help would be appreciated.

Thanks,

Chris
