Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Weird result when running an app on two hosts

oleotiger
Novice

Environment:

I have 2 hosts:

A:  4 sockets with Intel(R) Xeon(R) Platinum 8365HC CPU @ 3.00GHz and SNC off.

B:  2 sockets with Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz and SNC on.

The CPUs on both hosts have the same flags. The OS version is the same: CentOS Linux release 7.6.1810. Intel oneAPI 2021.2 is installed on both hosts.

Hosts A and B mount the same NFS share. Applications are installed on the NFS.

Application:

The application I work with is the open-source HPC application LAMMPS (version 30Nov29).

How I compiled:

cd src
make yes-MOLECULE yes-USER-REAXC yes-PERI yes-USER-EFF yes-MANYBODY yes-GRANULAR yes-KSPACE yes-RIGID yes-opt
make yes-user-omp yes-user-intel
make -j intel_cpu_intelmpi
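
For reference, the optimization and architecture flags come from the Intel makefile. The snippet below is only a quick check I would run; I am assuming the makefile is MAKE/OPTIONS/Makefile.intel_cpu_intelmpi under src/ in this version, which I have not double-checked:

cd src    # from the LAMMPS top-level directory
# list the compiler flag lines (e.g. -xHost / -xCORE-AVX512) used by this build target
grep -nE "xHost|xCORE|CCFLAGS|OPTFLAGS" MAKE/OPTIONS/Makefile.intel_cpu_intelmpi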

How to run:

./lmp_intel_cpu_intelmpi -h

Problems:

I compiled LAMMPS on both host A (binary named lammps_A) and host B (binary named lammps_B).

lammps_A and lammps_B are on the NFS, so I can run either binary on either host.

Here is the problem:

When I run lammps_A or lammps_B on host A, I get a 'Segmentation fault'.

When I run lammps_A or lammps_B on host B, the application works well and I get the correct output.

This is such a weird result. I have no idea why the outputs differ.

Any ideas ? 

Suggestions on how to debug it, or possible reasons, are both appreciated.

Thank you.

7 Replies
ShivaniK_Intel
Moderator

Hi,


Thanks for reaching out to us.


We see that you are running the application from NFS, so there is no need to compile it on both host A and host B. Compiling on a single host is enough, since the NFS files are shared.


Also, check the link below for guidance on running the application with mpirun:

https://lammps.sandia.gov/doc/Run_basics.html


Please let us know if you face any issues.


Thanks & Regards

Shivani


oleotiger
Novice

Sorry, but I think you didn't get my question.

The problem is that the compiled binaries behave differently depending on the host.

When I run lammps_A or lammps_B on host A, I get a 'Segmentation fault'.

When I run lammps_A or lammps_B on host B, the application works well and I get the correct output.

 

Both binaries compile cleanly with no errors.

 

I want to find out how to run LAMMPS correctly on host A, since neither the binary compiled on host A nor the one compiled on host B works there.

 

ShivaniK_Intel
Moderator

Hi,


Could you please try these commands on host A:


mpirun -np 4 lmp_mpi -in in.file

mpirun -np 8 /path/to/lammps/src/lmp_mpi -in in.file

mpirun -np 6 /usr/local/bin/lmp -in in.file


For more information on the basics of running LAMMPS, please refer to the link below.


https://lammps.sandia.gov/doc/Run_basics.html


If your issue still persists, could you please provide the complete log using the command below:


I_MPI_DEBUG=10 mpirun -np 6 /usr/local/bin/lmp -in in.file
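
If the debug output alone does not show where the crash happens, a backtrace from a core dump may also help. This is only a generic sketch: the binary path and input file are placeholders from the commands above, and the core file name and location depend on your system's core_pattern setting.

ulimit -c unlimited                                    # allow core dumps in this shell
I_MPI_DEBUG=10 mpirun -np 6 /usr/local/bin/lmp -in in.file
# after the crash, print a backtrace from the core file of the failing rank
gdb -batch -ex bt /usr/local/bin/lmp ./core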


Thanks & Regards

Shivani


oleotiger
Novice
[0] MPI startup(): Intel(R) MPI Library, Version 2021.2  Build 20210302 (id: f4f7c92cd)
[0] MPI startup(): Copyright (C) 2003-2021 Intel Corporation.  All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): shm segment size (796 MB per rank) * (6 local ranks) = 4777 MB total
[0] MPI startup(): libfabric version: 1.11.0-impi
[0] MPI startup(): libfabric provider: verbs;ofi_rxm

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 3352 RUNNING AT node33
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 3353 RUNNING AT node33
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 2 PID 3354 RUNNING AT node33
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 3 PID 3355 RUNNING AT node33
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 4 PID 3356 RUNNING AT node33
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 5 PID 3357 RUNNING AT node33
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

 

The key problem is not MPI. I think there is something wrong with the LAMMPS application itself. But it runs normally on host B.

I wonder whether the different CPU types on host A and host B are the reason.
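
If it is a CPU-specific code path, one test I can think of (only a sketch, and assuming the Intel makefile uses a -xHost-style architecture flag, which I have not verified) is to rebuild with a more conservative target that both CPUs support and see whether the segmentation fault on host A goes away:

cd src
# edit MAKE/OPTIONS/Makefile.intel_cpu_intelmpi and replace any -xHost with a
# target both CPUs support, e.g. -xCORE-AVX512, before rebuilding
make clean-all                 # remove old object files first
make -j intel_cpu_intelmpi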

ShivaniK_Intel
Moderator

Hi,


>>>"The key problem is not mpi. I think there is something wrong with the application lammps".


As you said, this issue is specific to the LAMMPS application. Could you please let us know if there is anything else we can help you with? If not, could you please confirm that we can close this thread?


Thanks & Regards

Shivani


oleotiger
Novice

It's OK to close this thread.

ShivaniK_Intel
Moderator

Hi,


Thanks for the confirmation!


We will no longer respond to this thread. If you require any additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.


Have a good day.


Thanks & Regards

Shivani

