oleotiger
Novice

Weird result when running an app on two hosts


Environment:

I have 2 hosts:

A:  4 sockets with Intel(R) Xeon(R) Platinum 8365HC CPU @ 3.00GHz and SNC off.

B:  2 sockets with Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz and SNC on.

CPUs on both hosts have the same flags. The OS version is the same: CentOS Linux release 7.6.1810. Intel oneAPI 2021.2 is installed on both hosts.

Hosts A and B share an NFS mount, and the applications are installed on the NFS.
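For reference, the claim that both CPUs expose the same flags can be checked concretely; a sketch, where hostA and hostB are placeholder hostnames:

```shell
# Dump each host's CPU feature flags, one per line, sorted, so that
# diff gives a readable comparison. hostA/hostB are placeholders.
ssh hostA "grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | sort" > flags_A.txt
ssh hostB "grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | sort" > flags_B.txt

# Empty diff output means the flag sets really do match.
diff flags_A.txt flags_B.txt && echo "flag sets match"
```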

Application:

The application I work with is the open-source HPC code LAMMPS (version 30Nov29).

How I compiled:

cd src
make yes-MOLECULE yes-USER-REAXC yes-PERI yes-USER-EFF yes-MANYBODY yes-GRANULAR yes-KSPACE yes-RIGID yes-opt
make yes-user-omp yes-user-intel
make -j intel_cpu_intelmpi
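One thing worth checking in this build recipe: the stock Makefile.intel_cpu_intelmpi typically passes a host-specific ISA flag (such as -xHost) to the compiler, so the binary is tuned to the CPU it was built on. A sketch of how to inspect and neutralize this, run from the LAMMPS src directory; the -xCORE-AVX512 replacement is an assumption that both CPUs here support that level, which you should verify:

```shell
# Show which ISA-related flags the Intel makefile passes to the
# compiler (path follows the standard LAMMPS source tree layout):
grep -n 'FLAGS' MAKE/OPTIONS/Makefile.intel_cpu_intelmpi

# If -xHost appears, replace it with a baseline both CPUs support
# (here -xCORE-AVX512, an assumption to verify), then rebuild. This
# rules out an instruction-set mismatch between the hosts.
sed -i 's/-xHost/-xCORE-AVX512/g' MAKE/OPTIONS/Makefile.intel_cpu_intelmpi
make clean-all
make -j intel_cpu_intelmpi
```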

How to run:

./lmp_intel_cpu_intelmpi -h

Problems:

I compiled LAMMPS on both host A (binary named lammps_A) and host B (binary named lammps_B).

lammps_A and lammps_B are on the NFS, so I can run either binary on either host.

Here is the problem:

When I run either lammps_A or lammps_B on host A, I get a 'Segmentation fault'.

When I run either lammps_A or lammps_B on host B, the application works well and I get the correct output.

This is such a weird result; I have no idea why the outputs differ.

Any ideas? Suggestions on how to debug this, or possible reasons, would both be appreciated.

Thank you.

8 Replies
ShivaniK_Intel
Moderator

Hi,


Thanks for reaching out to us.


We see that you are running the application from NFS, so there is no need to compile it on both host A and host B; compiling on a single host is enough, since the NFS files are shared.


Also, check the below link for running the application using mpirun:

https://lammps.sandia.gov/doc/Run_basics.html


Please let us know if you face any issues.


Thanks & Regards

Shivani


oleotiger
Novice

Sorry, but I think you misunderstood my question.

The problem is that the compiled binaries behave differently.

When I run either lammps_A or lammps_B on host A, I get a 'Segmentation fault'.

When I run either lammps_A or lammps_B on host B, the application works well and I get the correct output.

 

Both binaries compile with no errors.

 

I want to find out how to run LAMMPS correctly on host A, since neither the binary compiled on host A nor the one compiled on host B works there.

 

jackyjoy123
Beginner

Thanks, my issue has been fixed.

ShivaniK_Intel
Moderator

Hi,


Can you please try these commands on host A:


mpirun -np 4 lmp_mpi -in in.file

mpirun -np 8 /path/to/lammps/src/lmp_mpi -in in.file

mpirun -np 6 /usr/local/bin/lmp -in in.file


For more information on the basics of running LAMMPS, please refer to the link below.


https://lammps.sandia.gov/doc/Run_basics.html


If your issue still persists, can you please provide the complete log using the below command:


I_MPI_DEBUG=10 mpirun -np 6 /usr/local/bin/lmp -in in.file
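If the MPI debug output does not pinpoint the crash, a core-file backtrace usually does. A sketch, assuming core dumps are permitted on the node and using the same placeholder paths as above:

```shell
# Allow core dumps in this shell, then reproduce the crash on a
# single rank (easier to debug than a 6-rank run). in.file is the
# same placeholder input script used in the commands above.
ulimit -c unlimited
/usr/local/bin/lmp -in in.file

# If a core file was written, print the backtrace of the faulting
# frame non-interactively with gdb:
gdb -batch -ex bt /usr/local/bin/lmp core.*
```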


Thanks & Regards

Shivani


oleotiger
Novice
[0] MPI startup(): Intel(R) MPI Library, Version 2021.2  Build 20210302 (id: f4f7c92cd)
[0] MPI startup(): Copyright (C) 2003-2021 Intel Corporation.  All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): shm segment size (796 MB per rank) * (6 local ranks) = 4777 MB total
[0] MPI startup(): libfabric version: 1.11.0-impi
[0] MPI startup(): libfabric provider: verbs;ofi_rxm

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 3352 RUNNING AT node33
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 3353 RUNNING AT node33
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 2 PID 3354 RUNNING AT node33
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 3 PID 3355 RUNNING AT node33
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 4 PID 3356 RUNNING AT node33
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 5 PID 3357 RUNNING AT node33
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

 

The key problem is not MPI; I think there is something wrong with the application lmp itself. But it runs normally on host B.

I suspect the different CPU types on host A and host B may be the reason.
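Worth noting from the log above: five ranks died with signal 9 (SIGKILL) and only one with signal 11. SIGKILL often comes from the kernel OOM killer rather than from the application itself, especially given the ~4.8 GB shared-memory request in the startup banner. A quick check, run on node33 and assuming dmesg access:

```shell
# Look for OOM-killer activity in the kernel log around the time of
# the crash; a "Killed process <pid>" line naming lmp would explain
# the signal-9 terminations.
dmesg -T | grep -iE 'out of memory|oom[ -]?kill|killed process' | tail

# Compare free memory on the node against the ~4.8 GB of shared
# memory the MPI startup banner reports requesting:
free -h
```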

ShivaniK_Intel
Moderator

Hi,


>>>"The key problem is not mpi. I think there is something wrong with the application lammps".


As you said, this issue is specific to the LAMMPS application. Could you please let us know if there is anything else we can help you with? If not, could you please confirm whether we can close this thread?


Thanks & Regards

Shivani



oleotiger
Novice

It's OK to close this thread.

ShivaniK_Intel
Moderator

Hi,


Thanks for the confirmation!


We will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread; any further interaction in this thread will be considered community-only.


Have a Good day.


Thanks & Regards

Shivani

