Environment:
I have two hosts:
A: 4 sockets with Intel(R) Xeon(R) Platinum 8365HC CPU @ 3.00GHz, SNC off.
B: 2 sockets with Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz, SNC on.
The CPUs on both hosts report the same flags. The OS version is the same: CentOS Linux release 7.6.1810. Intel oneAPI 2021.2 is installed on both hosts.
Hosts A and B share an NFS mount, and the applications are installed on that NFS.
Application:
The application I work with is the open-source HPC application LAMMPS (version 30Nov29).
How I compiled:
cd src
make yes-MOLECULE yes-USER-REAXC yes-PERI yes-USER-EFF yes-MANYBODY yes-GRANULAR yes-KSPACE yes-RIGID yes-opt
make yes-user-omp yes-user-intel
make -j intel_cpu_intelmpi
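Since the two hosts have different CPU generations, one thing worth ruling out is build-machine-specific tuning. A minimal sketch, assuming the Intel makefile uses -xHost (which tunes code generation to the machine you compile on; check the actual flags in your tree), is to pin an explicit ISA baseline both CPUs support:

```shell
# Sketch only: replace -xHost with an explicit common baseline so the binary
# does not depend on which host it was compiled on.
# (The makefile path and the presence of -xHost are assumptions; adjust to your tree.)
cd src
sed -i 's/-xHost/-xCORE-AVX512/g' MAKE/OPTIONS/Makefile.intel_cpu_intelmpi
make clean-all
make -j intel_cpu_intelmpi
```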
How to run:
./lmp_intel_cpu_intelmpi -h
Problems:
I compiled LAMMPS on both host A (binary named lammps_A) and host B (binary named lammps_B).
Both lammps_A and lammps_B live on the NFS, so I can run either binary on either host.
Here is the problem:
When I run lammps_A or lammps_B on host A, I get a 'Segmentation fault'.
When I run lammps_A or lammps_B on host B, the application works fine and I get correct output.
This is a strange result, and I have no idea why the outputs differ.
Any ideas?
Suggestions on how to debug it, or possible reasons, are both appreciated.
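For the debugging part, one generic way to localize a segfault like this (a sketch; the binary and input-file names are placeholders for your actual paths) is to enable core dumps on host A and pull a backtrace from the core file:

```shell
# Allow core files to be written, reproduce the crash, then inspect the core.
# Raising the stack limit is also worth trying first, since Intel-compiled
# OpenMP code can overflow a small default stack.
ulimit -s unlimited
ulimit -c unlimited
./lmp_intel_cpu_intelmpi -in in.file              # placeholder paths; should dump core
gdb -batch -ex bt ./lmp_intel_cpu_intelmpi core   # print the crashing stack
```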
Thank you.
Hi,
Thanks for reaching out to us.
We see that you are running the application from NFS, so there is no need to compile it on both host A and host B. Compiling on a single host is enough, since the NFS files are shared.
Also, see the link below for running the application with mpirun:
https://lammps.sandia.gov/doc/Run_basics.html
Please let us know if you face any issues.
Thanks & Regards
Shivani
Sorry, but I think you misunderstood my question.
The problem is that the compiled binaries behave differently:
When I run lammps_A or lammps_B on host A, I get a 'Segmentation fault'.
When I run lammps_A or lammps_B on host B, the application works fine and I get correct output.
Both binaries compile cleanly with no errors.
I want to find out how to run LAMMPS correctly on host A, since neither the binary compiled on host A nor the one compiled on host B works there.
Hi,
Can you please try these commands on host A:
mpirun -np 4 lmp_mpi -in in.file
mpirun -np 8 /path/to/lammps/src/lmp_mpi -in in.file
mpirun -np 6 /usr/local/bin/lmp -in in.file
For more information on the basics of running LAMMPS please refer to the below link.
https://lammps.sandia.gov/doc/Run_basics.html
If the issue still persists, can you please provide the complete log using the command below:
I_MPI_DEBUG=10 mpirun -np 6 /usr/local/bin/lmp -in in.file
Thanks & Regards
Shivani
[0] MPI startup(): Intel(R) MPI Library, Version 2021.2 Build 20210302 (id: f4f7c92cd)
[0] MPI startup(): Copyright (C) 2003-2021 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): shm segment size (796 MB per rank) * (6 local ranks) = 4777 MB total
[0] MPI startup(): libfabric version: 1.11.0-impi
[0] MPI startup(): libfabric provider: verbs;ofi_rxm
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 3352 RUNNING AT node33
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 3353 RUNNING AT node33
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 2 PID 3354 RUNNING AT node33
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 3 PID 3355 RUNNING AT node33
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 4 PID 3356 RUNNING AT node33
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 5 PID 3357 RUNNING AT node33
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
The key problem is not MPI. I think something is wrong with the lmp application itself, yet it runs normally on host B.
I suspect the different CPU types on hosts A and B may be the reason.
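Since the two CPUs are from different generations, one quick sanity check (a sketch; the binary path is a placeholder) is to compare what the binary uses against what each host's CPU advertises, run on both hosts:

```shell
# Count AVX-512 (zmm-register) instructions present in the binary,
# then list the avx512 feature flags this host's CPU advertises.
objdump -d ./lmp_intel_cpu_intelmpi | grep -c '%zmm'
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep '^avx512' | sort
```

If the binary contains instructions from an extension missing in one host's flag list, that host is a candidate for the crash.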
Hi,
>>>"The key problem is not mpi. I think there is something wrong with the application lammps".
As you said this issue is specific to the application LAMMPS, could you please let us know if there is anything else we can help you with? If not, could you please confirm whether we can close this thread?
Thanks & Regards
Shivani
It's OK to close this thread.
Hi,
Thanks for the confirmation!
We will no longer respond to this thread. If you require any additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.
Have a good day.
Thanks & Regards
Shivani