<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re:Job Core Dumps when using Intel MPI over two nodes in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1503988#M10751</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thank you for posting in the Intel community.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Could you please provide the following details:&lt;/P&gt;&lt;P&gt;1. Your OS, and the output of the lscpu command&lt;/P&gt;&lt;P&gt;2. A sample reproducer, along with steps to reproduce the issue at our end&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Could you please try running on the latest Intel MPI version (2021.9) with I_MPI_FABRICS=ofi exported, and let us know if you still face the issue?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks and regards,&lt;/P&gt;&lt;P&gt;Aishwarya&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;BR /&gt;</description>
    <pubDate>Wed, 12 Jul 2023 09:31:10 GMT</pubDate>
    <dc:creator>AishwaryaCV_Intel</dc:creator>
    <dc:date>2023-07-12T09:31:10Z</dc:date>
    <item>
      <title>Job Core Dumps when using Intel MPI over two nodes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1503779#M10749</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am having issues running a Slurm job using Intel OneAPI MPI 2019.9 over two nodes via sbatch. Jobs run successfully on a single node, but when I use Intel MPI parallelization across two nodes, the job core dumps. Slurm does not throw an error, and the tasks are running on both nodes. I believe I am missing something, but I don't know what. I made sure I compiled the executables with OneAPI.&amp;nbsp; Script and error log below. Any suggestions would be appreciated. If you need more info, please let me know.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Chris&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Run script:&amp;nbsp;&lt;/P&gt;&lt;P&gt;#!/bin/bash&lt;BR /&gt;#SBATCH --job-name=test&lt;BR /&gt;#SBATCH --nodes=2&lt;BR /&gt;#SBATCH --ntasks-per-node=64&lt;BR /&gt;#SBATCH --mem-per-cpu=2G&lt;BR /&gt;#SBATCH --error=error-%j.err&lt;BR /&gt;#SBATCH --partition=dragon&lt;BR /&gt;#SBATCH --time=1:00:00&lt;BR /&gt;#SBATCH --account=wexler&lt;BR /&gt;#SBATCH --propagate=STACK&lt;/P&gt;&lt;P&gt;# Set MPI environment variables&lt;BR /&gt;export I_MPI_FABRICS=sockets&lt;BR /&gt;export I_MPI_FALLBACK=0&lt;/P&gt;&lt;P&gt;srun /software/lammps/build/lmp -in in.lj&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Error log:&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;[dragon1:140740:0:140740] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))&lt;BR /&gt;==== backtrace (tid: 140740) ====&lt;BR /&gt;0 /lib/x86_64-linux-gnu/libucs.so.0(ucs_handle_error+0x2e4) [0x1530cd8e1fc4]&lt;BR /&gt;1 /lib/x86_64-linux-gnu/libucs.so.0(+0x24fec) [0x1530cd8e5fec]&lt;BR /&gt;2 /lib/x86_64-linux-gnu/libucs.so.0(+0x251aa) [0x1530cd8e61aa]&lt;BR /&gt;3 /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x1532b74e9520]&lt;BR /&gt;4 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x2e886) [0x1530cc22e886]&lt;BR /&gt;5 
/software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x2e8a9) [0x1530cc22e8a9]&lt;BR /&gt;6 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0xe9e5) [0x1530cc20e9e5]&lt;BR /&gt;7 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x13c17) [0x1530cc213c17]&lt;BR /&gt;8 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x14389) [0x1530cc214389]&lt;BR /&gt;9 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x150b0) [0x1530cc2150b0]&lt;BR /&gt;10 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x15a9a) [0x1530cc215a9a]&lt;BR /&gt;11 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x16b23) [0x1530cc216b23]&lt;BR /&gt;12 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x16ce9) [0x1530cc216ce9]&lt;BR /&gt;13 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x31a6d) [0x1530cc231a6d]&lt;BR /&gt;14 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x319f7) [0x1530cc2319f7]&lt;BR /&gt;15 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x666b8e) [0x1532b8066b8e]&lt;BR /&gt;16 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x22b919) [0x1532b7c2b919]&lt;BR /&gt;17 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x7c658d) [0x1532b81c658d]&lt;BR /&gt;18 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x3b71c0) [0x1532b7db71c0]&lt;BR /&gt;19 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x722785) [0x1532b8122785]&lt;BR /&gt;20 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x2a0153) [0x1532b7ca0153]&lt;BR /&gt;21 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(MPI_Scan+0x56e) [0x1532b7b8a40e]&lt;BR /&gt;22 /software/lammps/build/lmp(+0x203042) [0x55cc32ced042]&lt;BR /&gt;23 /software/lammps/build/lmp(+0x2c0313) [0x55cc32daa313]&lt;BR /&gt;24 
/software/lammps/build/lmp(+0xcf204) [0x55cc32bb9204]&lt;BR /&gt;25 /software/lammps/build/lmp(+0xcf616) [0x55cc32bb9616]&lt;BR /&gt;26 /software/lammps/build/lmp(+0xaddbc) [0x55cc32b97dbc]&lt;BR /&gt;27 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x1532b74d0d90]&lt;BR /&gt;28 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x1532b74d0e40]&lt;BR /&gt;29 /software/lammps/build/lmp(+0xaee25) [0x55cc32b98e25]&lt;BR /&gt;=================================&lt;BR /&gt;==== backtrace (tid: 140741) ====&lt;BR /&gt;0 /lib/x86_64-linux-gnu/libucs.so.0(ucs_handle_error+0x2e4) [0x14b9bfca1fc4]&lt;BR /&gt;1 /lib/x86_64-linux-gnu/libucs.so.0(+0x24fec) [0x14b9bfca5fec]&lt;BR /&gt;2 /lib/x86_64-linux-gnu/libucs.so.0(+0x251aa) [0x14b9bfca61aa]&lt;BR /&gt;3 /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x14bba9509520]&lt;BR /&gt;4 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x2e886) [0x14b9be22e886]&lt;BR /&gt;5 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x2e8a9) [0x14b9be22e8a9]&lt;BR /&gt;6 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0xe9e5) [0x14b9be20e9e5]&lt;BR /&gt;7 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x13c17) [0x14b9be213c17]&lt;BR /&gt;8 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x14389) [0x14b9be214389]&lt;BR /&gt;9 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x150b0) [0x14b9be2150b0]&lt;BR /&gt;10 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x15a9a) [0x14b9be215a9a]&lt;BR /&gt;11 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x16b23) [0x14b9be216b23]&lt;BR /&gt;12 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x16ce9) [0x14b9be216ce9]&lt;BR /&gt;13 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x31a6d) [0x14b9be231a6d]&lt;BR /&gt;14 
/software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x319f7) [0x14b9be2319f7]&lt;BR /&gt;15 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x666b8e) [0x14bbaa066b8e]&lt;BR /&gt;16 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x22b919) [0x14bba9c2b919]&lt;BR /&gt;17 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x7c658d) [0x14bbaa1c658d]&lt;BR /&gt;18 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x3b71c0) [0x14bba9db71c0]&lt;BR /&gt;19 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x722785) [0x14bbaa122785]&lt;BR /&gt;20 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x2a0153) [0x14bba9ca0153]&lt;BR /&gt;21 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(MPI_Scan+0x56e) [0x14bba9b8a40e]&lt;BR /&gt;22 /software/lammps/build/lmp(+0x203042) [0x55d5788d4042]&lt;BR /&gt;23 /software/lammps/build/lmp(+0x2c0313) [0x55d578991313]&lt;BR /&gt;24 /software/lammps/build/lmp(+0xcf204) [0x55d5787a0204]&lt;BR /&gt;25 /software/lammps/build/lmp(+0xcf616) [0x55d5787a0616]&lt;BR /&gt;26 /software/lammps/build/lmp(+0xaddbc) [0x55d57877edbc]&lt;BR /&gt;27 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x14bba94f0d90]&lt;BR /&gt;28 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x14bba94f0e40]&lt;BR /&gt;29 /software/lammps/build/lmp(+0xaee25) [0x55d57877fe25]&lt;BR /&gt;=================================&lt;BR /&gt;[dragon1:140743:0:140743] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))&lt;BR /&gt;==== backtrace (tid: 140743) ====&lt;BR /&gt;0 /lib/x86_64-linux-gnu/libucs.so.0(ucs_handle_error+0x2e4) [0x148f6eed5fc4]&lt;BR /&gt;1 /lib/x86_64-linux-gnu/libucs.so.0(+0x24fec) [0x148f6eed9fec]&lt;BR /&gt;2 /lib/x86_64-linux-gnu/libucs.so.0(+0x251aa) [0x148f6eeda1aa]&lt;BR /&gt;3 /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x1491589f0520]&lt;BR /&gt;4 
/software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x2e886) [0x148f6d82e886]&lt;BR /&gt;5 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x2e8a9) [0x148f6d82e8a9]&lt;BR /&gt;6 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0xe9e5) [0x148f6d80e9e5]&lt;BR /&gt;7 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x13c17) [0x148f6d813c17]&lt;BR /&gt;8 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x14389) [0x148f6d814389]&lt;BR /&gt;9 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x150b0) [0x148f6d8150b0]&lt;BR /&gt;10 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x15a9a) [0x148f6d815a9a]&lt;BR /&gt;11 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x16b23) [0x148f6d816b23]&lt;BR /&gt;12 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x16ce9) [0x148f6d816ce9]&lt;BR /&gt;13 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x31a6d) [0x148f6d831a6d]&lt;BR /&gt;14 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x319f7) [0x148f6d8319f7]&lt;BR /&gt;15 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x666b8e) [0x149159466b8e]&lt;BR /&gt;16 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x22b919) [0x14915902b919]&lt;BR /&gt;17 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x7c658d) [0x1491595c658d]&lt;BR /&gt;18 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x3b71c0) [0x1491591b71c0]&lt;BR /&gt;19 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x722785) [0x149159522785]&lt;BR /&gt;20 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x2a0153) [0x1491590a0153]&lt;BR /&gt;21 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(MPI_Scan+0x56e) [0x149158f8a40e]&lt;BR /&gt;22 /software/lammps/build/lmp(+0x203042) [0x5565946e0042]&lt;BR 
/&gt;23 /software/lammps/build/lmp(+0x2c0313) [0x55659479d313]&lt;BR /&gt;24 /software/lammps/build/lmp(+0xcf204) [0x5565945ac204]&lt;BR /&gt;25 /software/lammps/build/lmp(+0xcf616) [0x5565945ac616]&lt;BR /&gt;26 /software/lammps/build/lmp(+0xaddbc) [0x55659458adbc]&lt;BR /&gt;27 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x1491589d7d90]&lt;BR /&gt;28 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x1491589d7e40]&lt;BR /&gt;29 /software/lammps/build/lmp(+0xaee25) [0x55659458be25]&lt;BR /&gt;=================================&lt;BR /&gt;==== backtrace (tid: 140692) ====&lt;BR /&gt;0 /lib/x86_64-linux-gnu/libucs.so.0(ucs_handle_error+0x2e4) [0x148c42ed5fc4]&lt;BR /&gt;1 /lib/x86_64-linux-gnu/libucs.so.0(+0x24fec) [0x148c42ed9fec]&lt;BR /&gt;2 /lib/x86_64-linux-gnu/libucs.so.0(+0x251aa) [0x148c42eda1aa]&lt;BR /&gt;3 /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x148e2c9f0520]&lt;BR /&gt;4 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x2e886) [0x148c4182e886]&lt;BR /&gt;5 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x2e8a9) [0x148c4182e8a9]&lt;BR /&gt;6 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0xe9e5) [0x148c4180e9e5]&lt;BR /&gt;7 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x13c17) [0x148c41813c17]&lt;BR /&gt;8 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x14389) [0x148c41814389]&lt;BR /&gt;9 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x150b0) [0x148c418150b0]&lt;BR /&gt;10 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x15a9a) [0x148c41815a9a]&lt;BR /&gt;11 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x16b23) [0x148c41816b23]&lt;BR /&gt;12 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x16ce9) [0x148c41816ce9]&lt;BR /&gt;13 /software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x31a6d) [0x148c41831a6d]&lt;BR /&gt;14 
/software/intel/oneapi/mpi/2021.9.0//libfabric/lib/prov/librxm-fi.so(+0x319f7) [0x148c418319f7]&lt;BR /&gt;15 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x666b8e) [0x148e2d466b8e]&lt;BR /&gt;16 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x22b919) [0x148e2d02b919]&lt;BR /&gt;17 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x7c658d) [0x148e2d5c658d]&lt;BR /&gt;18 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x3b71c0) [0x148e2d1b71c0]&lt;BR /&gt;19 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x722785) [0x148e2d522785]&lt;BR /&gt;20 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(+0x2a0153) [0x148e2d0a0153]&lt;BR /&gt;21 /software/intel/oneapi/mpi/2021.9.0//lib/release/libmpi.so.12(MPI_Scan+0x56e) [0x148e2cf8a40e]&lt;BR /&gt;22 /software/lammps/build/lmp(+0x203042) [0x556c1aa11042]&lt;BR /&gt;23 /software/lammps/build/lmp(+0x2c0313) [0x556c1aace313]&lt;BR /&gt;24 /software/lammps/build/lmp(+0xcf204) [0x556c1a8dd204]&lt;BR /&gt;25 /software/lammps/build/lmp(+0xcf616) [0x556c1a8dd616]&lt;BR /&gt;26 /software/lammps/build/lmp(+0xaddbc) [0x556c1a8bbdbc]&lt;BR /&gt;27 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x148e2c9d7d90]&lt;BR /&gt;28 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x148e2c9d7e40]&lt;BR /&gt;29 /software/lammps/build/lmp(+0xaee25) [0x556c1a8bce25]&lt;BR /&gt;=================================&lt;BR /&gt;srun: error: dragon1: tasks 4-6,8,11,16,20-21,23,25,36-38,40,43,45,51-53,56: Segmentation fault&lt;BR /&gt;srun: error: dragon1: task 60: Segmentation fault (core dumped)&lt;BR /&gt;srun: error: dragon1: task 46: Segmentation fault (core dumped)&lt;BR /&gt;srun: error: dragon1: tasks 14,22,28,30,44,54,62: Segmentation fault (core dumped)&lt;BR /&gt;srun: error: dragon1: task 12: Segmentation fault (core dumped)&lt;BR /&gt;srun: Job step aborted: Waiting up to 32 seconds for job step to finish.&lt;BR /&gt;srun: got SIGCONT&lt;BR 
/&gt;slurmstepd-dragon1: error: *** JOB 423 ON dragon1 CANCELLED AT 2023-07-11T13:48:01 ***&lt;BR /&gt;slurmstepd-dragon1: error: *** STEP 423.0 ON dragon1 CANCELLED AT 2023-07-11T13:48:01 ***&lt;BR /&gt;srun: forcing job termination&lt;BR /&gt;srun: error: dragon1: tasks 1-3,7,9-10,13,15,17-19,24,26-27,29,31-35,39,41-42,47-50,55,57-59,61,63: Terminated&lt;BR /&gt;srun: error: dragon2: tasks 64-127: Terminated&lt;BR /&gt;srun: error: dragon1: task 0: Terminated&lt;BR /&gt;root@bear:/data1/wexler/lammps-test#&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 11 Jul 2023 19:57:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1503779#M10749</guid>
      <dc:creator>chris-wustl</dc:creator>
      <dc:date>2023-07-11T19:57:00Z</dc:date>
    </item>
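A minimal sanity check that is often useful before debugging an application-level crash like the one above is a two-rank hello-world run placed explicitly one rank per node. This is a sketch, not from the thread: the oneAPI install path and the dragon1/dragon2 hostnames are taken from the logs above, and the bundled `test/test.c` location is an assumption that may differ between oneAPI releases.

```shell
# Sketch: verify the inter-node fabric with Intel MPI's bundled hello-world,
# bypassing Slurm entirely so srun/PMI wiring is out of the picture.
# Paths and hostnames below are assumptions based on the logs in this thread.
source /software/intel/oneapi/setvars.sh

# Intel MPI ships a small test program; the exact path varies by release.
mpiicc /software/intel/oneapi/mpi/2021.9.0/test/test.c -o ./mpi_hello

# One rank on each node: if this also segfaults, the problem is in the
# fabric/provider setup rather than in LAMMPS.
mpirun -n 2 -ppn 1 -hosts dragon1,dragon2 ./mpi_hello
```

If the direct `mpirun` run works but the `srun` launch does not, the difference points at the Slurm/PMI integration rather than the network.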
    <item>
      <title>Re:Job Core Dumps when using Intel MPI over two nodes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1503988#M10751</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thank you for posting in the Intel community.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Could you please provide the following details:&lt;/P&gt;&lt;P&gt;1. Your OS, and the output of the lscpu command&lt;/P&gt;&lt;P&gt;2. A sample reproducer, along with steps to reproduce the issue at our end&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Could you please try running on the latest Intel MPI version (2021.9) with I_MPI_FABRICS=ofi exported, and let us know if you still face the issue?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks and regards,&lt;/P&gt;&lt;P&gt;Aishwarya&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 12 Jul 2023 09:31:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1503988#M10751</guid>
      <dc:creator>AishwaryaCV_Intel</dc:creator>
      <dc:date>2023-07-12T09:31:10Z</dc:date>
    </item>
    <item>
      <title>Re: Re:Job Core Dumps when using Intel MPI over two nodes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1504075#M10755</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;SPAN&gt;Aishwarya,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Really appreciate the help. Let me know if you need anything else.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;1. I changed I_MPI_FABRICS to ofi and received the same error output.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;2. To reproduce, simply go to:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.lammps.org/download.html" target="_blank"&gt;https://www.lammps.org/download.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Download it. I compiled the code with the latest Intel oneAPI compiler. I will attach the input file used to run lmp.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;3. I installed Intel oneAPI Base Kit &amp;amp; HPC Kit 2023.1, the latest version on the Intel website. Is there an updated version? I believe I am running the latest version of MPI.&lt;/P&gt;&lt;P&gt;wexler@bear:/data1/wexler/lammps-test$ mpirun --version&lt;BR /&gt;Intel(R) MPI Library for Linux* OS, Version 2021.9 Build 20230307 (id: d82b3071db)&lt;BR /&gt;Copyright 2003-2023, Intel Corporation.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;4. My OS is Ubuntu 22.04 with the latest patch set, the same on all nodes. The Slurm version is also the same on all nodes.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;5. 
lscpu output -&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Architecture: x86_64&lt;BR /&gt;CPU op-mode(s): 32-bit, 64-bit&lt;BR /&gt;Address sizes: 46 bits physical, 57 bits virtual&lt;BR /&gt;Byte Order: Little Endian&lt;BR /&gt;CPU(s): 128&lt;BR /&gt;On-line CPU(s) list: 0-127&lt;BR /&gt;Vendor ID: GenuineIntel&lt;BR /&gt;Model name: Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz&lt;BR /&gt;CPU family: 6&lt;BR /&gt;Model: 106&lt;BR /&gt;Thread(s) per core: 2&lt;BR /&gt;Core(s) per socket: 32&lt;BR /&gt;Socket(s): 2&lt;BR /&gt;Stepping: 6&lt;BR /&gt;CPU max MHz: 3200.0000&lt;BR /&gt;CPU min MHz: 800.0000&lt;BR /&gt;BogoMIPS: 4000.00&lt;BR /&gt;Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc&lt;BR /&gt;a cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss&lt;BR /&gt;ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art&lt;BR /&gt;arch_perfmon pebs bts rep_good nopl xtopology nonstop_&lt;BR /&gt;tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cp&lt;BR /&gt;l vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dc&lt;BR /&gt;a sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer&lt;BR /&gt;aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpu&lt;BR /&gt;id_fault epb cat_l3 invpcid_single intel_ppin ssbd mba&lt;BR /&gt;ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexprior&lt;BR /&gt;ity ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep&lt;BR /&gt;bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx&lt;BR /&gt;smap avx512ifma clflushopt clwb intel_pt avx512cd sha_&lt;BR /&gt;ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm&lt;BR /&gt;_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lo&lt;BR /&gt;ck_detect wbnoinvd dtherm ida arat pln pts avx512vbmi u&lt;BR /&gt;mip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_&lt;BR /&gt;vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid fsrm&lt;BR /&gt;md_clear pconfig flush_l1d arch_capabilities&lt;BR /&gt;Virtualization features:&lt;BR /&gt;Virtualization: VT-x&lt;BR /&gt;Caches 
(sum of all):&lt;BR /&gt;L1d: 3 MiB (64 instances)&lt;BR /&gt;L1i: 2 MiB (64 instances)&lt;BR /&gt;L2: 80 MiB (64 instances)&lt;BR /&gt;L3: 96 MiB (2 instances)&lt;BR /&gt;NUMA:&lt;BR /&gt;NUMA node(s): 2&lt;BR /&gt;NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,&lt;BR /&gt;40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,7&lt;BR /&gt;6,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,&lt;BR /&gt;110,112,114,116,118,120,122,124,126&lt;BR /&gt;NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,&lt;BR /&gt;41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,7&lt;BR /&gt;7,79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,&lt;BR /&gt;111,113,115,117,119,121,123,125,127&lt;BR /&gt;Vulnerabilities:&lt;BR /&gt;Itlb multihit: Not affected&lt;BR /&gt;L1tf: Not affected&lt;BR /&gt;Mds: Not affected&lt;BR /&gt;Meltdown: Not affected&lt;BR /&gt;Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable&lt;BR /&gt;Retbleed: Not affected&lt;BR /&gt;Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl&lt;BR /&gt;and seccomp&lt;BR /&gt;Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer&lt;BR /&gt;sanitization&lt;BR /&gt;Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB fillin&lt;BR /&gt;g, PBRSB-eIBRS SW sequence&lt;BR /&gt;Srbds: Not affected&lt;BR /&gt;Tsx async abort: Not affected&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 12 Jul 2023 14:58:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1504075#M10755</guid>
      <dc:creator>chris-wustl</dc:creator>
      <dc:date>2023-07-12T14:58:42Z</dc:date>
    </item>
    <item>
      <title>Re: Re:Job Core Dumps when using Intel MPI over two nodes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1504201#M10756</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;SPAN&gt;Aishwarya,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I was going to send you the source for LAMMPS and the executable, but it exceeds 71 MB. Is there another mechanism to get this tar of the source to you?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Chris&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="chriswustl_0-1689191069981.png" style="width: 400px;"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/43512i003B27EF4FF73E54/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="chriswustl_0-1689191069981.png" alt="chriswustl_0-1689191069981.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 12 Jul 2023 19:44:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1504201#M10756</guid>
      <dc:creator>chris-wustl</dc:creator>
      <dc:date>2023-07-12T19:44:48Z</dc:date>
    </item>
    <item>
      <title>Re: Job Core Dumps when using Intel MPI over two nodes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1506206#M10774</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We were able to successfully build and run LAMMPS on two nodes, using the source downloaded from &lt;A href="https://www.lammps.org/download.html" target="_blank" rel="noopener"&gt;https://www.lammps.org/download.html&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;Please find attached the script file (run1.zip) and the output log file (slurm-515403.zip).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We ran the script file with the following command line:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;sbatch --partition workq run1.sh&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could you please let us know which OFI provider you are running on?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;I was going to send you the source for lammps and the executable but it exceeds 71MG. Is there another mechanism to get this tar of the source to you?&lt;/P&gt;
&lt;P&gt;Is this source file the same as the one provided in the link, or is it a different one?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks and regards,&lt;/P&gt;
&lt;P&gt;Aishwarya&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jul 2023 11:41:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1506206#M10774</guid>
      <dc:creator>AishwaryaCV_Intel</dc:creator>
      <dc:date>2023-07-19T11:41:33Z</dc:date>
    </item>
    <item>
      <title>Re: Job Core Dumps when using Intel MPI over two nodes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1506247#M10775</link>
      <description>&lt;P&gt;Hi Aishwarya,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I ran the script as above and received the same segmentation fault. I am running the same version of LAMMPS you compiled. My slurm.conf file has the defaultmpi set to pmi2. I thought oneAPI has OFI built into it when installing the HPC Toolkit. I tried specifying different fabrics with the same result. Is there something I am missing? What else can I do to troubleshoot this issue?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Chris&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jul 2023 15:11:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1506247#M10775</guid>
      <dc:creator>chris-wustl</dc:creator>
      <dc:date>2023-07-19T15:11:15Z</dc:date>
    </item>
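Since slurm.conf sets the default MPI plugin to pmi2, one thing often worth checking in this situation is whether Intel MPI is told where Slurm's PMI2 library lives; Intel MPI launched via srun needs I_MPI_PMI_LIBRARY pointing at it to wire ranks up across nodes. A sketch, assuming a Ubuntu-style library path that will differ by site:

```shell
# Sketch: point Intel MPI at Slurm's PMI2 library when launching with srun.
# The path below is an assumption for a Debian/Ubuntu Slurm install --
# locate yours first, e.g.:  find /usr -name 'libpmi2.so*' 2>/dev/null
export I_MPI_PMI_LIBRARY=/usr/lib/x86_64-linux-gnu/slurm/libpmi2.so

# Request the PMI2 plugin explicitly rather than relying on the default.
srun --mpi=pmi2 /software/lammps/build/lmp -in in.lj
```

If the library path is wrong or absent, ranks typically fall back to singleton initialization on each node, which can surface as crashes or hangs only in multi-node runs.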
    <item>
      <title>Re: Job Core Dumps when using Intel MPI over two nodes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1506353#M10776</link>
      <description>&lt;P&gt;I tried multiple ways to get this to work, and I think I may have found an issue (not sure). I turned on I_MPI_DEBUG and set it to 5.&amp;nbsp; One note: we are using oneAPI from a shared drive on all nodes. The head node with one of the worker nodes does work, and produces this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;[0] MPI startup(): Intel(R) MPI Library, Version 2021.9 Build 20230307 (id: d82b3071db)&lt;BR /&gt;[0] MPI startup(): Copyright (C) 2003-2023 Intel Corporation. All rights reserved.&lt;BR /&gt;[0] MPI startup(): library kind: release&lt;BR /&gt;[0] MPI startup(): libfabric version: 1.13.2rc1-impi&lt;BR /&gt;[0] MPI startup(): libfabric provider: tcp;ofi_rxm&lt;BR /&gt;[0] MPI startup(): File "/software/intel/oneapi/mpi/2021.9.0/etc/tuning_skx_shm-ofi_tcp-ofi-rxm_10.dat" not found&lt;BR /&gt;[0] MPI startup(): Load tuning file: "/software/intel/oneapi/mpi/2021.9.0/etc/tuning_skx_shm-ofi_tcp-ofi-rxm.dat"&lt;/P&gt;&lt;P&gt;LAMMPS works.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The output for two worker nodes is:&lt;/P&gt;&lt;P&gt;[0] MPI startup(): Intel(R) MPI Library, Version 2021.9 Build 20230307 (id: d82b3071db)&lt;BR /&gt;[0] MPI startup(): Copyright (C) 2003-2023 Intel Corporation. 
All rights reserved.&lt;BR /&gt;[0] MPI startup(): library kind: release&lt;BR /&gt;[0] MPI startup(): libfabric version: 1.13.2rc1-impi&lt;BR /&gt;[0] MPI startup(): libfabric provider: tcp;ofi_rxm&lt;/P&gt;&lt;P&gt;Segmentation fault, no other output.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have tried different fabrics, but it produces the same errors.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here's the sbatch script:&lt;/P&gt;&lt;P&gt;#!/bin/bash&lt;BR /&gt;#SBATCH --job-name=test&lt;BR /&gt;#SBATCH --nodes=2&lt;BR /&gt;#SBATCH --ntasks-per-node=32&lt;BR /&gt;#SBATCH --error=error-%j.err&lt;BR /&gt;#SBATCH --partition=general&lt;BR /&gt;#SBATCH --time=1:00:00&lt;BR /&gt;#SBATCH --account=wexler&lt;BR /&gt;#SBATCH --propagate=STACK&lt;/P&gt;&lt;P&gt;# Set MPI environment variables&lt;BR /&gt;export I_MPI_FABRICS=shm&lt;BR /&gt;export I_MPI_FALLBACK=0&lt;BR /&gt;export I_MPI_DEBUG=5&lt;/P&gt;&lt;P&gt;srun /software/lammps-chris/build/lmp -in in.lj&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any ideas?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Chris&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jul 2023 19:53:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1506353#M10776</guid>
      <dc:creator>chris-wustl</dc:creator>
      <dc:date>2023-07-19T19:53:20Z</dc:date>
    </item>
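The backtraces in this thread all pass through librxm-fi.so, i.e. the tcp;ofi_rxm provider that the debug output shows being selected. One way to narrow this down, sketched below with standard Intel MPI / libfabric environment variables (the provider names are examples; use whatever fi_info reports on your nodes):

```shell
# Sketch: raise debug verbosity and steer libfabric away from tcp;ofi_rxm
# to see whether the crash is provider-specific.
export I_MPI_DEBUG=10        # more detail than the 5 used above
export FI_LOG_LEVEL=debug    # verbose libfabric-level logging

# List the OFI providers actually available on this node.
fi_info -l

# Force a specific provider; "verbs" is an example -- pick one fi_info lists.
export FI_PROVIDER=verbs

srun /software/lammps-chris/build/lmp -in in.lj
```

If only the rxm-based provider crashes while another one works, that localizes the fault to the provider layer rather than the application or Slurm.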
    <item>
      <title>Re: Job Core Dumps when using Intel MPI over two nodes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1506536#M10780</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could you please run the following IMB benchmark command on your two nodes and let us know its output?&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;mpirun -n 2 IMB-MPI1&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please refer to the following link for the IMB benchmark: &lt;A href="https://www.intel.com/content/www/us/en/docs/mpi-library/user-guide-benchmarks/2021-2/running-intel-r-mpi-benchmarks.html" target="_blank" rel="noopener"&gt;https://www.intel.com/content/www/us/en/docs/mpi-library/user-guide-benchmarks/2021-2/running-intel-r-mpi-benchmarks.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks and regards,&lt;/P&gt;
&lt;P&gt;Aishwarya&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 20 Jul 2023 12:04:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1506536#M10780</guid>
      <dc:creator>AishwaryaCV_Intel</dc:creator>
      <dc:date>2023-07-20T12:04:45Z</dc:date>
    </item>
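One caveat worth noting about the suggested `mpirun -n 2 IMB-MPI1`: with no placement options, both ranks may land on the same node and never exercise the inter-node fabric. A sketch of a placement-explicit variant, reusing the hostnames from the logs in this thread:

```shell
# Sketch: run the IMB PingPong benchmark with exactly one rank per node,
# so the measurement (and any crash) involves the network between them.
# dragon1/dragon2 are the node names from this thread; substitute your own.
mpirun -n 2 -ppn 1 -hosts dragon1,dragon2 IMB-MPI1 PingPong
```

If this two-node PingPong run also segfaults, the problem reproduces without LAMMPS at all, which considerably simplifies the reproducer.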
    <item>
      <title>Re: Job Core Dumps when using Intel MPI over two nodes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1506563#M10782</link>
      <description>&lt;P&gt;Output attached.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 20 Jul 2023 13:48:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1506563#M10782</guid>
      <dc:creator>chris-wustl</dc:creator>
      <dc:date>2023-07-20T13:48:20Z</dc:date>
    </item>
    <item>
      <title>Re: Job Core Dumps when using Intel MPI over two nodes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1509027#M10821</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could you please run the IMB benchmark with the flags and the I_MPI_DEBUG setting used in the Slurm script below, and provide us the full output?&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;#!/bin/bash

#SBATCH --job-name=test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32
#SBATCH --partition=general
#SBATCH --time=1:00:00
#SBATCH --account=wexler
#SBATCH --propagate=STACK

#Set MPI environment variables
export I_MPI_FABRICS=shm:ofi
export I_MPI_DEBUG=120

srun IMB-MPI1&lt;/LI-CODE&gt;
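&lt;P&gt;As a side note, once the job runs, the transport that libfabric actually selected can be read back from the debug output. A minimal sketch, assuming the job output landed in a file matching slurm-*.out (the filename pattern is an assumption about your Slurm defaults):&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;# with I_MPI_DEBUG set, Intel MPI prints the selected libfabric provider
grep -i "libfabric provider" slurm-*.out&lt;/LI-CODE&gt;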
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks And Regards,&lt;/P&gt;
&lt;P&gt;Aishwarya&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jul 2023 05:58:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1509027#M10821</guid>
      <dc:creator>AishwaryaCV_Intel</dc:creator>
      <dc:date>2023-07-28T05:58:38Z</dc:date>
    </item>
    <item>
      <title>Re: Job Core Dumps when using Intel MPI over two nodes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1509128#M10824</link>
      <description>&lt;P&gt;Output attached.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Chris&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jul 2023 14:26:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1509128#M10824</guid>
      <dc:creator>chris-wustl</dc:creator>
      <dc:date>2023-07-28T14:26:22Z</dc:date>
    </item>
    <item>
      <title>Re: Job Core Dumps when using Intel MPI over two nodes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1509612#M10827</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could you please try to run the Slurm script below and provide us the full output?&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32
#SBATCH --partition=general
#SBATCH --time=1:00:00
#SBATCH --account=wexler
#SBATCH --propagate=STACK

#Set MPI environment variables
export I_MPI_FABRICS=shm:ofi
export I_MPI_DEBUG=120

mpirun -n 64 -ppn 32 IMB-MPI1 pingpong&lt;/LI-CODE&gt;
&lt;P&gt;Thanks And Regards,&lt;/P&gt;
&lt;P&gt;Aishwarya&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 31 Jul 2023 10:22:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1509612#M10827</guid>
      <dc:creator>AishwaryaCV_Intel</dc:creator>
      <dc:date>2023-07-31T10:22:16Z</dc:date>
    </item>
    <item>
      <title>Re: Job Core Dumps when using Intel MPI over two nodes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1509862#M10828</link>
      <description>&lt;P&gt;Output Attached&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Chris&lt;/P&gt;</description>
      <pubDate>Mon, 31 Jul 2023 23:20:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1509862#M10828</guid>
      <dc:creator>chris-wustl</dc:creator>
      <dc:date>2023-07-31T23:20:58Z</dc:date>
    </item>
    <item>
      <title>Re: Job Core Dumps when using Intel MPI over two nodes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1512344#M10846</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could you please let us know if you have set up passwordless SSH between your machines and confirm that it is functioning correctly? Could you also provide information about the interconnect and the drivers you are currently using?&lt;/P&gt;
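&lt;P&gt;Passwordless SSH between the nodes can be verified quickly with a non-interactive test. A minimal sketch, where node2 stands in for the hostname of the second compute node (a hypothetical name):&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;# should print the remote hostname without any password prompt;
# BatchMode=yes makes ssh fail instead of prompting
ssh -o BatchMode=yes node2 hostname&lt;/LI-CODE&gt;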
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It seems that you are using hyperthreading. I would like to request that you disable it for your tasks. You can do this, for example, by using Slurm's -c (--cpus-per-task) option when submitting your job:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32
#SBATCH --partition=general
#SBATCH --time=1:00:00
#SBATCH --account=wexler
#SBATCH --propagate=STACK
#SBATCH --cpus-per-task=1

#Set MPI environment variables
export I_MPI_FABRICS=shm:ofi
export I_MPI_DEBUG=120

mpirun -n 64 -ppn 32 IMB-MPI1 pingpong &lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks And Regards,&lt;/P&gt;
&lt;P&gt;Aishwarya&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 10 Aug 2023 07:02:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1512344#M10846</guid>
      <dc:creator>AishwaryaCV_Intel</dc:creator>
      <dc:date>2023-08-10T07:02:59Z</dc:date>
    </item>
    <item>
      <title>Re: Job Core Dumps when using Intel MPI over two nodes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1513090#M10852</link>
      <description>&lt;P&gt;Hello&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I do not have passwordless SSH set up. I never read that it was a requirement. Please advise if it is necessary.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Interconnect - TCP/IP Networking&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Dell PowerEdge C6520&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Network&lt;/P&gt;&lt;P&gt;Product - BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller&lt;/P&gt;&lt;P&gt;Driver - 4b:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Right now, the group has multiple jobs running in the queue using all resources, but I will get you the output as soon as possible.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Chris&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 10 Aug 2023 21:23:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1513090#M10852</guid>
      <dc:creator>chris-wustl</dc:creator>
      <dc:date>2023-08-10T21:23:53Z</dc:date>
    </item>
    <item>
      <title>Re: Job Core Dumps when using Intel MPI over two nodes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1513859#M10857</link>
      <description>&lt;P&gt;Attached is the output requested. I am confused by the results. I have all ports open for all servers; see below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;ufw status&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Anywhere ALLOW 172.20.93.218&lt;BR /&gt;Anywhere ALLOW 172.20.93.219&lt;BR /&gt;Anywhere ALLOW 172.20.93.220&lt;BR /&gt;Anywhere ALLOW 172.20.93.221&lt;/P&gt;&lt;P&gt;Anywhere ALLOW 10.225.153.13&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;All servers are set up this way.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What am I doing wrong?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Chris&lt;/P&gt;</description>
      <pubDate>Mon, 14 Aug 2023 13:49:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1513859#M10857</guid>
      <dc:creator>chris-wustl</dc:creator>
      <dc:date>2023-08-14T13:49:32Z</dc:date>
    </item>
    <item>
      <title>Re: Job Core Dumps when using Intel MPI over two nodes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1514243#M10859</link>
      <description>&lt;P&gt;I had some test time and turned off all the firewalls on all the nodes. Still an issue. I re-ran your last test script; attached is the output.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Chris&lt;/P&gt;</description>
      <pubDate>Tue, 15 Aug 2023 15:00:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1514243#M10859</guid>
      <dc:creator>chris-wustl</dc:creator>
      <dc:date>2023-08-15T15:00:11Z</dc:date>
    </item>
    <item>
      <title>Re:Job Core Dumps when using Intel MPI over two nodes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1514924#M10868</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The crash is happening inside libucs, and we are not sure that this call path is necessary. Could you please let us know whether it is required for a specific feature? It is possible that the Ethernet card mandates the presence of libucs. I suggest the following steps:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Ensure you are using the latest version of libucs.&lt;/LI&gt;&lt;LI&gt;Try running with I_MPI_TUNING_BIN="" and explicitly setting I_MPI_OFI_PROVIDER=tcp.&lt;/LI&gt;&lt;LI&gt;Let us know whether your Slurm setup is able to run with other MPI implementations such as Open MPI or MPICH.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks And Regards,&lt;/P&gt;&lt;P&gt;Aishwarya&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 17 Aug 2023 09:04:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1514924#M10868</guid>
      <dc:creator>AishwaryaCV_Intel</dc:creator>
      <dc:date>2023-08-17T09:04:49Z</dc:date>
    </item>
    <item>
      <title>Re: Re:Job Core Dumps when using Intel MPI over two nodes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1515123#M10870</link>
      <description>&lt;P&gt;Hello&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I don't know anything about libucs or whether it is needed. I do have ProSupport from Dell, which includes Ubuntu support, and could pose any question that would help. Please send me what I should ask Dell. I do have the latest version.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I did re-run the script with the environment variables set in step 2. That output is attached.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;No other implementations of MPI are installed. The professor would like to stick with an all-Intel solution.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Chris&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 17 Aug 2023 18:16:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1515123#M10870</guid>
      <dc:creator>chris-wustl</dc:creator>
      <dc:date>2023-08-17T18:16:50Z</dc:date>
    </item>
    <item>
      <title>Re:Job Core Dumps when using Intel MPI over two nodes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1516337#M10876</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;The backtrace shows that the segfault originates in libucs, which is not part of our software stack. This suggests that there is something wrong with your driver installation. Please check which version of libucs is compatible with the interface you are using.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Could you please reach out to your supplier to confirm that your installation is up to date? Additionally, if possible, consider testing an alternative MPI implementation (Open MPI or MPICH). This will help us triage whether the issue comes from Intel MPI or from your installation itself.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks And Regards,&lt;/P&gt;&lt;P&gt;Aishwarya&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 22 Aug 2023 05:35:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Job-Core-Dumps-when-using-Intel-MPI-over-two-nodes/m-p/1516337#M10876</guid>
      <dc:creator>AishwaryaCV_Intel</dc:creator>
      <dc:date>2023-08-22T05:35:58Z</dc:date>
    </item>
  </channel>
</rss>

