I'm trying to run a simple Hello World MPI example inside a Docker container and I'm getting a SEGFAULT.
I'm using the Docker container image:
intel/oneapi-hpckit:2022.1.1-devel-ubuntu18.04
https://hub.docker.com/r/intel/oneapi-hpckit
$> docker run -it --rm intel/oneapi-hpckit:2022.1.1-devel-ubuntu18.04
root@c1ea2a0c8961:/# cat <<EOF >hello.c
#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
}
EOF
root@c1ea2a0c8961:/# which mpicc
/opt/intel/oneapi/mpi/2021.5.0//bin/mpicc
root@c1ea2a0c8961:/# mpicc hello.c -o hello
root@c1ea2a0c8961:/# export I_MPI_DEBUG=5
root@c1ea2a0c8961:/# mpirun -n 1 ./hello
[0] MPI startup(): Intel(R) MPI Library, Version 2021.5 Build 20211102 (id: 9279b7d62)
[0] MPI startup(): Copyright (C) 2003-2021 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.13.2rc1-impi
libfabric:259:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:259:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:259:core:core:ofi_hmem_init():214<warn> Failed to initialize hmem iface FI_HMEM_ZE: Input/output error
libfabric:259:core:mr:ofi_default_cache_size():78<info> default cache size=2815754389
libfabric:259:core:core:ofi_register_provider():474<info> registering provider: tcp (113.20)
libfabric:259:core:core:ofi_register_provider():474<info> registering provider: sockets (113.20)
libfabric:259:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:259:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:259:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_ZE not supported
libfabric:259:core:core:ofi_register_provider():474<info> registering provider: shm (113.20)
libfabric:259:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:259:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_ROCR not supported
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 259 RUNNING AT c1ea2a0c8961
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================
I was able to resolve this by setting I_MPI_FABRICS=shm and launching the container with increased shared memory:
docker run --shm-size=512m ...
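For anyone hitting the same issue, the full sequence I used looks roughly like this (the container ID in the prompt will differ, the df -h check is just an optional way to confirm the larger /dev/shm took effect, and 512m is simply the value that worked for this single-rank test; larger runs may need more). With these settings the program prints the expected hello message instead of segfaulting:

$> docker run -it --rm --shm-size=512m intel/oneapi-hpckit:2022.1.1-devel-ubuntu18.04
root@<container-id>:/# df -h /dev/shm
root@<container-id>:/# export I_MPI_FABRICS=shm
root@<container-id>:/# # recreate hello.c as in the original post, then:
root@<container-id>:/# mpicc hello.c -o hello
root@<container-id>:/# mpirun -n 1 ./hello
Hello world from processor <container-id>, rank 0 out of 1 processors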
Thank you!