Hi all,
I am trying to run an application using MPI, testing the MPI shared-memory model. However, the application does not run on the Intel Developer Cloud.
My application name is: parallelSpmvAlphaBeta
The input data is: ../matrices/dc1.mm_bin ../matrices/dc1.in_bin ../matrices/dc1.out_bin
I am using the following command:
mpiexec -genv I_MPI_PIN_DOMAIN=core -genv I_MPI_FABRICS=shm:ofi -n 4 parallelSpmvAlphaBeta ../matrices/dc1.mm_bin ../matrices/dc1.in_bin ../matrices/dc1.out_bin
However, the application does not run; I have to kill it with Ctrl-C:
user@idc-beta-batch-pvc-node-08:~$ mpiexec -genv I_MPI_PIN_DOMAIN=core -genv I_MPI_FABRICS=shm:ofi -n 4 parallelSpmvAlphaBeta ../matrices/dc1.mm_bin ../matrices/dc1.in_bin ../matrices/dc1.out_bin
^C[mpiexec@idc-beta-batch-pvc-node-08] Sending Ctrl-C to processes as requested
[mpiexec@idc-beta-batch-pvc-node-08] Press Ctrl-C again to force abort
^Cuser@idc-beta-batch-pvc-node-08:~$
Any idea why this is not running?
Hi black__edgar,
Thank you for reaching out to us.
Could you please provide the following additional information so that we can investigate this issue further?
- The documentation or guide you followed to run the MPI application
- Instance ID
- Reservation details (start time and end time)
Regards,
Hairul
Hi Hairul,
I am not sure where to get all the information you requested.
I am using a terminal within a JupyterLab session.
This is the MPI version I am using:
u9252df372877ec2a0108c2790881b7f@idc-beta-batch-pvc-node-01:~$ mpiexec --version
Intel(R) MPI Library for Linux* OS, Version 2021.10 Build 20230619 (id: c2e19c2f3e)
Copyright 2003-2023, Intel Corporation.
This simple command runs my application with 4 processes on a single node. After issuing it, nothing happens and I have to interrupt it by pressing Ctrl-C twice:
$ mpiexec -np 4 -ppn 4 parallelSpmvSm ../matrices/dc1.mm_bin ../matrices/dc1.in_bin ../matrices/dc1.out_bin
^C[mpiexec@idc-beta-batch-pvc-node-01] Sending Ctrl-C to processes as requested
[mpiexec@idc-beta-batch-pvc-node-01] Press Ctrl-C again to force abort
^Cu9252df372877ec2a0108c2790881b7f@idc-beta-batch-pvc-node-01:~$
I have also tried the following without luck:
$ mpiexec -genv I_MPI_OFI_PROVIDER=tcp -genv I_MPI_PIN_DOMAIN=core -genv I_MPI_PIN_ORDER=scatter -genv I_MPI_DEBUG=4 -genv I_MPI_FABRICS=shm:ofi -np 4 -ppn 4 parallelSpmvSm ../matrices/dc1.mm_bin ../matrices/dc1.in_bin ../matrices/dc1.out_bin
^C[mpiexec@idc-beta-batch-pvc-node-01] Sending Ctrl-C to processes as requested
[mpiexec@idc-beta-batch-pvc-node-01] Press Ctrl-C again to force abort
Even the following helloWorld.c MPI program shows the same behavior.
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
}
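For completeness, this is roughly how I build and launch the test (a sketch; the mpiicc wrapper and the binary name helloWorld are assumptions on my part):
mpiicc helloWorld.c -o helloWorld
mpiexec -n 4 ./helloWorld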
Any ideas?
Hi black__edgar,
Thank you for sharing the information.
We're investigating this issue and will update you on any findings as soon as possible.
On another note, were you able to successfully complete the steps in the Get Started with the Intel® MPI Library for Linux* OS guide?
Regards,
Hairul
Hi Hairul,
Thank you very much for your help.
I had not read the guide you mentioned.
After reading it, I realized I was missing a host file ("hostfile").
I created the "hostfile" file and wrote to it the node that my JupyterLab session is assigned:
u@idc-beta-batch-pvc-node-17:~$ cat hostfile
idc-beta-batch-pvc-node-17
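If I understand the guide correctly, this hostfile is then passed to mpiexec with the -f option, roughly as follows (a sketch; the process counts and input paths are carried over from my earlier attempt):
mpiexec -f hostfile -n 4 -ppn 4 parallelSpmvSm ../matrices/dc1.mm_bin ../matrices/dc1.in_bin ../matrices/dc1.out_bin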
However, my application is not running yet.
Thanks again,
Edgar
Hi black__edgar,
Thank you for the information.
To clarify, were you able to run the MPI application in a non-IDC environment?
If the error persists, we would suggest posting your question in the Intel® oneAPI HPC Toolkit forum for better support, since this issue is more closely related to the MPI Library.
Regards,
Hairul
Hi Hairul,
Yes, my application runs in a non-IDC environment.
I will review the Intel® oneAPI HPC Toolkit forum to see if anyone has had this problem.
Thanks,
Edgar Black
I found the problem.
I needed to add the parameter "-host localhost,pion-ib" to the mpiexec process launcher.
My command now looks like:
mpiexec -genv I_MPI_OFI_PROVIDER=tcp -genv I_MPI_PIN_DOMAIN=core -genv I_MPI_PIN_ORDER=scatter -genv I_MPI_DEBUG=4 -genv I_MPI_FABRICS=shm:ofi -np 4 -ppn 4 -host localhost,pion-ib parallelSpmvAlphaBeta ../matrices/dc1.mm_bin ../matrices/dc1.in_bin ../matrices/dc1.out_bin
The output looks like:
[0] MPI startup(): Intel(R) MPI Library, Version 2021.10 Build 20230619 (id: c2e19c2f3e)
[0] MPI startup(): Copyright (C) 2003-2023 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.18.0-impi
[0] MPI startup(): libfabric provider: tcp
[0] MPI startup(): File "" not found
[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.10.0/etc/tuning_spr_shm-ofi.dat"
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 1358877 idc-beta-batch-pvc-node-06 {0,112}
[0] MPI startup(): 1 1358878 idc-beta-batch-pvc-node-06 {56,168}
[0] MPI startup(): 2 1358879 idc-beta-batch-pvc-node-06 {1,113}
[0] MPI startup(): 3 1358880 idc-beta-batch-pvc-node-06 {57,169}
---> Time taken by 4 processes: 0.299007 seconds, GFLOPS: 6.298500
Solution match in rank 0
Solution match in rank 2
Solution match in rank 1
Solution match in rank 3
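As an aside, I believe the hostfile created earlier could be used instead of the explicit -host list, for example (a sketch I have not verified on IDC):
mpiexec -genv I_MPI_OFI_PROVIDER=tcp -genv I_MPI_DEBUG=4 -f hostfile -np 4 -ppn 4 parallelSpmvAlphaBeta ../matrices/dc1.mm_bin ../matrices/dc1.in_bin ../matrices/dc1.out_bin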
Hi black__edgar,
Glad to know that your issue has been resolved and we appreciate you sharing the solution with the community.
This thread will no longer be monitored since this issue has been resolved. If you need any additional information from Intel, please submit a new question.
Regards,
Hairul