Hi,
I compiled my code using the following dependencies:
intel-2018, gcc-8.2.0, fftw-3.3.8, hdf5-1.10.5, zlib-1.2.11, szlib-2.1.1
My runs succeeded a few times, but now I am getting the error below.
cn044:UCM:e33e:aaaedcc0: 568142013 us(568142013 us!!!): DTO completion ERR: status 12, op OP_RDMA_READ, vendor_err 0x85 - 0.0.0.0
[165:cn044][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_poll_rc.c:1374] Intel MPI fatal error: ofa-v2-mlx5_0-1u DTO operation posted for [169:cn130] completed with error. status=0x8. cookie=0x0
[165:cn044][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_poll_rc.c:1374] Intel MPI fatal error: ofa-v2-mlx5_0-1u DTO operation posted for [169:cn130] completed with error. status=0x1. cookie=0x400a9
[165:cn044] unexpected DAPL connection event 0x4006 from 169
Fatal error in MPI_Waitany: Internal MPI error!, error stack:
MPI_Waitany(253).........: MPI_Waitany(count=50, req_array=0x7fffffff7180, index=0x7fffffff8398, status=0x18149e0) failed
PMPIDI_CH3I_Progress(850): fail failed
(unknown)(): Internal MPI error!
I would really appreciate any suggestions.
I'm using PBS Pro 2021.1.0.20210303161351.
Thanks in advance!
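The fatal-error lines above name the ranks and hosts on both ends of the failed transfer (here rank 165 on cn044 and rank 169 on cn130). To check whether the same hosts recur across failed runs, the pairs can be pulled out of a saved log. This is only a minimal sketch: the function name `failing_peers` is hypothetical, and the pattern assumes the `[rank:host]` format seen in the messages above.

```shell
#!/bin/bash
# Hypothetical helper: list the unique "rank:host" pairs named in an
# Intel MPI error log read from standard input.
failing_peers() {
    grep -o '\[[0-9][0-9]*:[A-Za-z0-9_.-]*\]' \
        | sed 's/[][]//g' \
        | sort -u
}

# Example input copied from one of the fatal-error lines above.
sample='[165:cn044][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_poll_rc.c:1374] Intel MPI fatal error: ofa-v2-mlx5_0-1u DTO operation posted for [169:cn130] completed with error. status=0x8. cookie=0x0'
printf '%s\n' "$sample" | failing_peers
```

In a real run the input would be the job output file instead of the sample string, for example `failing_peers < orb5.out`. If the same host shows up every time, that would point toward a node or link problem rather than the application, though this alone does not prove it.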
Hi kunfu,
Thanks for reaching out. We are checking on this and will get back to you soon.
Regards,
Athirah
Hi Athirah,
I'll wait for your response. I just want to know the possible ways to resolve this.
Thanks
kunfu
Hi,
To investigate further, could you share:
- Your references for the steps that you followed (URL, etc.)
- Your full steps, with commands, up to the point of the error
- Your system details (OS, hardware, etc.)
Cordially,
Iffa
Hi Iffa,
1. Your references to those steps that you did (url, etc)
2. Your full steps with commands till point of error
#!/bin/bash
## JOB NAME
#PBS -N n19
## QUEUE NAME
#PBS -q mediumq
## COMPUTE RESOURCES REQUESTED FOR THE JOB
#PBS -l select=32:ncpus=32
## SPECIFY THE EXECUTION TIME LIMIT FOR THE CODE/APPLICATION IN HRS:MINS:SECS FORMAT
#PBS -l walltime=48:00:00
## JOIN THE OUTPUT AND ERROR FILES INTO A SINGLE FILE WITH NAME <JOBNAME>.O<JOBID>
#PBS -j oe
## EXPORT ALL ENVIRONMENT VARIABLES
#PBS -V
#EMAIL IS SENT WHEN THE JOB STARTS, TERMINATES AND ABORTS
#PBS -m bea
## SPECIFY EMAIL ADDRESS FOR NOTIFICATIONS
# WORKING DIRECTORY OF CODE/APPLICATION
cd $PBS_O_WORKDIR
mpirun -np 1024 --machinefile $PBS_NODEFILE /home/user/Programs/orb5_intel/bin/orb5 >& orb5.out
3. Your systems (OS, hardware:PC,etc)
We are running this code on a high-performance computing cluster with RHEL 7.5 installed. The servers are W2000h-W370h F4 chassis systems, connected through an InfiniBand EDR switch (100 Gb/s).
Thanks
kunfu
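For reference, the job script above requests `select=32:ncpus=32` and launches `mpirun -np 1024`, so the rank count equals chunks times CPUs per chunk. A small sketch of that arithmetic, plus a per-host count of the node file, is below. The function name `expected_ranks` is hypothetical, and the sketch assumes a simple `N:ncpus=M` select string like the one in the script.

```shell
#!/bin/bash
# Hypothetical helper: compute chunks * ncpus from a PBS select string
# of the simple form "N:ncpus=M".
expected_ranks() {
    local select_spec="$1"
    local chunks="${select_spec%%:*}"   # text before the first ':'
    local ncpus
    ncpus="$(printf '%s\n' "$select_spec" \
        | sed -n 's/.*ncpus=\([0-9][0-9]*\).*/\1/p')"
    [ -n "$ncpus" ] || return 1         # no ncpus= field found
    echo $(( chunks * ncpus ))
}

echo "expected ranks: $(expected_ranks "32:ncpus=32")"

# Inside a running job, show how many times each host is listed in the
# node file that is passed to mpirun. Skipped when not run under PBS.
if [ -r "${PBS_NODEFILE:-}" ]; then
    sort "$PBS_NODEFILE" | uniq -c
fi
```

Comparing the per-host counts against the intended 32 ranks per node is a quick way to confirm that the ranks are being placed as expected before looking at the fabric itself.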
Hi,
thank you for your patience.
This issue might relate to the MPI library or it could be something else.
We'll get back to you asap.
Cordially,
Iffa
Thanks, Iffa!
I would appreciate any solutions or suggestions.
Hi,
Could you confirm which Intel software you are using, check whether it is actually Intel Edge Controls for Industrial, and share that with us?
As mentioned previously, this might relate to the Intel MPI Library rather than Intel Edge Controls for Industrial.
Cordially,
Iffa
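One way to answer the question of which Intel software is in use is to report what the job environment actually picks up. The sketch below is an assumption-laden example, not an official procedure: the function name `report_mpi` is hypothetical, it assumes the Intel MPI environment script has set `I_MPI_ROOT`, and it assumes the launcher on the PATH accepts `-V` to print its version, as the Intel MPI Hydra launcher does.

```shell
#!/bin/bash
# Hypothetical helper: report which mpirun is on PATH and its version.
report_mpi() {
    if ! command -v mpirun >/dev/null 2>&1; then
        echo "mpirun: not found on PATH"
        return 0
    fi
    echo "mpirun: $(command -v mpirun)"
    # I_MPI_ROOT is normally set by the Intel MPI environment script.
    echo "I_MPI_ROOT: ${I_MPI_ROOT:-<unset>}"
    # Assumption: the launcher prints its version with -V.
    mpirun -V 2>&1 | head -n 3
}

report_mpi
```

Running this from inside the same PBS job script, just before the `mpirun` line, shows the environment the failing run really uses, which can differ from the login shell.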
Yes Iffa, I think you are right; it might be related to Intel MPI. Where can I get help for this?
Thanks
kunfu
Hi,
Intel will no longer monitor this thread since we have provided a solution. If you need any additional information from Intel, please submit a new question.
Cordially,
Iffa
