Hi,
I compiled my code with the following dependencies:
intel-2018, gcc-8.2.0, fftw-3.3.8, hdf5-1.10.5, zlib-1.2.11, szlib-2.1.1
My runs completed successfully a few times, but now I am getting the error below.
cn044:UCM:e33e:aaaedcc0: 568142013 us(568142013 us!!!): DTO completion ERR: status 12, op OP_RDMA_READ, vendor_err 0x85 - 0.0.0.0
[165:cn044][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_poll_rc.c:1374] Intel MPI fatal error: ofa-v2-mlx5_0-1u DTO operation posted for [169:cn130] completed with error. status=0x8. cookie=0x0
[165:cn044][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_poll_rc.c:1374] Intel MPI fatal error: ofa-v2-mlx5_0-1u DTO operation posted for [169:cn130] completed with error. status=0x1. cookie=0x400a9
[165:cn044] unexpected DAPL connection event 0x4006 from 169
Fatal error in MPI_Waitany: Internal MPI error!, error stack:
MPI_Waitany(253).........: MPI_Waitany(count=50, req_array=0x7fffffff7180, index=0x7fffffff8398, status=0x18149e0) failed
PMPIDI_CH3I_Progress(850): fail failed
(unknown)(): Internal MPI error!
I would really appreciate any suggestions.
I'm using PBS Pro 2021.1.0.20210303161351.
Thanks in advance!
Hi Iffa,
1. Your references to those steps that you did (url, etc)
2. Your full steps with commands till point of error
#!/bin/bash
## JOB NAME
#PBS -N n19
## QUEUE NAME
#PBS -q mediumq
## COMPUTE RESOURCES REQUESTED FOR THE JOB
#PBS -l select=32:ncpus=32
## SPECIFY THE EXECUTION TIME LIMIT FOR THE CODE/APPLICATION IN HRS:MINS:SECS FORMAT
#PBS -l walltime=48:00:00
## JOIN THE OUTPUT AND ERROR FILES INTO A SINGLE FILE WITH NAME <JOBNAME>.O<JOBID>
#PBS -j oe
## EXPORT ALL ENVIRONMENT VARIABLES
#PBS -V
## EMAIL IS SENT WHEN THE JOB STARTS, TERMINATES AND ABORTS
#PBS -m bea
## SPECIFY EMAIL ADDRESS FOR NOTIFICATIONS
# WORKING DIRECTORY OF CODE/APPLICATION
cd $PBS_O_WORKDIR
mpirun -np 1024 --machinefile $PBS_NODEFILE /home/user/Programs/orb5_intel/bin/orb5 >& orb5.out
3. Your systems (OS, hardware:PC,etc)
We are running this code on a high-performance computing cluster with RHEL 7.5 installed, W2000h-W370h F4 chassis servers, and an InfiniBand EDR switch (100 Gb/s).
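The job script above requests 32 chunks of 32 cores and launches 1024 ranks from $PBS_NODEFILE. One way to see how those ranks would be spread across hosts is to count the entries per host in the node file and compare them with the requested rank count. This is only a minimal sketch: the function name ranks_per_node is hypothetical, and it is not established here that rank placement has anything to do with the DAPL error.

```shell
#!/bin/bash
# Hypothetical helper (not part of PBS or Intel MPI):
# print "<count> <host>" for every distinct host in a node file.
ranks_per_node() {
    local nodefile="$1"
    # sort groups identical host names; uniq -c counts each group;
    # awk strips the leading padding that uniq -c adds.
    sort "$nodefile" | uniq -c | awk '{print $1, $2}'
}

# Inside a PBS job the scheduler sets PBS_NODEFILE; outside a job this is skipped.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "entries in node file: $(wc -l < "$PBS_NODEFILE")"
    ranks_per_node "$PBS_NODEFILE"
fi
```

Placed just before the mpirun line, this writes the per-host counts into the job output so they can be checked against the 1024 ranks passed to -np.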
Thanks
kunfu
Thanks Iffa!
I would appreciate any solutions or suggestions.
Hi,
Could you clarify and confirm which Intel software you are using, check whether it is actually Intel Edge Controls for Industrial, and share that with us?
As mentioned previously, this might relate to the Intel MPI Library rather than Intel Edge Controls for Industrial.
Cordially,
Iffa
Yes Iffa, I think you are right; it might be related to Intel MPI. Where can I get help for this?
Thanks
kunfu
