Edge Software Catalog
Support for issues related to Edge Software Catalog
474 Discussions

unexpected DAPL connection event 0x4006 from 169

kunfu
Beginner
2,746 Views

Hi,

I compiled my code with the following dependencies:

intel-2018, gcc-8.2.0, fftw-3.3.8, hdf5-1.10.5, zlib-1.2.11, szlib-2.1.1

My runs succeeded a few times, but now I am getting the error below.

cn044:UCM:e33e:aaaedcc0: 568142013 us(568142013 us!!!): DTO completion ERR: status 12, op OP_RDMA_READ, vendor_err 0x85 - 0.0.0.0
[165:cn044][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_poll_rc.c:1374] Intel MPI fatal error: ofa-v2-mlx5_0-1u DTO operation posted for [169:cn130] completed with error. status=0x8. cookie=0x0
[165:cn044][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_poll_rc.c:1374] Intel MPI fatal error: ofa-v2-mlx5_0-1u DTO operation posted for [169:cn130] completed with error. status=0x1. cookie=0x400a9
[165:cn044] unexpected DAPL connection event 0x4006 from 169
Fatal error in MPI_Waitany: Internal MPI error!, error stack:
MPI_Waitany(253).........: MPI_Waitany(count=50, req_array=0x7fffffff7180, index=0x7fffffff8398, status=0x18149e0) failed
PMPIDI_CH3I_Progress(850): fail failed
(unknown)(): Internal MPI error!
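
When reading a log like the one above, it can help to pull out the `[rank:host]` pairs so the nodes involved in the failing transfer are easy to see. The following is only a minimal sketch: the function name `list_rank_hosts` is hypothetical, and it assumes that ranks and hosts always appear in the bracketed `[rank:host]` form shown in this log.

```shell
#!/bin/bash
# Hypothetical helper: print the distinct rank:host pairs named in an MPI log.
# Reads the log on stdin, so it works with any file name.
list_rank_hosts() {
    # Match only "[<digits>:<hostname>]"; bracketed source paths do not match.
    grep -oE '\[[0-9]+:[A-Za-z0-9_.-]+\]' | sed 's/^\[//; s/\]$//' | sort -u
}

# Two sample lines copied from the log above.
sample_log='[165:cn044][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_poll_rc.c:1374] Intel MPI fatal error: ofa-v2-mlx5_0-1u DTO operation posted for [169:cn130] completed with error. status=0x8. cookie=0x0
[165:cn044] unexpected DAPL connection event 0x4006 from 169'

pairs=$(printf '%s\n' "$sample_log" | list_rank_hosts)
printf '%s\n' "$pairs"
```

For the sample lines this prints `165:cn044` and `169:cn130`, the two endpoints of the failed RDMA read. On a full log, the same function could be fed with `list_rank_hosts < orb5.out`.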

I would really appreciate any suggestions.

I'm using PBS Pro 2021.1.0.20210303161351.

Thanks in advance!

0 Kudos
10 Replies
Athirah_Intel
Employee
2,700 Views

Hi kunfu,


Thanks for reaching out. We are checking on this and will get back to you soon.



Regards,

Athirah


0 Kudos
kunfu
Beginner
2,653 Views

Hi Athirah,

 

I'll wait for your response. I would just like to know the possible ways to resolve this.

 

Thanks

kunfu

 

 

0 Kudos
Iffa_Intel
Moderator
2,647 Views

Hi,


To investigate further, could you share:

  1. The references you followed for your steps (URL, etc.)
  2. Your full steps, with commands, up to the point of error
  3. Your system (OS, hardware: PC, etc.)



Cordially,

Iffa


0 Kudos
kunfu
Beginner
2,640 Views

Hi Iffa, 

1. The references you followed for your steps (URL, etc.)

https://orb5.epfl.ch/

 

2. Your full steps, with commands, up to the point of error

#!/bin/bash

## JOB NAME
#PBS -N n19
## QUEUE NAME
#PBS -q mediumq
## COMPUTE RESOURCES REQUESTED FOR THE JOB
#PBS -l select=32:ncpus=32
## SPECIFY THE EXECUTION TIME LIMIT FOR THE CODE/APPLICATION IN HRS:MINS:SECS FORMAT
#PBS -l walltime=48:00:00
## JOIN THE OUTPUT AND ERROR FILES INTO A SINGLE FILE WITH NAME <JOBNAME>.O<JOBID>
#PBS -j oe
## EXPORT ALL ENVIRONMENT VARIABLES
#PBS -V
## EMAIL IS SENT WHEN THE JOB STARTS, TERMINATES AND ABORTS
#PBS -m bea
## SPECIFY EMAIL ADDRESS FOR NOTIFICATIONS


# WORKING DIRECTORY OF CODE/APPLICATION
cd $PBS_O_WORKDIR
mpirun -np 1024 --machinefile $PBS_NODEFILE /home/user/Programs/orb5_intel/bin/orb5 >& orb5.out
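
The script above requests 32 chunks of 32 CPUs and launches 1024 ranks. As a rough pre-flight sketch, the job could confirm that the node file really lists the expected number of distinct hosts and that nodes times ranks per node matches the `-np` value before `mpirun` starts. The function name `check_layout` and its messages are hypothetical, and the check assumes that each chunk lands on a separate node and that the node file holds one hostname per line, possibly repeated. It does not diagnose the DAPL error itself.

```shell
#!/bin/bash
# Hypothetical pre-flight check; name and messages are illustrative only.
# Usage: check_layout NODEFILE EXPECTED_NODES RANKS_PER_NODE TOTAL_RANKS
check_layout() {
    local nodefile=$1 expected_nodes=$2 per_node=$3 total=$4
    local nodes
    # Count distinct hostnames, ignoring blank lines.
    nodes=$(grep -v '^[[:space:]]*$' "$nodefile" | sort -u | wc -l | tr -d ' ')
    if [ "$nodes" -ne "$expected_nodes" ]; then
        echo "expected $expected_nodes nodes, node file lists $nodes" >&2
        return 1
    fi
    if [ $((expected_nodes * per_node)) -ne "$total" ]; then
        echo "$expected_nodes x $per_node does not equal $total ranks" >&2
        return 1
    fi
    echo "layout ok: $nodes nodes x $per_node ranks = $total"
}

# Small demonstration with a made-up node file (host names are examples).
demo_nodefile=$(mktemp)
printf 'cn001\ncn001\ncn002\ncn003\n' > "$demo_nodefile"
check_layout "$demo_nodefile" 3 2 6
rm -f "$demo_nodefile"
```

In the job script this might be called as `check_layout "$PBS_NODEFILE" 32 32 1024 || exit 1` just before the `mpirun` line.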

 

3. Your system (OS, hardware: PC, etc.)

We are running this code on a high-performance computing cluster with RHEL 7.5 installed, W2000h-W370h F4 chassis servers, and an InfiniBand EDR switch (100 Gb/s).

 

Thanks
kunfu

0 Kudos
Iffa_Intel
Moderator
2,606 Views

Hi,


Thank you for your patience.

This issue might be related to the MPI library, or it could be something else.

We'll get back to you as soon as possible.



Cordially,

Iffa


0 Kudos
kunfu
Beginner
2,577 Views

Thanks, Iffa!

I would appreciate any solutions or suggestions.

0 Kudos
Iffa_Intel
Moderator
2,555 Views

Hi,



Could you confirm which Intel software you are using, check whether it is actually Intel Edge Controls for Industrial, and share that with us?

As mentioned previously, this might be related to the Intel MPI Library rather than Intel Edge Controls for Industrial.


Cordially,

Iffa


0 Kudos
kunfu
Beginner
2,541 Views

 

Yes Iffa, I think you are right; it might be related to Intel MPI. Where can I get help for this?

 

Thanks

kunfu

0 Kudos
Iffa_Intel
Moderator
2,510 Views

 

You can contact the correct expert for MPI here: Intel® HPC Toolkit

 

 

Cordially,

Iffa

0 Kudos
Iffa_Intel
Moderator
2,425 Views

Hi,


Intel will no longer monitor this thread since we have provided a solution. If you need any additional information from Intel, please submit a new question. 


Cordially,

Iffa


0 Kudos