Edge Software Catalog
Support for issues related to Edge Software Catalog
475 discussions

unexpected DAPL connection event 0x4006 from 169

kunfu
Beginner
2,762 views

Hi ,

 

I compiled my code with the following dependencies:

intel-2018, gcc-8.2.0, fftw-3.3.8, hdf5-1.10.5, zlib-1.2.11, szlib-2.1.1

My runs completed successfully a few times, but now I am getting the error below.

cn044:UCM:e33e:aaaedcc0: 568142013 us(568142013 us!!!): DTO completion ERR: status 12, op OP_RDMA_READ, vendor_err 0x85 - 0.0.0.0
[165:cn044][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_poll_rc.c:1374] Intel MPI fatal error: ofa-v2-mlx5_0-1u DTO operation posted for [169:cn130] completed with error. status=0x8. cookie=0x0
[165:cn044][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_poll_rc.c:1374] Intel MPI fatal error: ofa-v2-mlx5_0-1u DTO operation posted for [169:cn130] completed with error. status=0x1. cookie=0x400a9
[165:cn044] unexpected DAPL connection event 0x4006 from 169
Fatal error in MPI_Waitany: Internal MPI error!, error stack:
MPI_Waitany(253).........: MPI_Waitany(count=50, req_array=0x7fffffff7180, index=0x7fffffff8398, status=0x18149e0) failed
PMPIDI_CH3I_Progress(850): fail failed
(unknown)(): Internal MPI error!

I would really appreciate any suggestions.

I'm using PBS Pro 2021.1.0.20210303161351.

 

Thanks in advance!
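
For anyone triaging a similar log: the bracketed `[rank:host]` tokens in the DTO error lines name the reporting rank and the peer it was talking to (here rank 165 on cn044 and rank 169 on cn130). A minimal shell sketch, assuming the log format shown above and that the job output was saved to a file; the function name `list_dto_peers` is hypothetical and not part of Intel MPI:

```shell
#!/bin/bash
# Sketch: list the [rank:host] pairs that appear in Intel MPI DAPL
# "DTO operation posted for" error lines, with a count for each pair.
# Assumes the log format shown in this thread; list_dto_peers is a
# hypothetical helper name.
list_dto_peers() {
    local logfile="$1"
    # Keep only the DTO error lines, then extract every [rank:host] token.
    # Source-location tokens such as [../../src/...:1374] do not match,
    # because the pattern requires digits right after the opening bracket.
    grep 'DTO operation posted for' "$logfile" \
        | grep -oE '\[[0-9]+:[A-Za-z0-9_.-]+\]' \
        | sort | uniq -c | sort -rn
}
```

Calling `list_dto_peers orb5.out` would print each pair with how often it occurs, which may help narrow the problem to specific nodes or links before asking the cluster administrators to check them.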

Labels (1)
0 points
10 replies
Athirah_Intel
Employee
2,716 views

Hi kunfu,


Thanks for reaching out. We are checking on this and will get back to you soon.



Regards,

Athirah


0 points
kunfu
Beginner
2,669 views

Hi Athirah,

 

I'll wait for your valuable response. I just want to know the possible ways to resolve this.

 

Thanks

kunfu

 

 

0 points
Iffa_Intel
Moderator
2,663 views

Hi,


To investigate further, could you share:

  1. Your references to those steps that you did (url, etc)
  2. Your full steps with commands till point of error
  3. Your systems (OS, hardware:PC,etc)



Cordially,

Iffa


0 points
kunfu
Beginner
2,656 views

Hi Iffa, 

1. Your references to those steps that you did (url, etc)

https://orb5.epfl.ch/

 

2. Your full steps with commands till point of error

#!/bin/bash

## JOB NAME
#PBS -N n19
## QUEUE NAME
#PBS -q mediumq
## COMPUTE RESOURCES REQUESTED FOR THE JOB
#PBS -l select=32:ncpus=32
## SPECIFY THE EXECUTION TIME LIMIT FOR THE CODE/APPLICATION IN HRS:MINS:SECS FORMAT
#PBS -l walltime=48:00:00
## JOIN THE OUTPUT AND ERROR FILES INTO A SINGLE FILE WITH NAME <JOBNAME>.O<JOBID>
#PBS -j oe
## EXPORT ALL ENVIRONMENT VARIABLES
#PBS -V
#EMAIL IS SENT WHEN THE JOB STARTS, TERMINATES AND ABORTS
#PBS -m bea
## SPECIFY EMAIL ADDRESS FOR NOTIFICATIONS


# WORKING DIRECTORY OF CODE/APPLICATION
cd $PBS_O_WORKDIR
mpirun -np 1024 --machinefile $PBS_NODEFILE /home/user/Programs/orb5_intel/bin/orb5 >& orb5.out

 

3. Your systems (OS, hardware:PC,etc)

We are running this code on a high-performance computing cluster with RHEL 7.5 installed, W2000h-W370h F4 chassis servers, and an InfiniBand EDR switch (100 Gb/s).

 

Thanks
kunfu
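
The script requests 32 chunks of 32 CPUs and starts 1024 ranks, so the 32 × 32 = 1024 match only holds if the node file really offers that many slots. A small sketch for checking this inside the job, assuming one line per slot in `$PBS_NODEFILE` (the number of lines can depend on whether `mpiprocs` is set in the select statement, so treat this as a sanity check, not a diagnosis of the DAPL error); `check_slots` is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: compare the number of entries in a node file with the rank
# count passed to mpirun. check_slots is a hypothetical helper name.
check_slots() {
    local nodefile="$1" nranks="$2"
    local slots hosts
    slots=$(awk 'END { print NR }' "$nodefile")            # total lines (slots)
    hosts=$(sort -u "$nodefile" | awk 'END { print NR }')  # distinct hosts
    echo "hosts=$hosts slots=$slots ranks=$nranks"
    # Exit status is non-zero when the slot count and rank count differ.
    [ "$slots" -eq "$nranks" ]
}
```

In the job script this could be called as `check_slots "$PBS_NODEFILE" 1024` just before the `mpirun` line, with the printed summary ending up in the job's output file.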

0 points
Iffa_Intel
Moderator
2,622 views

Hi,


Thank you for your patience.

This issue might be related to the MPI library, or it could be something else.

We'll get back to you as soon as possible.



Cordially,

Iffa


0 points
kunfu
Beginner
2,593 views

Thanks, Iffa!

 

I would appreciate any solutions or suggestions.

0 points
Iffa_Intel
Moderator
2,571 views

Hi,



Could you confirm which Intel software you are using, check whether it is actually Intel Edge Controls for Industrial, and share that with us?

 

As mentioned previously, this might relate to the Intel MPI Library rather than Intel Edge Controls for Industrial.


Cordially,

Iffa


0 points
kunfu
Beginner
2,557 views

 

Yes Iffa, I think you are right; it might be related to Intel MPI. Where can I get help for this?

 

Thanks

kunfu

0 points
Iffa_Intel
Moderator
2,526 views

 

You can contact the correct expert for MPI here: Intel® HPC Toolkit

 

 

Cordially,

Iffa

0 points
Iffa_Intel
Moderator
2,441 views

Hi,


Intel will no longer monitor this thread since we have provided a solution. If you need any additional information from Intel, please submit a new question. 


Cordially,

Iffa


0 points