Hi,
I recently updated my Intel compiler from version 14 to version 15 (trial version).
I ran a cluster job on 8 nodes.
The program has an offload section that prints "hi this is the offload section" (this is printed multiple times per node).
It seems that some nodes printed the offload message while others threw an error.
Here is the output/error I got:
offload error: cannot load library to the device 0 (error code 24)
/storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: symbol lookup error: /storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: undefined symbol: __offload_unregister_image
offload error: cannot load library to the device 0 (error code 24)
/storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: symbol lookup error: /storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: undefined symbol: __offload_unregister_image
offload error: cannot load library to the device 0 (error code 24)
/storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: symbol lookup error: /storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: undefined symbol: __offload_unregister_image
offload error: cannot load library to the device 0 (error code 24)
/storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: symbol lookup error: /storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: undefined symbol: __offload_unregister_image
offload error: cannot load library to the device 0 (error code 24)
/storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: symbol lookup error: /storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: undefined symbol: __offload_unregister_image
offload error: cannot load library to the device 0 (error code 24)
/storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: symbol lookup error: /storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: undefined symbol: __offload_unregister_image
offload error: cannot load library to the device 0 (error code 24)
/storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: symbol lookup error: /storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: undefined symbol: __offload_unregister_image
offload error: cannot load library to the device 0 (error code 24)
/storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: symbol lookup error: /storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: undefined symbol: __offload_unregister_image
hi this is the offload section
hi this is the offload section
hi this is the offload section
hi this is the offload section
hi this is the offload section
hi this is the offload section
hi this is the offload section
hi this is the offload section
hi this is the offload section
hi this is the offload section
[122:node1] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
[98:node2] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 122
internal ABORT - process 98
[104:node2] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 104
[120:node1] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 120
[121:node1] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 121
[113:node1] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 113
[100:node2] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 100
[108:node2] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 108
[111:node2] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 111
[123:node1] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 123
hi this is the offload section
[126:node1] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 126
hi this is the offload section
[106:node2] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 106
hi this is the offload section
[124:node1] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 124
[101:node2] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 101
hi this is the offload section
Hi Aketh,
I am trying to figure out the issue you are having, but I need more information:
- Did your application work with Intel 14 before you upgraded to Intel 15?
- Is your application an MPI program? If so, which MPI version are you using?
- Which MPSS version are you using?
- Which OS are you using?
Thanks
OS: Linux
MPI: 5.0
Yes, the application worked well with Intel 14 before the upgrade to Intel 15.
MPSS version: 3.2.1
Hi Aketh,
The MPSS version you are using is quite old; you may want to upgrade to a more recent version (e.g., MPSS 3.4). Could you be more specific about the Linux distribution and release (e.g., RHEL xxx)? What happens when you run the "miccheck" utility from the host?
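The information requested above could be collected on the host with something like the following. This is only a minimal sketch: the release file locations are assumptions that depend on the distribution, and `miccheck` is run only if it is found in `PATH`.

```shell
#!/bin/sh
# Print the distribution release string from whichever release file exists.
# (File locations are assumptions; they vary between distributions.)
for f in /etc/redhat-release /etc/os-release; do
    if [ -r "$f" ]; then
        echo "== $f =="
        cat "$f"
    fi
done

# Kernel release of the host.
KERNEL_RELEASE=$(uname -r)
echo "kernel: $KERNEL_RELEASE"

# Run the MPSS sanity checks only if the utility is installed on this host.
if command -v miccheck >/dev/null 2>&1; then
    miccheck
    MICCHECK_STATUS="ran"
else
    echo "miccheck not found in PATH"
    MICCHECK_STATUS="missing"
fi
```

Posting the full output of these commands makes it easier to compare the host setup against the MPSS release notes.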
I am using CentOS. Here is the miccheck output:
Executing default tests for host
Test 0: Check number of devices the OS sees in the system ... pass
Test 1: Check mic driver is loaded ... pass
Test 2: Check number of devices driver sees in the system ... pass
Test 3: Check mpssd daemon is running ... pass
Executing default tests for device: 0
Test 4 (mic0): Check device is in online state and its postcode is FF ... pass
Test 5 (mic0): Check ras daemon is available in device ... pass
Test 6 (mic0): Check running flash version is correct ... pass
Executing default tests for device: 1
Test 7 (mic1): Check device is in online state and its postcode is FF ... pass
Test 8 (mic1): Check ras daemon is available in device ... pass
Test 9 (mic1): Check running flash version is correct ... pass
Could you please set the environment variable I_MPI_DEBUG
# export I_MPI_DEBUG=5
and run your program again? This will display more debug information. It would also help to see the full command line that you use to launch your application.
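As a rough sketch, the debug run could be wrapped in a small script so that the debug level, the exact command line, and the output all end up in one place. The launch line below (`mpirun -n 128 ./cesm.exe`) and the log file name `run_debug.log` are hypothetical placeholders, since the actual command line was not posted in this thread.

```shell
#!/bin/sh
# Turn on Intel MPI debug output at level 5, as suggested above.
export I_MPI_DEBUG=5

# Hypothetical launch line: the real launcher, rank count, and arguments
# were not posted in the thread, so replace this with your own command.
LAUNCH_CMD="mpirun -n 128 ./cesm.exe"

# Record the settings next to the output so both can be posted together.
echo "I_MPI_DEBUG=$I_MPI_DEBUG"
echo "command: $LAUNCH_CMD"

# Launch only if the launcher and the executable exist on this machine;
# stdout and stderr both go to the (hypothetical) log file run_debug.log.
if command -v mpirun >/dev/null 2>&1 && [ -x ./cesm.exe ]; then
    $LAUNCH_CMD > run_debug.log 2>&1
fi
```

Because the variable is exported before the launch, the launcher and the processes it starts inherit it from the shell environment.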