Hi,
I recently updated my Intel compiler from Intel 14 to Intel 15 (trial version).
I ran a cluster job on 8 nodes.
The program has an offload section that prints "hi this is the offload section" (this is printed multiple times per node).
It seems that some nodes printed the offload message while others threw an error.
Here is the output and the errors I got:
offload error: cannot load library to the device 0 (error code 24)
/storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: symbol lookup error: /storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: undefined symbol: __offload_unregister_image
offload error: cannot load library to the device 0 (error code 24)
/storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: symbol lookup error: /storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: undefined symbol: __offload_unregister_image
offload error: cannot load library to the device 0 (error code 24)
/storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: symbol lookup error: /storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: undefined symbol: __offload_unregister_image
offload error: cannot load library to the device 0 (error code 24)
/storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: symbol lookup error: /storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: undefined symbol: __offload_unregister_image
offload error: cannot load library to the device 0 (error code 24)
/storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: symbol lookup error: /storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: undefined symbol: __offload_unregister_image
offload error: cannot load library to the device 0 (error code 24)
/storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: symbol lookup error: /storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: undefined symbol: __offload_unregister_image
offload error: cannot load library to the device 0 (error code 24)
/storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: symbol lookup error: /storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: undefined symbol: __offload_unregister_image
offload error: cannot load library to the device 0 (error code 24)
/storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: symbol lookup error: /storage/home/aketh/cesm/cases/B_intel15/exe/cesm.exe: undefined symbol: __offload_unregister_image
hi this is the offload section
hi this is the offload section
hi this is the offload section
hi this is the offload section
hi this is the offload section
hi this is the offload section
hi this is the offload section
hi this is the offload section
hi this is the offload section
hi this is the offload section
[122:node1] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
[98:node2] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 122
internal ABORT - process 98
[104:node2] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 104
[120:node1] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 120
[121:node1] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 121
[113:node1] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 113
[100:node2] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 100
[108:node2] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 108
[111:node2] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 111
[123:node1] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 123
hi this is the offload section
[126:node1] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 126
hi this is the offload section
[106:node2] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 106
hi this is the offload section
[124:node1] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 124
[101:node2] unexpected disconnect completion event from [2:node8]
Assertion failed in file ../../dapl_conn_rc.c at line 1179: 0
internal ABORT - process 101
hi this is the offload section
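Output like this is easier to read once it is tallied. The following is a minimal sketch, not part of the original post, that counts the three kinds of lines seen above and groups the disconnect events by reporting node and by source. The function name `summarize` and the short `sample` excerpt are illustrative assumptions; the line patterns are taken from the output as posted. In that output every disconnect event names `[2:node8]` as the source, which may mean that rank 2 on node8 exited first, although the log alone does not confirm this.

```python
import re
from collections import Counter

# Matches lines such as:
#   [122:node1] unexpected disconnect completion event from [2:node8]
DISCONNECT = re.compile(
    r"\[(\d+):([^\]]+)\] unexpected disconnect completion event "
    r"from \[(\d+):([^\]]+)\]"
)


def summarize(log_text):
    """Tally offload errors, offload prints, and disconnect events."""
    summary = {
        "offload_errors": 0,
        "offload_prints": 0,
        "reporting_nodes": Counter(),     # node that logged the disconnect
        "disconnect_sources": Counter(),  # (rank, node) named as the source
    }
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("offload error:"):
            summary["offload_errors"] += 1
        elif line == "hi this is the offload section":
            summary["offload_prints"] += 1
        else:
            match = DISCONNECT.search(line)
            if match:
                _, reporter_node, src_rank, src_node = match.groups()
                summary["reporting_nodes"][reporter_node] += 1
                summary["disconnect_sources"][(int(src_rank), src_node)] += 1
    return summary


# A short excerpt of the posted output, used only to exercise the function.
sample = """\
offload error: cannot load library to the device 0 (error code 24)
hi this is the offload section
[122:node1] unexpected disconnect completion event from [2:node8]
[98:node2] unexpected disconnect completion event from [2:node8]
[104:node2] unexpected disconnect completion event from [2:node8]
"""

summary = summarize(sample)
print("offload errors:", summary["offload_errors"])
print("offload prints:", summary["offload_prints"])
print("reporting nodes:", dict(summary["reporting_nodes"]))
print("disconnect sources:", dict(summary["disconnect_sources"]))
```

To apply it to a real run, read the job's output file and pass its contents to `summarize` in place of `sample`.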
Hi Aketh,
I am trying to figure out the issue you are seeing, but I need more information:
- Did your application work with Intel 14 before you upgraded to Intel 15?
- Is your application an MPI program? If so, which MPI version are you using?
- Which MPSS version are you using?
- Which OS are you using?
Thanks
OS: Linux
MPI: 5.0
Intel 14 to 15: yes, the application worked well with Intel 14.
MPSS version: 3.2.1
Hi Aketh,
I notice that the MPSS version you are using is rather old; you may want to consider upgrading to a more recent version (e.g., MPSS 3.4). Could you be more specific about the Linux distribution and version (e.g., RHEL xxx)? What happens when you run the "miccheck" utility from the host?
I am using CentOS. Here is the miccheck output:
Executing default tests for host
Test 0: Check number of devices the OS sees in the system ... pass
Test 1: Check mic driver is loaded ... pass
Test 2: Check number of devices driver sees in the system ... pass
Test 3: Check mpssd daemon is running ... pass
Executing default tests for device: 0
Test 4 (mic0): Check device is in online state and its postcode is FF ... pass
Test 5 (mic0): Check ras daemon is available in device ... pass
Test 6 (mic0): Check running flash version is correct ... pass
Executing default tests for device: 1
Test 7 (mic1): Check device is in online state and its postcode is FF ... pass
Test 8 (mic1): Check ras daemon is available in device ... pass
Test 9 (mic1): Check running flash version is correct ... pass
Could you please set the environment variable I_MPI_DEBUG
# export I_MPI_DEBUG=5
and run your program again? This will display more debug information. It would also help to see the full command line used to launch your application.
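If the job is started from a script rather than an interactive shell, the same setting can be applied to the launcher's environment and the output captured for posting. The following is a minimal sketch only: the helper name `run_with_mpi_debug` is hypothetical, and since the real launch command was not given in this thread, a small Python child process stands in for it so the example runs anywhere.

```python
import os
import subprocess
import sys


def run_with_mpi_debug(command, level=5):
    """Run `command` with I_MPI_DEBUG set; return (exit code, combined output)."""
    env = dict(os.environ)           # copy, so the parent environment is untouched
    env["I_MPI_DEBUG"] = str(level)  # same effect as `export I_MPI_DEBUG=5`
    result = subprocess.run(
        command,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,    # keep errors interleaved with normal output
        text=True,
        check=False,                 # a failing job should still return its output
    )
    return result.returncode, result.stdout


# Stand-in for the real launch command (an assumption for illustration):
# it only echoes the variable to show that the child process sees it.
stand_in = [sys.executable, "-c", "import os; print(os.environ['I_MPI_DEBUG'])"]

code, output = run_with_mpi_debug(stand_in)
print("exit code:", code)
print("child saw I_MPI_DEBUG =", output.strip())
```

In practice `stand_in` would be replaced by the actual launch command as a list of arguments, and the returned output saved to a file so the full debug log can be attached to the thread.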
