offload error: cannot find offload entry with 16.0.1

Christof_Soeger
Beginner

After a system and compiler upgrade, my previously working program now gives the following error during an offload:

offload error: cannot find offload entry __offload_entry_offload_handler_cpp_370transfer__8c912811342804b6ec8a993e510992a1
offload error: process on the device 0 unexpectedly exited with code 1

What is really strange is that this is not the first offload. In fact, multiple similar offloads from the same file in the same library work fine, and then this one fails every time. I checked the compiler opt-report for offload and everything looks normal; in particular, the offload of that region is listed. I then looked at the symbols in the shared library that should contain it, and the entry is there:

$ objdump -t lib/libnormaliz.so | grep __offload_entry_offload_handler_cpp_370
000000000253cb50 l     O .OffloadEntryTable.    0000000000000010              __offload_entry_offload_handler_cpp_370transfer__8c912811342804b6ec8a993e510992a1_$entry

I attached the log of a run with OFFLOAD_REPORT=3 (attachment 494594).

Our setup is: CentOS 7, MPSS 3.6.1, Intel Parallel Studio XE 2016 Cluster Edition for Linux Update 1

Does anybody have an idea what the problem could be? I can upload more information if it could be helpful.

Rajiv_D_Intel
Employee

Please try this:

Run "mic_extract <your_binary>" to extract the MIC code. It will appear on the file system as <your_binary>MIC.

Then run objdump to see the entry symbols in both binaries. See if they match.

objdump -t <your_binary> | grep '$entry'

objdump -t <your_binary>MIC | grep '$entry'
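
The comparison can also be automated once both objdump outputs have been captured as text. The following is a minimal C++ sketch, not an Intel tool. It assumes that the symbol name is the last whitespace-separated field of each `objdump -t` line and that entry symbols end in `$entry`; the function names `entry_symbols` and `only_in_first` are made up for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Collect offload entry symbol names from the text of `objdump -t`.
// Assumption: the symbol name is the last whitespace-separated field
// on each line, and entry symbols end in "$entry".
std::set<std::string> entry_symbols(const std::string& objdump_output) {
    const std::string suffix = "$entry";
    std::set<std::string> names;
    std::istringstream lines(objdump_output);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string field, last;
        while (fields >> field) last = field;  // keep the final field
        if (last.size() >= suffix.size() &&
            last.compare(last.size() - suffix.size(), suffix.size(), suffix) == 0) {
            names.insert(last);
        }
    }
    return names;
}

// Symbols that appear in `a` but not in `b`.
std::vector<std::string> only_in_first(const std::set<std::string>& a,
                                       const std::set<std::string>& b) {
    std::vector<std::string> diff;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(diff));
    return diff;
}
```

Calling `only_in_first(host, mic)` and `only_in_first(mic, host)` on the two symbol sets lists the entries that exist on one side only; both results are empty when the tables match.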

Christof_Soeger
Beginner

I did as you said on my shared library, which contains the offload segments. The entries match except for the critical one:

from the lib:

000000000253cb50 l     O .OffloadEntryTable.    0000000000000010              __offload_entry_offload_handler_cpp_370transfer__8c912811342804b6ec8a993e510992a1_$entry

from the mic_extract:

000000000072cb30 l     O .OffloadEntryTable.    0000000000000010              __offload_entry_offload_handler_cpp_370transfer__6b65f83eb01cfeb45ccb4575c33520c8_$entry

Rajiv_D_Intel
Employee

The mismatch is most likely caused (indirectly) by a difference in gcc version on host and MIC.

Does the function containing the offload use parameters of types defined in gcc headers? If so, try changing them to types you define yourself rather than types inherited from included headers. If you can't do that, you'll need to match the gcc versions on host and MIC, but try this first.

Christof_Soeger
Beginner

The function takes a std::list< std::vector<int> > as an argument. I moved the offload section into a separate function, and now it works. Thanks for this solution!

The proper fix would still be to get matching headers. Do you have any indication of what to change for that?
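
For readers hitting the same error, the workaround described in this post might look roughly like the sketch below. It is an assumption, not the actual code from this thread: the new function is not shown here, so the names `transfer` and `sum_on_device`, the summing loop, and the choice of a flat int buffer are all hypothetical. The idea is that the function containing the offload takes only built-in types, while the std::list< std::vector<int> > stays in the caller. The pragma is guarded by `__INTEL_OFFLOAD` so the sketch also builds as plain host code with other compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical helper: the offload lives here, and the signature uses only
// built-in types, so no type from a library header appears in it.
long sum_on_device(const int* data, std::size_t n) {
    assert(data != nullptr || n == 0);  // precondition checked on the host
    long total = 0;
#ifdef __INTEL_OFFLOAD
    // Only seen by the Intel compiler with offload enabled; otherwise the
    // block below simply runs on the host.
    #pragma offload target(mic:0) in(data : length(n)) inout(total)
#endif
    {
        // Placeholder work standing in for the real computation.
        for (std::size_t i = 0; i < n; ++i) total += data[i];
    }
    return total;
}

// The caller keeps the std::list< std::vector<int> > interface, flattens the
// data into one contiguous buffer, and passes only a pointer and a length.
long transfer(const std::list< std::vector<int> >& pyramids) {
    std::vector<int> flat;
    for (const std::vector<int>& p : pyramids) {
        flat.insert(flat.end(), p.begin(), p.end());
    }
    if (flat.empty()) return 0;
    return sum_on_device(flat.data(), flat.size());
}
```

Flattening loses the boundaries between the inner vectors; if the device code needs them, a second plain array of lengths could be passed the same way.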

Kevin_D_Intel
Employee

Would it be possible for you to create a small reproducer for this situation?

Our developers are interested in one for testing a possible solution.

Christof_Soeger
Beginner

Here is a minimal example of what I'm doing. The new method is my workaround and the old method is how I ran into the problem. When I run it on our server, I get:

New transfer method:
recieved 8 pyramids.
mic 0: transfered 8 pyramids.
Old transfer method:
offload error: cannot find offload entry __offload_entry_transfer_pyramids_cpp_103transfer__c4209102481176599a3116eb09cf2fbbicpc0101639380120Iu65al
offload error: process on the device 0 unexpectedly exited with code 1

(attachment 494675)

Kevin_D_Intel
Employee

I was advised you should be able to work around the issue by adding the option -D_GLIBCXX_USE_CXX11_ABI=0 when compiling.

Also, we should have a fix for this in the upcoming PSXE 2016 Update 3 (16.0 compiler) release in a few months, and in our next major release, due out later this year.
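
As background on the option: in libstdc++ from GCC 5 onward, `_GLIBCXX_USE_CXX11_ABI` selects between two ABIs, and with the value 1 types such as std::string and std::list live in the inline namespace `std::__cxx11`, which changes their mangled names. Whether that is what feeds the hash in the offload entry name is not stated in this thread, so treat the connection as an assumption. The sketch below only reports which setting a translation unit was compiled with; the function names are made up for illustration.

```cpp
#include <list>
#include <string>
#include <typeinfo>

// Returns 1 or 0 when libstdc++ reports its dual-ABI setting, and -1 when
// the macro is not defined (older libstdc++ or another standard library).
int cxx11_abi_setting() {
#ifdef _GLIBCXX_USE_CXX11_ABI
    return _GLIBCXX_USE_CXX11_ABI;
#else
    return -1;
#endif
}

// True when the implementation's type name for std::list<int> mentions the
// "__cxx11" inline namespace used by the newer libstdc++ ABI.
bool list_name_mentions_cxx11() {
    const std::string name = typeid(std::list<int>).name();
    return name.find("__cxx11") != std::string::npos;
}
```

Building this once with and once without -D_GLIBCXX_USE_CXX11_ABI=0 and printing both values shows whether the option changes the type names on a given toolchain.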

Christof_Soeger
Beginner

Thanks, that option is also working!
