
offload debugging in eclipse

Christof_Soeger
Beginner
I'm trying to debug offloaded code and followed https://software.intel.com/en-us/articles/debugging-intel-xeon-phi-applications-on-linux-host#Debugging%20with%20Eclipse*%20IDE

The general setup seems to work. But there are some error messages, and when the program runs into a segfault (on one of the MICs) I do not get a useful backtrace. The output of the debugger gives some hints.

1) At the very beginning there is a warning:

Architecture rejected target-supplied description

2) At the beginning of the first offload I get a message like this for every MIC:

No symbol table is loaded. Use the warning: Could not load shared library symbols for /tmp/coi_procs/1/129002/libnormaliz.so.2.12MIC.1.
Do you need "set solib-search-path" or "set sysroot"?
warning: File "/opt/mpss/3.4.1/sysroots/k1om-mpss-linux/usr/lib64/libstdc++.so.6.0.16-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".

This seems to be the problem. In the article linked above I also found: "If libraries have different paths on host & target, help the debugger to find them: (gdb) set solib-search-path is a colon separated list of paths to look for libraries on the host". But that is for direct gdb use. I do not know how to do it in Eclipse. I thought it would all be handled by the plugins you provide.

I'm using MPSS 3.4.1, icc 15.0.0 20140723, and Eclipse Luna 4.4.1. The full debugger output is below.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff4b60700 (LWP 15388)]
[Thread 0x7ffff4b60700 (LWP 15388) exited]
[New Thread 0x7ffff4b60700 (LWP 15390)]
[Thread 0x7ffff4b60700 (LWP 15390) exited]
[New Thread 0x7ffff4b60700 (LWP 15392)]
[Thread 0x7ffff4b60700 (LWP 15392) exited]
[New Thread 0x7ffff4b60700 (LWP 15394)]
[Thread 0x7ffff4b60700 (LWP 15394) exited]
[New Thread 0x7ffff3b25700 (LWP 16463)]
[New Thread 0x7ffff4b60700 (LWP 16464)]
[New Thread 0x7ffff3ac1700 (LWP 16465)]
[New Thread 0x7ffff36c0700 (LWP 16466)]
[New Thread 0x7ffff32bf700 (LWP 16467)]
[New Thread 0x7ffff2ebe700 (LWP 16468)]
[New Thread 0x7ffff2abd700 (LWP 16469)]
[New Thread 0x7ffff26bc700 (LWP 16470)]
[New Thread 0x7ffff22bb700 (LWP 16471)]
[New Thread 0x7ffff1eba700 (LWP 16472)]
[New Thread 0x7ffff1ab9700 (LWP 16473)]
[New Thread 0x7ffff16b8700 (LWP 16474)]
[New Thread 0x7ffff12b7700 (LWP 16475)]
[New Thread 0x7ffff0eb6700 (LWP 16476)]
[New Thread 0x7ffff0ab5700 (LWP 16477)]
[New Thread 0x7ffff06b4700 (LWP 16478)]
[New Thread 0x7fffb3fff700 (LWP 16479)]
[New Thread 0x7fffb3bfe700 (LWP 16480)]
[New Thread 0x7fffb37fd700 (LWP 16481)]
[New Thread 0x7fffb31fc700 (LWP 16482)]
[New Thread 0x7fffb2dfb700 (LWP 16483)]
[New Thread 0x7fffb23fa700 (LWP 16484)]
[Thread 0x7fffb23fa700 (LWP 16484) exited]
[New Thread 0x7fffb23fa700 (LWP 16485)]
[Thread 0x7fffb23fa700 (LWP 16485) exited]
[New Thread 0x7fffb23fa700 (LWP 16486)]
[New Thread 0x7fffb19f9700 (LWP 16487)]
[Thread 0x7fffb19f9700 (LWP 16487) exited]
[New Thread 0x7fffb19f9700 (LWP 16488)]
Attaching MIC process...
MIC card number is 0.
MIC Pid is 129002.
MIC local file is "/tmp/coi_procs/1/129002/normaliz?icpcoutqUPyvt".
MIC card address is "131.173.40.12".

13 Replies
Kevin_D_Intel
Employee

Sorry to hear about the trouble debugging. I contacted someone with more knowledge about the debugger and integrations to see if they can offer some help. Stay tuned...

Georg_Z_Intel
Employee

Hello,

I think you have a proper setup because with the latest versions I see the same warnings/errors. You don't need to set sysroot or solib search path for system libraries. All is done by the start_mpm.sh script automatically.

I'm currently discussing with our engineers what is wrong and will come back to you once I have an answer.

Btw.: Which integration have you used? The one from MPSS or from the Composer Edition? Which compiler version are you using? I'd also need the path to start_mpm.sh (just to be sure).

Best regards,

Georg Zitzlsberger

Christof_Soeger
Beginner
Thanks for your replies. Here is a more precise description of my problem. When my program, which uses offloads, runs into a problem on the host, I get a good backtrace in the debugger. But when it happens on the MIC, I get a backtrace like this:

Thread [62] 67362 [core: mic1-1] (Suspended : Signal : SIGABRT:Aborted)
    raise() at pt-raise.c:42 0x7f701fe5538b
    0x7f702161083d
    0x7f7021db0850
    0x4
    0x17
    0x7f7017ffe230
    0x7f7018001770
    do_lookup_x() at dl-lookup.c:272 0x7f7021b9d372
    _dl_lookup_symbol_x() at dl-lookup.c:739 0x7f7021b9d5f5
    _dl_fixup() at dl-runtime.c:119 0x7f7021ba0e36
    _dl_runtime_resolve() at 0x7f7021ba6df5
    0x7f702162c08d
    0x0

This tells me nothing and looks as if debugging information is missing. (I simulated a problem with raise(SIGABRT) for test purposes.)

Christof_Soeger
Beginner

Oh and I wanted to say that all that happens in code that is in a shared library.

And one more question: Is it possible to set breakpoints in offloaded areas? It does not work for me when I simply define them in the eclipse environment.

Georg_Z_Intel
Employee

Hello,

I suspect you mixed integration and debugger (start_mpm.sh) from MPSS and Composer Edition package. Hence you see "warning: Architecture rejected target-supplied description".

Could you please do the following:

  1. Uninstall the existing debugger integration for Intel(R) MIC.
  2. Get the latest Intel(R) Parallel Studio XE 2015 Composer Edition Update 2 package and install it.
  3. Install the debugger extension from there (<composer_xe_root>/debugger/cdt/)
  4. When building an application make sure to use the latest compiler 15.0.2 (always use -V to print the version just to be sure); also source the compilervars.sh before(!) starting Eclipse.
  5. When creating the debug configuration, use the start_mpm.sh from the same Composer Edition: <composer_xe_root>/debugger/mpm/bin/start_mpm.sh

After that, you should not see "warning: Architecture rejected target-supplied description" anymore.
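
The "same Composer Edition" requirement in steps 4 and 5 can be checked by comparing paths before Eclipse is started. The following is only a minimal sketch: `under_root` and `check_same_root` are hypothetical helper names, not part of any Intel tool, and the install root has to be supplied by hand.

```shell
#!/bin/sh
# Hypothetical helper (not an Intel tool): succeeds if file path $1 lies
# below directory $2. Pure string comparison; symlinks are not resolved.
under_root() {
    case "$1" in
        "${2%/}"/*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Prints a one-line verdict for compiler path $1 and start_mpm.sh path $2
# against the assumed Composer Edition root $3.
check_same_root() {
    if under_root "$1" "$3" && under_root "$2" "$3"; then
        echo "ok: compiler and start_mpm.sh share $3"
    else
        echo "mismatch: compiler=$1 start_mpm=$2 root=$3"
    fi
}
```

A call could look like `check_same_root "$(command -v icpc)" <path to start_mpm.sh> <composer_xe_root>`. Because the comparison is purely textual, a compiler that is reached through a symlink outside the root would be reported as a mismatch.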

There will still be the warnings about "auto-load safe-path". This is because of a recent change in MPSS: the debugger does not load the pretty-printing files. That's not critical and can be ignored. However, you won't be able to evaluate C++ STL containers then. Our engineers have been informed and will fix this in a future update.

When you debug your application and it stops with a SEGV, there are two possible scenarios:

  • The SEGV is caused by the offloading itself. If you do not use the offload extensions correctly, you can easily fail like that (e.g. by offloading data that is not allocated on the host). There is no way to debug this with the debugger, because that SEGV happens exactly between offloading and initiation of the coprocessor debug session (gray area). The only way to debug this is to use OFFLOAD_REPORT (https://software.intel.com/en-us/node/522521#22ADAA50-EC03-4E55-8EAB-0EDA10003ADB).
     
  • SEGV was caused in your own application (or SO). In that case, the stacktrace will be empty if debug information is missing. In the "Debug Configurations" dialog under "Debugger" tab, sub-tab "Shared Libraries" you can set the SO search path, adding paths containing the SOs with debug information.

To double-check that your offload debug integration is working, you might create a simple test that offloads a (parallel) loop. You can set a BP directly in the loop and the debugger should stop there. Start the debugging session and resume with <F8>. Note that by default Eclipse will stop at "main()". Just continue with <F8>.

Best regards,

Georg Zitzlsberger

Christof_Soeger
Beginner

Hello Georg,

thanks for your helpful reply. I have now found out that I was indeed using the plugin from the Composer, but the system default version of start_mpm.sh, which was from MPSS. I got the new Composer version installed and reinstalled all the Eclipse plugins from the new Composer version. The "warning: Architecture rejected target-supplied description" is now gone, but the problem remains.

When I set a breakpoint or the program cancels for some reason while on the host the debugging works fine and I get the correct backtrace. But breakpoints that should be encountered in offloaded code do not break the execution and errors (triggered e.g. by raise(SIGABRT)) in offloaded areas give a useless backtrace, see below.

I now get at the beginning: "No symbol table is loaded.  Use the No source file named /home/math/csoeger/phi_normaliz/source/libnormaliz/full_cone.cpp."  It looks like "No symbol table is loaded.  Use the" is the start of one message that is cut off by another message. That it cannot find the source file is strange. It is the correct file and it exists at the reported location.

And at the first offload the probably more important problem (for each mic)

Attaching MIC process...
MIC card number is 0.
MIC Pid is 155768.
MIC local file is "/tmp/coi_procs/1/155768/normaliz?icpcoutBobIAV".
MIC card address is "131.173.40.12".
No symbol table is loaded.  Use the warning: Could not load shared library symbols for /tmp/coi_procs/1/155768/libnormaliz.so.2.12MIC.2.
Do you need "set solib-search-path" or "set sysroot"?
warning: File "/opt/mpss/3.4.1/sysroots/k1om-mpss-linux/usr/lib64/libstdc++.so.6.0.16-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
MIC process is attached.

I looked into /tmp/coi_procs/1/155768/ and there is my main program normaliz?icpcoutBobIAV and all kinds of libs I use, e.g. libgmp.so.10, liboffload.so.5... But not libnormaliz.so.2.12MIC.2. So it is not surprising that it could not load the symbol tables. There is, however, /tmp/coi_procs/1/155768/load_lib/libnormaliz.so.2.12.2?icpcoutYHdwfa

Maybe I should say that I also set

   export     LD_LIBRARY_PATH=$HOME/local/lib/:$LD_LIBRARY_PATH
   export MIC_LD_LIBRARY_PATH=$HOME/mic_local/lib/:$MIC_LD_LIBRARY_PATH


and all shared libraries can be found in these paths, libgmp.so as well as libnormaliz.so.

 

EDIT: The problem that I want to debug happens inside an offloaded region, not during the offloading process itself.

And with a simple test example (just one file, no library) everything works fine.

Georg_Z_Intel
Employee

Hello,

could you please use "mic_extract" to extract the K1OM object code from the fat shared library/libraries? See:
https://software.intel.com/en-us/node/524818

E.g. for a libx.so that was compiled with __declspec(target(mic)) in it:

$ mic_extract libx.so

You will receive a libxMIC.so. That one will be then used by the debugger. In doubt, extract the K1OM object code from all.
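
To see which libraries still lack an extracted counterpart, the renaming can be sketched in shell. This assumes the naming pattern visible in this thread (libx.so becomes libxMIC.so, and libnormaliz.so.2.12.2 corresponds to libnormaliz.so.2.12MIC.2), i.e. "MIC" inserted before the last dot-separated component. That rule is inferred from these names only, not taken from the mic_extract documentation, and `mic_name` and `missing_mic_libs` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: derive the expected K1OM file name from a host library
# name by inserting "MIC" before the last dot-separated component.
# The rule is inferred from the file names quoted in this thread.
mic_name() {
    printf '%sMIC.%s\n' "${1%.*}" "${1##*.}"
}

# List libraries in directory $1 (default: current directory) that have no
# extracted counterpart next to them, i.e. candidates for "mic_extract <lib>".
missing_mic_libs() {
    dir=${1:-.}
    for lib in "$dir"/lib*.so*; do
        [ -f "$lib" ] || continue                     # skip an unmatched glob
        case "${lib##*/}" in *MIC*) continue ;; esac  # already extracted
        [ -f "$(mic_name "$lib")" ] || echo "$lib"
    done
}
```

Running `missing_mic_libs "$HOME/local/lib"` would then print the libraries for which mic_extract may still have to be run. Symlinks such as an unversioned libx.so are treated like regular files, so they can show up in the list as well.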

I'll talk with our debugger engineers why this cannot be done automatically by GDB.

Please let me know whether this works for you.

Best regards,

Georg Zitzlsberger

Christof_Soeger
Beginner

I can extract the MIC code to receive the missing file libnormaliz.so.2.12MIC.2. But I don't know how I can tell the debugger to use it. I tried to use "set solib-search-path" before the offload, but that does not make a difference. It still gives

No symbol table is loaded.  Use the warning: Could not load shared library symbols for /tmp/coi_procs/1/155891/libnormaliz.so.2.12MIC.2.
Do you need "set solib-search-path" or "set sysroot"?
warning: File "/opt/mpss/3.4.1/sysroots/k1om-mpss-linux/usr/lib64/libstdc++.so.6.0.16-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".

I tried to put it there after the crash, but then it does not seem to be recognized, even when I put it there after the offload has started and before the crash. And I cannot put it there before the offload, because I don't know which directory will be created. Does it depend on the process ID on the MIC? OK, now I guessed what the PID would be, created the directory beforehand and put the file there. Still the same error, even though the file is there.

Georg_Z_Intel
Employee

Hello,

for MPSS 3.x the following environment variables need to be set before starting Eclipse IDE:

AMPLXE_COI_DEBUG_SUPPORT=TRUE
MYO_WATCHDOG_MONITOR=-1

See https://software.intel.com/en-us/articles/debugging-intel-xeon-phi-applications-on-linux-host#Additional%20Requirements%20for%20Offload%20Debugging.

If the SO libnormaliz.so.2.12MIC.2 is missing on the target, this indicates that those variables are not set. In addition, you still need to extract the libnormaliz.so.2.12MIC.2 via "mic_extract" on the host and place it at the same location as the host's libnormaliz.so (well, basically the latter is a fat SO containing both host & K1OM object code). If you use "mic_extract" it will do that already. No additional setting of "set solib-search-path" or "set sysroot" should be required.
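
Whether the two variables actually reach the Eclipse process can be verified in the launch script itself. A minimal sketch, assuming the values given above are the only acceptable ones; `check_offload_debug_env` is a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical pre-launch check: verifies that the two variables named above
# carry the values given in this thread. Prints one line per problem and
# returns non-zero if any problem is found.
check_offload_debug_env() {
    status=0
    if [ "${AMPLXE_COI_DEBUG_SUPPORT:-}" != "TRUE" ]; then
        echo "AMPLXE_COI_DEBUG_SUPPORT is '${AMPLXE_COI_DEBUG_SUPPORT:-}', expected TRUE"
        status=1
    fi
    if [ "${MYO_WATCHDOG_MONITOR:-}" != "-1" ]; then
        echo "MYO_WATCHDOG_MONITOR is '${MYO_WATCHDOG_MONITOR:-}', expected -1"
        status=1
    fi
    return "$status"
}
```

Placed in the script that starts the IDE, a line such as `check_offload_debug_env && eclipse` would refuse to launch Eclipse with an incomplete environment. The variables must be exported for the check to reflect what child processes will see.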

Please let me know if this worked.

Thank you &  best regards,

Georg Zitzlsberger

Edit: In case you wonder, I've extended the article to cover the SO problem as well. I'm still not in favor of the separate "mic_extract" step and am discussing it with engineering.

 

Christof_Soeger
Beginner

I do set these values. To start eclipse I run a script with this content:

source /opt/intel/composer_xe_2015/bin/compilervars.sh intel64
export AMPLXE_COI_DEBUG_SUPPORT=TRUE
export MYO_WATCHDOG_MONITOR=-1
/usr/local/bin/eclipse-4.4

I also used mic_extract now. But still the same problem. The file which the debugger is looking for is not there (/tmp/coi_procs/1/156245/libnormaliz.so.2.12MIC.2), but there is /tmp/coi_procs/1/156245/load_lib/libnormaliz.so.2.12MIC.2?icpcoutIfGu56 which is the extracted SO for the mic. So is the problem only this path difference? Or should there be a copy?

 

There is also another slightly strange thing with the shared lib. I don't know if it is related. My program and lib use a 3rd party lib (gmp). I compiled a version of libgmp for the host and for the mic, installed them to $HOME/local/lib and $HOME/mic_local/lib respectively, and set

   export     LD_LIBRARY_PATH=$HOME/local/lib/:$LD_LIBRARY_PATH
   export MIC_LD_LIBRARY_PATH=$HOME/mic_local/lib/:$MIC_LD_LIBRARY_PATH


That works. Now my own program, including libnormaliz, is built in the Eclipse project build directory and also started from there. When I run my program I get the error

offload error: unexpected embedded target binary type, expected either an executable or shared library

but when I install my lib also to $HOME/local/lib/ it works. Now the strange part. If I change my lib, recompile it, but NOT install it to $HOME/local/lib/ it still works and uses the NEW version. So I'm confused which version is actually used. I always take care that I install after recompilation to avoid confusions. So this shouldn't be the problem.
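
One way to narrow down which copy is picked up is to walk the search list in order. The sketch below only models the order of a colon-separated list such as LD_LIBRARY_PATH or MIC_LD_LIBRARY_PATH; the real loader also honours RPATH/RUNPATH and the system cache, and `first_in_path` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the first file named $1 found in the
# colon-separated directory list $2 (e.g. "$LD_LIBRARY_PATH").
# Empty list entries are skipped here, although the real loader treats
# them as the current directory.
first_in_path() {
    old_ifs=$IFS
    IFS=:
    for dir in $2; do
        if [ -n "$dir" ] && [ -f "${dir%/}/$1" ]; then
            IFS=$old_ifs
            echo "${dir%/}/$1"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

For example, `first_in_path libnormaliz.so "$LD_LIBRARY_PATH"` shows the first host-side candidate and `first_in_path libnormaliz.so "$MIC_LD_LIBRARY_PATH"` the first one in the coprocessor list. On the host, `ldd ./normaliz` gives the loader's own answer for libraries linked at build time.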

Georg_Z_Intel
Employee

Hello,

this is strange. Whatever I do, I cannot reproduce your scenario.

So, your application (normaliz) is using a shared library (libnormaliz) which contains some offload pragmas (K1OM object code). That library also makes use of libgmp within the offload sections and hence two versions are required for host & coprocessor. I hope I got that right so far.

Can we exclude that you're loading any of the libraries of your application dynamically during runtime (i.e. dlopen(...))? That would make a difference since dependencies are not handled transparently with this.

I was only able to see a missing "libnormalizMIC" under /tmp/coi_procs/<card #>/<pid> (not the load_lib subdirectory but the root directory) if the library is dynamically loaded via "dlopen(...)". Using shared libraries w/o dynamically loading them, they will all be loaded at runtime before the first offload is done. Hence they all end up at the /tmp/coi_procs/<card #>/<pid>/ root directory. I'm surprised that "libnormalizMIC" is missing in your case and would like to understand that first. Do you have an offload section within libnormaliz or is the call to a function of libnormaliz offloaded already?
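
The distinction between the root directory and the load_lib subdirectory can be checked with a small listing helper, run on the system that holds /tmp/coi_procs while the offload process is alive. This is a sketch only; `where_in_coi_dir` is a hypothetical name, and the directory layout is taken from the paths quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: report where files whose names start with $2 sit below
# a COI process directory $1 (e.g. /tmp/coi_procs/<card #>/<pid>).
# Prints "root <file>" or "load_lib <file>" for each match.
where_in_coi_dir() {
    for f in "$1"/"$2"* "$1"/load_lib/"$2"*; do
        [ -e "$f" ] || continue                  # skip an unmatched glob
        case "$f" in
            "$1"/load_lib/*) echo "load_lib ${f##*/}" ;;
            *)               echo "root ${f##*/}" ;;
        esac
    done
}
```

Called as `where_in_coi_dir /tmp/coi_procs/<card #>/<pid> libnormaliz`, a line starting with "load_lib" and no line starting with "root" would correspond to the situation described above.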

I guess the library loading sequence of your project is important here. Would it be possible for you to create a dummy project that simulates the load and use order of the libraries?

Best regards,

Georg Zitzlsberger

Christof_Soeger
Beginner

First of all: Thanks for all your quick and helpful support! Since the problem is connected to the loading of my shared library, I now switched to compile and link libnormaliz statically. And in this configuration debugging works. So for me it is not so important to figure out the cause of the problem.

But of course I can investigate it further to find the problem. You got everything right. The offload sections are inside libnormaliz. The only lib that I included additionally is libgmp. Otherwise there are only system libraries like libiomp, ... If you are interested in it, I can try to set up a dummy example.

Thanks again,

Christof

Georg_Z_Intel
Employee

Hello Christof,

great to hear that you finally got a working environment!

I'd still be interested in a reproducer if you can find the time to provide one to me. It also might help others having similar problems while stumbling over this thread...

Thank you & best regards,

Georg Zitzlsberger

 
