Difficulty creating a shared library for heterogeneous code

Brian_R_ · ‎08-12-2013

I am working with code that is intended to run on a Xeon CPU, Tesla GPU, and Phi MIC. Right now I have the first two running but getting the MIC functionality added is proving difficult. We use a shared library for easier management, usually compiled with:

g++ -shared -Wl,-soname,libhaste.so -o build/libhaste.so build/*/*.o

Note that the real build is driven by a Makefile; I have translated its mess of regex into plain commands here so that they are more concise and readable.
I want to add a new function foo() that can be called and will execute entirely on the MIC, so I have created the file foo.cc containing:

void foo(){
    ..some code..
    #pragma offload target(mic)
    {
        ..more code..
    }
    ..end of code..
}

I have switched from g++ to icpc, which works fine with the code I have (except the CUDA code, of course). The file foo.cc is compiled to position-independent object code with the following:

icpc -fPIC -std=c++11 foo.cc -c -o foo.o

Finally, the .so file is created with the first line mentioned in this post, updated to use icpc:

icpc -shared -Wl,-soname,libhaste.so -o build/libhaste.so build/*/*.o

If I remove the MIC-targeting foo.cc file, everything works fine and icpc produces a usable .so file without complaint. However, when I add in the foo.cc/foo.o file, I get the following errors:

$ icpc -shared -Wl,-soname,libhaste.so -o build/libhaste.so build/*/*.o
ipo: warning #11010: file format not recognized for build/interpreter/fooMIC.o
ld: build/interpreter/fooMIC.o: Relocations in generic ELF (EM: 181)
.......(more of the same)
build/interpreter/fooMIC.o: could not read symbols: File in wrong format

As far as I can tell, a new file fooMIC.o was unexpectedly created (along with a normal foo.o) which is responsible for the errors. I'm assuming the 'wrong format' errors are because it's a file meant for the MIC hardware and the linker can't handle it, but I'm at a loss as to where to go from here.
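One way to check that guess is to look at the machine type recorded in each object file's ELF header. The sketch below is a small helper (the name elf_machine is hypothetical, not part of any toolchain) that prints the e_machine field. The value 181 is the one registered for the Intel K1OM architecture used by the Xeon Phi coprocessor, while an x86-64 host object reports 62. It assumes a little-endian ELF file and a POSIX od; where binutils is installed, readelf -h shows the same field.

```shell
# Print the ELF e_machine value (in decimal) of the file given as $1.
# e_machine is a 2-byte field at byte offset 18 of the ELF header.
# Assumption: the file is little-endian ELF (true for x86-64 and K1OM).
elf_machine() {
    # Read the two bytes at offset 18 as unsigned decimal values.
    set -- $(od -An -tu1 -j18 -N2 "$1")
    # Combine low byte and high byte into one number.
    echo $(( $1 + $2 * 256 ))
}

# Hypothetical usage, with the paths from the error message above:
#   elf_machine build/interpreter/foo.o      # host object, 62 expected on x86-64
#   elf_machine build/interpreter/fooMIC.o   # MIC object, 181 per the ld message
```

If the two objects report different machine values, the host linker cannot combine them, which matches the "File in wrong format" message.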

In short: how can I create a function that I intend to run exclusively on the mic that is called from the host CPU and bundle that function into a shared object along with the rest of the native CPU functions?

Loc_N_Intel · ‎08-12-2013

Hi Brian,

When you compile your offload file "foo.cc", does the compiler generate the object files "foo.o" and "fooMIC.o"?

Brian_R_ · ‎08-12-2013

Loc, yes, icpc seems to be generating both foo.o and fooMIC.o when compiled with the command in my original post. I am not really sure why it does that and if it's a result of a flag I'm passing.

TimP · ‎08-12-2013

The MIC.o files are for execution on MIC. You can't combine host and MIC .o files in a .so.

Kevin_D_Intel · ‎08-13-2013

Any possibility you could change the Makefile to replace the separate compilation and linking steps with compilation plus linking in a single compiler invocation?

In other words, instead of:

icpc -fPIC -std=c++11 foo.cc -c -o foo.o
icpc -shared -Wl,-soname,libhaste.so -o build/libhaste.so build/*/*.o

could you do something like this instead:

icpc -fPIC -std=c++11 -shared -Wl,-soname,libhaste.so -o build/libhaste.so build/*/*.cc

I can reproduce the error you experienced and only found the latter method above works (so far), but I don’t know if you can easily restructure your Makefile to emulate that. What you tried might be possible in the future if/when we produce “fat” object files and not the current separate <file>.o/<file>MIC.o pair.

I will keep poking at this to see if I am able to find another method that supports your existing Makefile structure. Sorry about the difficulties.

Brian_R_ · ‎08-13-2013

Wow, the sun never sets on Intel, thank you both for the replies I woke up to.

TimP (Intel) wrote:

The MIC.o files are for execution on MIC. You can't combine host and MIC .o files in a .so.

I suspected that it was producing object files to be used by the MIC but I wasn't sure. Thank you for the clarification. I'm sorry for my ignorance but what am I intended to do with the MIC.o file? Would I just compile my code normally, creating my .so without the MIC.o, place the MIC file on the MIC's /lib or somewhere that LD_LIBRARY_PATH points to, and when I run the executable on the host my #pragma offload target(mic) will automatically trigger the MIC which will check its .o file? All the examples I've found on Intel's support pages have great detail on native code for the Phi but not a lot for hetero models.

Kevin Davis (Intel) wrote:

Any possibility you could change the Makefile to replace the separate compilation and linking steps with compilation plus linking in a single compiler invocation?

In other words, instead of:

icpc -fPIC -std=c++11 foo.cc -c -o foo.o
icpc -shared -Wl,-soname,libhaste.so -o build/libhaste.so build/*/*.o

could you do something like this instead:

icpc -fPIC -std=c++11 -shared -Wl,-soname,libhaste.so -o build/libhaste.so build/*/*.cc

I can reproduce the error you experienced and only found the latter method above works (so far), but I don’t know if you can easily restructure your Makefile to emulate that. What you tried might be possible in the future if/when we produce “fat” object files and not the current separate <file>.o/<file>MIC.o pair.

I will keep poking at this to see if I am able to find another method that supports your existing Makefile structure. Sorry about the difficulties.

I never thought to simplify the makefile by doing everything in one line, and I never really questioned why we produce a lot of .o files before making the .so. I'll fiddle with the makefile, implement the change you suggested, and see how everything is handled. I'll share my findings as soon as it's taken care of.

Ravi_N_Intel · ‎08-13-2013

Any time you have offload pragmas in your code we by default generate 2 objects, host object xxxx.o and MIC object xxxxMIC.o

So in your link step you cannot use a wildcard like *.o, since this would also include xxxxMIC.o as an input to the host linking.

Best is to move these source files into a different directory and use only the host xxxx.o in the link phase. The compiler will use the host objects during host linking and MIC objects to create the MIC .so

By this method you can continue to use *.o for files which have no offload in them.
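As an alternative to moving files into another directory, the link step could filter the compiler-generated MIC objects out of the wildcard by name. The following is only a sketch of that idea: the helper name host_objects is hypothetical, it relies on the xxxxMIC.o naming described above, it would also skip any ordinary host object whose name happens to end in MIC.o, and it has not been verified against icpc.

```shell
# List the host object files under the directory given as $1,
# skipping the *MIC.o companions that the compiler generates for
# sources containing offload pragmas.
host_objects() {
    find "$1" -type f -name '*.o' ! -name '*MIC.o' | sort
}

# Hypothetical link step using the paths from this thread
# (assumes no whitespace in the object file names):
#   icpc -shared -Wl,-soname,libhaste.so -o build/libhaste.so $(host_objects build)
```

The MIC objects stay next to their host counterparts, where, per the description above, the compiler driver picks them up on its own.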

Brian_R_ · ‎08-13-2013

The single-line compilation+linking worked just fine. I don't exactly know why it works, but I'm happy that it does. I'm fairly sure that the MIC is now being used for offloading (I see a spike in usage on micsmc) but I can't be certain if there is code executing or if it's just querying for the MIC and then executing on the CPU. Is there any good way to tell? If I check running processes on the MIC I do see something that says 'offload' so I am assuming all is good. I'm not feeding it enough work to run at 100% yet.

I don't seem to be using the MIC.o file at all - is it intended to be unused? I thought I'd have to make a .so using the -mmic flag or something to copy onto the MIC but it seems perfectly happy to run without me doing anything but compile the code on my host and running the executable that has a #pragma offload.

Thank you again for the helpful advice.

Ravi_N_Intel · ‎08-13-2013

Set OFFLOAD_REPORT=1 (or 2 or 3 for more detailed output) and run your program.

The report will provide you information about the run on MIC.
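A minimal sketch of setting the variable for a single run, so that it does not stay set in the rest of the shell session. The wrapper name run_with_offload_report and the executable ./my_app are placeholders, not names from this thread.

```shell
# Run a command with the offload report enabled at the given level.
# $1 is the report level (1, 2 or 3); the remaining arguments are the
# command to run and its own arguments.
run_with_offload_report() {
    level=$1
    shift
    # The assignment prefix applies only to the environment of this command.
    OFFLOAD_REPORT="$level" "$@"
}

# Hypothetical usage (./my_app stands in for the real executable):
#   run_with_offload_report 2 ./my_app
```

The same effect can be had without the wrapper by prefixing the command directly, for example OFFLOAD_REPORT=2 ./my_app.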

Kevin_D_Intel · ‎08-14-2013

The xxxxxxMIC.o files are not intended to be used or manipulated by the user; they are used “under the hood”. The single-line compilation works because our compiler driver performs as expected and deals with the host and MIC object files accordingly: it detects the #pragma offload usage and performs the needed host and offload compilations.

Had you built your code exclusively for the Xeon Phi™ without using any offload keyword extensions, you would use -mmic and build the .so in the typical fashion used on the host. Our intent is that building pure native (-mmic) code and building offload code should not differ, and in the case of offload our driver (and other supporting utilities) should deal with the xxxxxxMIC.o files invisibly, without any user intervention or knowledge of them.

Use the environment variable report option Ravi noted to see evidence of the offload execution.

Brian_R_ · ‎08-14-2013

Ravi, the environmental variable you provided is very handy and has made a coworker very happy with its level 3 features.

Kevin, thanks for the insight into how icpc handles the object files. I'm not well versed in compiler design and I appreciate the extra information on what goes on in the background.

Heterogeneous compilation and execution is working perfectly now. I greatly appreciate all the Intel help.

Pramod_K_ · ‎04-05-2014

Ravi Narayanaswamy (Intel) wrote:

Any time you have offload pragmas in your code we by default generate 2 objects, host object xxxx.o and MIC object xxxxMIC.o

So in your link step you cannot use a wildcard like *.o, since this would also include xxxxMIC.o as an input to the host linking.

Best is to move these source files into a different directory and use only the host xxxx.o in the link phase. The compiler will use the host objects during host linking and MIC objects to create the MIC .so

By this method you can continue to use *.o for files which have no offload in them.

Hi Ravi,

I have a question about creating shared libraries for MIC. In a large simulation code, I have annotated some functions with __declspec(target(mic)). The makefile now creates a shared library from the (host) .o files. I am just wondering when the MIC objects are used. You mentioned that they are used to create the MIC .so shared library, but as a user, do I need to create a shared library from those MIC objects myself?

In my application, I get the following error when I call those functions from an offload region:

offload error: cannot find address of function fun_state_skv

I am calling this function, fun_state_skv, through a function pointer.

It would be a great help if you could provide some more information on this.

Kevin_D_Intel · ‎04-05-2014

The user does not manipulate the MIC objects at all. The compiler driver handles both host and MIC objects accordingly. The MIC objects are handled invisibly when creating the shared library using the -shared option.

This error you posted does not seem familiar.

What version of the compiler are you using (icc -V)?
What version of MPSS?
Can you share details about how the libraries and application are built?

Pramod_K_ · ‎04-07-2014

Hi Kevin,

Thanks for the clarification. Here are details about the application:

We build a shared library of compute kernels using the -shared option, as you mentioned above. As part of this simulation code, we have a HOC interpreter which registers C kernels. As part of the registration process, the interpreter creates an array of function pointers, which are used during simulation to call these compute kernels.

I assume I get the above error because the function pointers for host and MIC are different. In the compute kernels, I added an offload section to get the address on the MIC, and it's working fine. It's difficult to explain everything, but here is template code:

In the shared library code, I have:

__declspec(target(mic)) extern void (*fun_ptr) (some_params);

static void register(...)
{
      .......function registration code, create array of function pointers.............

      // now to get the address of the function on the MIC, I added the block below
      #pragma offload target(mic:1)
      {
            fun_ptr = function_name;
      }
}

In the actual application:

__declspec(target(mic)) void (*fun_ptr) (some_params);

#pragma offload target(mic:1) nocopy(fun_ptr)
{
      (*fun_ptr) (some_params);
}

The above solution is working for me. Let me know if there is a better way to achieve the same thing, or if I am missing something obvious.