Hi Guys,
I am using Xeon Phi in offload mode. I have written code with offload pragmas (main.cpp, micSolver.cpp). First, I generate the assembly code with icc -S micSolver.cpp, which emits two files: (1) micSolver.s and (2) micSolverMIC.s. As expected, micSolverMIC.s contains the code to be run on the Phi.
My question is: how can I compile the assembly code further into binary code? 'icc -c micSolver.s' works without problems, but 'icc -c micSolverMIC.s' gives the following errors. Does anyone know how to compile the assembly code?
micSolverMIC.s: Assembler messages:
micSolverMIC.s:129: Error: no such instruction: `kmov %eax,%k1'
micSolverMIC.s:132: Error: no such instruction: `vpackstorelps %zmm0,(%rdx){%k1}'
micSolverMIC.s:1347: Error: no such instruction: `vpxord %zmm1,%zmm1,%zmm1'
micSolverMIC.s:1353: Error: no such instruction: `kmov %r13d,%k1'
micSolverMIC.s:1361: Error: no such instruction: `vprefetch0 (%rdx)'
micSolverMIC.s:1363: Error: no such instruction: `vprefetch0 4(%rdx)'
micSolverMIC.s:1365: Error: no such instruction: `vprefetch0 (%rcx)'
micSolverMIC.s:1367: Error: no such instruction: `vprefetch0 4(%rcx)'
micSolverMIC.s:1369: Error: no such instruction: `vprefetch0 (%r8)'
micSolverMIC.s:1371: Error: no such instruction: `vprefetch0 4(%r8)'
micSolverMIC.s:1378: Error: bad register name `%zmm5'
micSolverMIC.s:1380: Error: bad register name `%zmm4'
micSolverMIC.s:1381: Error: bad register name `%zmm3'
micSolverMIC.s:1382: Error: bad register name `%zmm2'
micSolverMIC.s:1383: Error: no such instruction: `vpxord %zmm6,%zmm6,%zmm6'
micSolverMIC.s:1387: Error: no such instruction: `vprefetch0 (%r9)'
micSolverMIC.s:1389: Error: no such instruction: `vprefetch0 64(%r9)'
micSolverMIC.s:1391: Error: no such instruction: `vprefetch0 (%r10)'
micSolverMIC.s:1393: Error: no such instruction: `vprefetch0 64(%r10)'
micSolverMIC.s:1395: Error: no such instruction: `vprefetch0 (%r11)'
micSolverMIC.s:1397: Error: no such instruction: `vprefetch0 64(%r11)'
You can compile the Xeon Phi™ assembly file using: icc -mmic -c micSolverMIC.s
But I'm not sure where you are headed by doing this.
Kevin Davis (Intel) wrote:
You can compile the Xeon Phi™ assembly file using: icc -mmic -c micSolverMIC.s
But I'm not sure where you are headed by doing this.
Kevin, I want to optimize the assembly code directly. For example, I plan to change the instruction order so that we can better hide the latency.
I can indeed use icc -mmic -c micSolverMIC.s. However, the resulting micSolverMIC.o cannot be linked with main.o, because the two are in different formats. For example, icc -o a.out main.o micSolverMIC.o micSolver.o reports the following messages. Do you know how to solve this issue?
ipo: warning #11010: *MIC* file format not recognized for /lib64/libc.so.6
ipo: warning #11010: *MIC* file format not recognized for /lib64/libc.so.6
ipo: warning #11010: file format not recognized for micSolerMIC.o
ld: micSolerMIC.o: Relocations in generic ELF (EM: 181)
ld: micSolerMIC.o: Relocations in generic ELF (EM: 181)
ld: micSolerMIC.o: Relocations in generic ELF (EM: 181)
ld: micSolerMIC.o: Relocations in generic ELF (EM: 181)
ld: micSolerMIC.o: Relocations in generic ELF (EM: 181)
ld: micSolerMIC.o: Relocations in generic ELF (EM: 181)
ld: micSolerMIC.o: Relocations in generic ELF (EM: 181)
micSolerMIC.o: could not read symbols: File in wrong format
Jianbin F. wrote:
Kevin, I want to optimize the assembly code directly. For example, I plan to change the instruction order so that we can better hide the latency.
Don't get me wrong, but I don't think that one can write better assembly code than the compiler does. I would recommend using intrinsics. There you can also influence the order of instructions, and in addition the compiler is still able to optimize the code in a way a normal brain never could.
Patrick S. wrote:
Jianbin F. wrote: Kevin, I want to optimize the assembly code directly. For example, I plan to change the instruction order so that we can better hide the latency.
Don't get me wrong, but I don't think that one can write better assembly code than the compiler does. I would recommend using intrinsics. There you can also influence the order of instructions, and in addition the compiler is still able to optimize the code in a way a normal brain never could.
Thanks for your comments. However, I do believe that programmers can do a better job than the compiler. For instance, programmers can put the (loop-)index-related instructions in a better place to hide the instruction latency.
Certainly, I can influence the order of instructions by using intrinsics. But how can you influence the placement of the index-related instructions?
If you are testing code scheduling for the Xeon Phi, I would recommend working in native mode rather than offload mode. The assembler works fine for generating mic-native binaries (as long as you use "-mmic" for each step) -- I use the same approach of modifying the compiler's assembler output for the Xeon Phi for several of my projects.
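The native-mode steps described above can be sketched as a dry run. This is only an illustration, not a tested build recipe: the file names kernel.c and kernel.mic and the function name native_build_cmds are hypothetical, and the script prints the commands rather than executing them, so it can be read or run on a machine without the Intel compiler. The only flag taken from this thread is -mmic, applied at every step.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# Pipe the output to sh on a machine with the Intel compiler to execute them.

# native_build_cmds SRC OUT
#   SRC: C source file (hypothetical name)
#   OUT: name of the native coprocessor executable (hypothetical name)
native_build_cmds() {
    src=$1
    out=$2
    base=${src%.*}                        # strip the extension: kernel.c -> kernel
    echo "icc -mmic -S $src"              # emit coprocessor assembly
    echo "# hand-edit $base.s here"       # reorder instructions by hand
    echo "icc -mmic -c $base.s"           # assemble with the coprocessor assembler
    echo "icc -mmic -o $out $base.o"      # link a native executable
}

native_build_cmds kernel.c kernel.mic
```

Every step carries -mmic, so the assembler and linker never see host-format objects mixed with coprocessor ones.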
icc -c micSolver.s also generates the object file for micSolverMIC.s if it is present in the same directory as micSolver.s.
There is no need to invoke icc -c micSolverMIC.s, which actually invokes the host assembler; that is why you see unrecognized instructions.
I will have to consider further what you are trying to do, the possible dangers, and whether this is even possible, given our efforts to hide the manipulation of the Xeon Phi™ specific objects and executables associated with the offload language extensions.
The offload model produces “fat” binaries (objects and executables), where the host and coprocessor object code are contained in a single .o file. Our compiler driver and other compiler-associated tools are augmented to handle these. Producing a separate MIC.o as you described breaks the compiler driver's handling. You would probably also have to produce the host-side .o by compiling only the assembly; otherwise a “fat” object may be produced that conflicts with your hand-made MIC.o. I have not tried this, and I’m not sure it is doable using the offload model.
You may need to compile only the code you want to hand-tune with -mmic, create a static or shared library containing the routine with the hand-tuned assembly, and call into that library from an offload pragma if you really require the offload model. (Or, as John suggests, work exclusively in native mode.)
My apologies. As Ravi indicates, this is much easier than I had understood when working with assembly source files. Even when starting from the assembly file, our integration for offload compilation is such that the user need only interact with the compiler in terms of the host side. The compiler and other tools also invisibly handle the coprocessor side when using assembly files.
I confirmed that you can compile to assembly using -S, hand-tune the <file>MIC.s file, and then compile both the host and coprocessor .s files simply by referring to the host-side assembly file only, just as Ravi indicated.
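The offload-mode workflow confirmed above can be written out as a short dry run. This is a sketch under assumptions: the function name offload_asm_cmds is hypothetical, the script only prints the commands (it does not invoke the compiler), and it relies on the behaviour described in this thread, namely that icc -S emits both <file>.s and <file>MIC.s and that icc -c on the host-side file picks up the coprocessor file from the same directory.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.

# offload_asm_cmds SRC
#   SRC: source file containing offload pragmas, e.g. micSolver.cpp
offload_asm_cmds() {
    src=$1
    base=${src%.*}                           # micSolver.cpp -> micSolver
    echo "icc -S $src"                       # emits $base.s and ${base}MIC.s
    echo "# hand-edit ${base}MIC.s here"     # tune only the coprocessor code
    echo "icc -c $base.s"                    # name the host-side file only
}

offload_asm_cmds micSolver.cpp
```

Note that the MIC.s file never appears on a command line: per the replies above, naming it directly sends it to the host assembler.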
Thanks, Kevin. With that approach, the link step reports the following messages. Did I miss any library?
micSolver.cpp:(.text+0xa73): undefined reference to `__offload_target_acquire'
micSolver.cpp:(.text+0xa98): undefined reference to `__offload_offload'
micSolver.cpp:(.text+0xacf): undefined reference to `__offload_target_acquire'
micSolver.cpp:(.text+0xf7c): undefined reference to `__offload_offload'
micSolver.cpp:(.text+0xfb3): undefined reference to `__offload_target_acquire'
micSolver.cpp:(.text+0x1500): undefined reference to `__offload_offload'
Yes, the offload library. It was likely not included because your main does not contain any offload language extensions. On your link, use the icpc compiler driver and just include the option: -offload
Does the compiler have such an option?
I am using compiler v13.1.1, but it does not have an option -offload.
Using icpc -help | grep "offload", I can get the following output:
-offload-attribute-target=<name>
file with the offload attribute target(mic)
-offload-option,<target>,<tool>,"option list"
appends additional options for offload compilations given the
-no-offload
disable any offload usage
Yes, but not for that old one :-) Support was added in the CXE 2013 SP1 (14.0). For the 13.x vintage, just add something in the main() like:
__attribute__((target(mic))) int foo;
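The version-dependent link step discussed in the last few replies might be captured as follows. This is a sketch only: offload_link_cmd is a hypothetical helper that prints a link line rather than running it, and it assumes what the replies state, namely that -offload is accepted from 14.0 onward and that 13.x instead relies on a target(mic) declaration in the main source to pull in the offload library.

```shell
#!/bin/sh
# Dry-run sketch: prints the link command instead of running it.

# offload_link_cmd MAJOR OUT OBJ...
#   MAJOR: compiler major version (13, 14, ...)
#   OUT:   executable name
#   OBJ:   object files to link
offload_link_cmd() {
    major=$1
    out=$2
    shift 2
    if [ "$major" -ge 14 ]; then
        # 14.0 and later: request the offload library explicitly
        echo "icpc -offload -o $out $*"
    else
        # 13.x: no -offload option; the main source needs a target(mic)
        # declaration so that the offload library is linked in
        echo "icpc -o $out $*"
    fi
}

offload_link_cmd 13 a.out main.o micSolver.o
offload_link_cmd 14 a.out main.o micSolver.o
```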