
compile assembly code for Xeon Phi

Jianbin_F_
Beginner

Hi Guys, 

I am using the Xeon Phi in offload mode. I have written code with offload pragmas (main.cpp, micSolver.cpp). First, I generate the assembly with icc -S micSolver.cpp, which emits two files: (1) micSolver.s and (2) micSolverMIC.s. As expected, micSolverMIC.s contains the code to be run on the Phi. 

My question is, how can I compile the assembly code further into binary code? There is no problem when using 'icc -c micSolver.s', while 'icc -c micSolverMIC.s' gives the following errors. Do you guys have any idea on how to compile the assembly code? 

micSolverMIC.s: Assembler messages:
micSolverMIC.s:129: Error: no such instruction: `kmov %eax,%k1'
micSolverMIC.s:132: Error: no such instruction: `vpackstorelps %zmm0,(%rdx){%k1}'
micSolverMIC.s:1347: Error: no such instruction: `vpxord %zmm1,%zmm1,%zmm1'
micSolverMIC.s:1353: Error: no such instruction: `kmov %r13d,%k1'
micSolverMIC.s:1361: Error: no such instruction: `vprefetch0 (%rdx)'
micSolverMIC.s:1363: Error: no such instruction: `vprefetch0 4(%rdx)'
micSolverMIC.s:1365: Error: no such instruction: `vprefetch0 (%rcx)'
micSolverMIC.s:1367: Error: no such instruction: `vprefetch0 4(%rcx)'
micSolverMIC.s:1369: Error: no such instruction: `vprefetch0 (%r8)'
micSolverMIC.s:1371: Error: no such instruction: `vprefetch0 4(%r8)'
micSolverMIC.s:1378: Error: bad register name `%zmm5'
micSolverMIC.s:1380: Error: bad register name `%zmm4'
micSolverMIC.s:1381: Error: bad register name `%zmm3'
micSolverMIC.s:1382: Error: bad register name `%zmm2'
micSolverMIC.s:1383: Error: no such instruction: `vpxord %zmm6,%zmm6,%zmm6'
micSolverMIC.s:1387: Error: no such instruction: `vprefetch0 (%r9)'
micSolverMIC.s:1389: Error: no such instruction: `vprefetch0 64(%r9)'
micSolverMIC.s:1391: Error: no such instruction: `vprefetch0 (%r10)'
micSolverMIC.s:1393: Error: no such instruction: `vprefetch0 64(%r10)'
micSolverMIC.s:1395: Error: no such instruction: `vprefetch0 (%r11)'
micSolverMIC.s:1397: Error: no such instruction: `vprefetch0 64(%r11)'

 

12 Replies
Kevin_D_Intel
Employee

You can compile the Xeon Phi™ assembly file using: icc -mmic -c micSolverMIC.s

But I'm not sure where you are headed by doing this.

Jianbin_F_
Beginner

Kevin Davis (Intel) wrote:

You can compile the Xeon Phi™ assembly file using: icc -mmic -c micSolverMIC.s

But I'm not sure where you are headed by doing this.

Kevin, I want to optimize the assembly code directly. For example, I plan to change the instruction order so that we can better hide the latency. 

I can certainly use icc -mmic -c micSolverMIC.s. However, the resulting micSolverMIC.o cannot be linked with main.o, because the two are in different formats. For example, icc -o a.out main.o micSolverMIC.o micSolver.o reports the following messages. Do you know how to solve this issue? 

ipo: warning #11010: *MIC* file format not recognized for /lib64/libc.so.6
ipo: warning #11010: *MIC* file format not recognized for /lib64/libc.so.6
ipo: warning #11010: file format not recognized for micSolerMIC.o
ld: micSolerMIC.o: Relocations in generic ELF (EM: 181)
ld: micSolerMIC.o: Relocations in generic ELF (EM: 181)
ld: micSolerMIC.o: Relocations in generic ELF (EM: 181)
ld: micSolerMIC.o: Relocations in generic ELF (EM: 181)
ld: micSolerMIC.o: Relocations in generic ELF (EM: 181)
ld: micSolerMIC.o: Relocations in generic ELF (EM: 181)
ld: micSolerMIC.o: Relocations in generic ELF (EM: 181)
micSolerMIC.o: could not read symbols: File in wrong format

 

Patrick_S_
New Contributor I

Jianbin F. wrote:

Kevin, I want to optimize the assembly code directly. For example, I plan to change the instruction order so that we can better hide the latency. 

Don't get me wrong, but I don't think that one can write better assembly code than the compiler does. I would recommend using intrinsics. With them you can also influence the order of instructions, and in addition the compiler is still able to optimize the code in a way a normal brain could never do.

Jianbin_F_
Beginner

Patrick S. wrote:

Quote:

Jianbin F. wrote:

Kevin, I want to optimize the assembly code directly. For example, I plan to change the instruction order so that we can better hide the latency. 

 

Don't get me wrong, but I don't think that one can write better assembly code than the compiler does. I would recommend using intrinsics. With them you can also influence the order of instructions, and in addition the compiler is still able to optimize the code in a way a normal brain could never do.

Thanks for your comments. However, I do believe that programmers can do a better job than the compiler. For instance, programmers can place the (loop-)index-related instructions where they better hide the instruction latency. 

For sure, I can influence the order of instructions by using intrinsics. But how can you influence the place of the index-related instructions? 

McCalpinJohn
Honored Contributor III

If you are testing code scheduling for the Xeon Phi, I would recommend working in native mode rather than offload mode.  The assembler works fine for generating mic-native binaries (as long as you use "-mmic" for each step) -- I use the same approach of modifying the compiler's assembler output for the Xeon Phi for several of my projects.

Ravi_N_Intel
Employee

icc -c micSolver.s also generates the object file for micSolverMIC.s if it is present in the same directory as micSolver.s.

There is no need to invoke icc -c micSolverMIC.s, which actually invokes the host assembler; hence you see unrecognized instructions.

Kevin_D_Intel
Employee

I will have to consider further what you are interested in doing and the possible dangers and whether this is even possible given our efforts to hide the manipulation of the Xeon Phi™ specific objects/executable associated with the offload language extensions.

The offload model produces “fat” binaries (objects and executables), where the host and coprocessor object code are contained in a single .o file. Our compiler driver and other associated tools are augmented to handle these. Producing a separate MIC.o as you described breaks the compiler driver's handling. You would probably also have to produce the host-side .o by compiling only the assembly; otherwise a “fat” object may be produced that conflicts with your hand-made MIC.o. I have not tried this, and I’m not sure it is doable using the offload model.

You may need to compile the code you want to hand-tune with -mmic only, create a static or shared library containing the routine with the hand-tuned assembly, and call into that library from an offload pragma if you really require the offload model. (Or, as John suggests, work exclusively in native mode.)
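
A minimal sketch of the host-side call for that library approach, assuming the Intel offload language extensions. The names sum_array and offload_sum are hypothetical, and sum_array is defined inline only so the sketch builds on its own; in the setup described above, its hand-tuned body would come from a library built with -mmic. The attribute and pragma are guarded by the __INTEL_OFFLOAD macro (assumed to be defined when offload compilation is enabled), so other compilers simply run everything on the host.

```cpp
#include <cstddef>

// Hypothetical stand-in for the hand-tuned routine. In a real build its
// body would live in a -mmic library; it is inlined here for illustration.
#if defined(__INTEL_OFFLOAD)
__attribute__((target(mic)))
#endif
float sum_array(const float* data, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// Host-side wrapper: copies `data` to the coprocessor, calls the routine
// there, and copies `result` back. Without offload support the pragma is
// compiled out and the block runs on the host.
float offload_sum(const float* data, std::size_t n) {
    float result = 0.0f;
#if defined(__INTEL_OFFLOAD)
    #pragma offload target(mic) in(data : length(n)) out(result)
#endif
    {
        result = sum_array(data, n);
    }
    return result;
}
```

Calling offload_sum(values, count) from host code then looks like an ordinary function call; only the wrapper knows about the coprocessor.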

Kevin_D_Intel
Employee

My apologies. As Ravi indicates, working with assembly source files is much easier than I had realized. Even when starting with the assembly file, our integration for the offload compilation is such that the user need only interact with our compiler in terms of the host side. The compiler and other tools also invisibly handle the coprocessor side when using assembly files.

I confirmed that you can compile to assembly using -S, hand-tune the <file>MIC.s file, and then compile both the host and coprocessor .s files simply by referring to the host-side assembly file only, just as Ravi indicated.

Jianbin_F_
Beginner

Thanks, Kevin. When I do that, the link step reports the following messages. Am I missing a library? 

 

micSolver.cpp:(.text+0xa73): undefined reference to `__offload_target_acquire'
micSolver.cpp:(.text+0xa98): undefined reference to `__offload_offload'
micSolver.cpp:(.text+0xacf): undefined reference to `__offload_target_acquire'
micSolver.cpp:(.text+0xf7c): undefined reference to `__offload_offload'
micSolver.cpp:(.text+0xfb3): undefined reference to `__offload_target_acquire'
micSolver.cpp:(.text+0x1500): undefined reference to `__offload_offload'

Kevin_D_Intel
Employee

Yes, the offload library. It was likely not included because your main does not contain any offload language extensions. On your link line, use the icpc compiler driver and just include the option: -offload
 

Jianbin_F_
Beginner

Does the compiler have such an option? 

I am using compiler v13.1.1, and it does not have an -offload option.

Running icpc -help | grep "offload" gives the following output: 

-offload-attribute-target=<name>
          file with the offload attribute target(mic)
-offload-option,<target>,<tool>,"option list"
          appends additional options for offload compilations given the
-no-offload
          disable any offload usage

 

Kevin_D_Intel
Employee

Yes, but not for that old one :-)   Support was added in Composer XE 2013 SP1 (14.0). For the 13.x vintage, just add something in the main() like:

__attribute__((target(mic))) int foo;
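
As a sketch of where such a declaration might sit, here it is at file scope in main.cpp. The file-scope placement, the names offload_link_marker and read_offload_link_marker, and the __INTEL_OFFLOAD guard are all assumptions for illustration, not something confirmed in this thread. The guard only keeps the sketch buildable with compilers that lack the offload extensions.

```cpp
// main.cpp (sketch)

#if defined(__INTEL_OFFLOAD)
// Dummy variable carrying the offload attribute. Its only purpose is to
// make this translation unit contain an offload language extension.
__attribute__((target(mic))) int offload_link_marker = 0;
#else
// Compilers without the offload extensions: a plain global instead.
int offload_link_marker = 0;
#endif

// Hypothetical helper that reads the marker; the variable's value is
// otherwise irrelevant to the program.
int read_offload_link_marker() {
    return offload_link_marker;
}
```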

 
