Defining "Disassembly of section .plt" in C/C++ code

srimks · ‎05-13-2009

Hello,

I need help from assembly programmers with inline SSE assembly programming, as my understanding of it is minimal.

Normally, in the objdump output of an executable, there is a "Disassembly of section .plt" created by the loader for library functions such as sin, cos, fmod, etc., which appear as sin@plt, cos@plt, and fmod@plt respectively.

E.g:
-----
0000000000401a50 <fmod@plt>:
401a50: ff 25 d2 68 18 00 jmpq *1599698(%rip) # 588328 <_GLOBAL_OFFSET_TABLE_+0xf8>
401a56: 68 1c 00 00 00 pushq $0x1c
401a5b: e9 20 fe ff ff jmpq 401880 <_init+0x18>

0000000000401a60 <cos@plt>:
401a60: ff 25 ca 68 18 00 jmpq *1599690(%rip) # 588330 <_GLOBAL_OFFSET_TABLE_+0x100>
401a66: 68 1d 00 00 00 pushq $0x1d
401a6b: e9 10 fe ff ff jmpq 401880 <_init+0x18>

0000000000401a90 <sin@plt>:
401a90: ff 25 b2 68 18 00 jmpq *1599666(%rip) # 588348 <_GLOBAL_OFFSET_TABLE_+0x118>
401a96: 68 20 00 00 00 pushq $0x20
401a9b: e9 e0 fd ff ff jmpq 401880 <_init+0x18>
----

The disassembly of the regular C/C++ code references the above .plt (Procedure Linkage Table) entries, as shown below:
----
s = sin(this_tor = ModRad(now.tor));
44d9b3: f2 42 0f 10 84 e4 c0 movsd 0xc0(%rsp,%r12,8),%xmm0
44d9ba: 00 00 00
44d9bd: f2 0f 10 0d 0b 5a 02 movsd 154123(%rip),%xmm1 # 4733d0 <_2il0floatpacket.1>
44d9c4: 00
44d9c5: e8 86 40 fb ff callq 401a50 <fmod@plt>
44d9ca: f2 0f 11 44 24 40 movsd %xmm0,0x40(%rsp)
44d9d0: f2 0f 10 44 24 40 movsd 0x40(%rsp),%xmm0
44d9d6: e8 b5 40 fb ff callq 401a90 <sin@plt>
44d9db: 0f 28 c8 movaps %xmm0,%xmm1

o = 1. - (c = cos(this_tor));
44d9de: f2 0f 10 44 24 40 movsd 0x40(%rsp),%xmm0
44d9e4: f2 0f 11 4c 24 48 movsd %xmm1,0x48(%rsp)
44d9ea: e8 71 40 fb ff callq 401a60 <cos@plt>
44d9ef: f2 0f 10 4c 24 48 movsd 0x48(%rsp),%xmm1
44d9f5: f2 44 0f 10 3d da 59 movsd 154074(%rip),%xmm15 # 4733d8 <_2il0floatpacket.2>
44d9fc: 02 00
----

If I need to replace the C/C++ code whose objdump is shown above with suitable inline SSE asm instructions, how should I replace or call the above .plt entries from inline SSE asm?

The objective is simply to express the .plt calls shown by objdump in inline SSE asm, or to learn how to reference the .plt from an inline SSE asm instruction.

The code I am redesigning with inline SSE asm is part of a larger C/C++ package that runs on Linux x86_64. The above .plt addresses stay the same even when I modify the code, but if I port the modified inline SSE asm code after addressing the above issues, the addresses will probably not remain constant. So the inline SSE asm has to be written in such a way that relocation and mapping to the other .plt entries is not an issue.

Is there a solution that addresses all of this when writing inline SSE asm code? I think I have stated it clearly.

~BR

TimP · ‎05-13-2009

No, I don't think you have stated your intention clearly.
It might be worth the effort to study the motivation of the gcc math builtins, to get some background. I don't understand it myself; it does present some problems in compatibility between icpc and the latest g++.
http://gcc.gnu.org/onlinedocs/gcc-4.4.0/gcc/Other-Builtins.html#Other-Builtins
Presumably, your compiler already has options to generate inline code for the math functions where it's suitable, such as sqrt() or fabs().
If you were writing x87 code, you might consider adapting the mathinline.h scheme of 32-bit glibc. I can't think of a reason for this, unless you wanted to make self-contained object code not requiring math library functions, trading a loss of performance for small total code size. For sqrt(), you could use this as a way to drop the errno.h and eh, supplying SSE code, in case you don't want to deal with other mechanisms which compilers such as gcc have for that purpose.
You could get a C source code of the math function and force inline it, to see if you could get a slight performance improvement at the cost of increased code size. This might have a better chance at engaging optimization than would a scheme with asm macros.
I've heard people are making an effort toward providing a somewhat workable open source SSE math library. The amazingly long time this has taken testifies to the obstacles.
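
As a rough illustration of the sqrt() case mentioned above, here is a minimal sketch of supplying SSE code through GCC-style extended inline asm. The helper name sse_sqrt is hypothetical, and the sketch assumes an x86_64 target and a compiler that accepts GCC inline asm syntax. Because it is a single instruction, no library call is made, so no PLT entry is involved.

```c
/* Hypothetical helper (not from any library): square root of a double
 * using the SSE2 sqrtsd instruction instead of a call into libm. */
static inline double sse_sqrt(double x)
{
    double r;
    /* AT&T operand order: source first, destination second.
     * "x" = any SSE register, "m" = memory operand. */
    __asm__("sqrtsd %1, %0" : "=x"(r) : "xm"(x));
    return r;
}
```

Unlike the library sqrt(), this never sets errno; a negative input simply produces NaN.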

srimks · ‎05-13-2009

I think the need is to understand the PLT within the GOT and how to call a GOT address using inline SSE instructions.

I stated: "If I need to replace the section of C/C++ code whose objdump is shown above with suitable inline asm SSE instructions, how should I replace or call the above .plt entries through inline SSE asm?

The code I am thinking of redesigning with inline SSE asm is part of a larger C/C++ package that runs on Linux x86_64, where the above GOT addresses stay the same even when I modify the code. But if I port this modified inline SSE asm code, after addressing the above issues, to a different series of Intel Xeon 5XXX processors, the addresses will probably not remain constant. So the inline SSE asm has to be written in such a way that relocation and mapping to the .plt is not an issue."

Probably people who write assembly can address such a question or discuss its relevance.

~BR

Melanie_B_Intel · ‎05-14-2009

Hi srimks,
To answer your question directly, I would suggest that you start by reading the assembly code which is created by the compiler (vs. the disassembly of an object file.) e.g. "icc -S foo.c" means "put the assembly code in the file named foo.s". The same switch works for gcc. When you look at the assembly code you'll see the assembler syntax for referring to GOT and PLT. GOT and PLT are used for creating position independent code, especially within Linux "shared objects."

You will want to learn more about the PLT and GOT; there are references available on the web, for example people.redhat.com/drepper/dsohowto.pdf and the System V ABI.

However, I would like to take the discussion up a level and ask you to provide an example of a function prototype for which you want to provide an inline asm version. I'd be surprised if you really need to be mucking around with GOT and PLT in the inline asm.

Regards.

srimks · ‎05-14-2009

The assembly (.s) file doesn't reference the PLT or the GOT, as both are handled at link time.

I compared the assembly and the disassembly, but the comparison doesn't make sense: the disassembly is produced from the executable, once the loader has done all the relocation for the object file, whereas the compiler converts the C/C++ file into the assembly file before any relocation happens. For this reason the assembly file contains no reference to the GOT or PLT.

Now, if I look at the assembly file for the calls made for the sin and cos operations, it's simply:
---
..B1.3: # Preds ..B1.10 ..B1.2
..LN7:
.loc 1 72
movsd 208(%rsp,%r12,8), %xmm0 #72.30
movsd _2il0floatpacket.1(%rip), %xmm1 #72.30
call fmod #72.30
# LOE rbx rbp r12 r13 r14 r15 xmm0
..B1.15: # Preds ..B1.3
movsd %xmm0, 80(%rsp) #72.30
# LOE rbx rbp r12 r13 r14 r15
..B1.4: # Preds ..B1.15
..LN9:
movsd 80(%rsp), %xmm0 #72.15
call sin #72.15
# LOE rbx rbp r12 r13 r14 r15 xmm0
..B1.16: # Preds ..B1.4
movaps %xmm0, %xmm9 #72.15
# LOE rbx rbp r12 r13 r14 r15 xmm9
..B1.5: # Preds ..B1.16
..LN11:
.loc 1 73
movsd 80(%rsp), %xmm0 #73.25
movsd %xmm9, 88(%rsp) #73.25
call cos #73.25
# LOE rbx rbp r12 r13 r14 r15 xmm0
..B1.17: # Preds ..B1.5
movsd 88(%rsp), %xmm9 #
# LOE rbx rbp r12 r13 r14 r15 xmm0 xmm9
..B1.6: # Preds ..B1.17
..LN13:
movsd _2il0floatpacket.2(%rip), %xmm15 #73.21
..LN15:
.loc 1 79
movslq (%r13), %r10 #79.20
..LN17:
.loc 1 84
movss (%rbx,%r15), %xmm7 #84.30
cvtps2pd %xmm7, %xmm13 #84.30
..LN19:
.loc 1 87
movss 4(%rbx,%r15), %xmm4 #87.30
..LN21:
.loc 1 90
movss 8(%rbx,%r15), %xmm5 #90.30
cvtps2pd %xmm5, %xmm10 #90.30
..LN23:
.loc 1 73
subsd %xmm0, %xmm15 #73.21
....
....
----

which is very different from the disassembly shown above. Here, I don't see any call to a PLT subroutine for either the sin or cos operation.

The original source code for this assembly is:
--
for (n = 0; n < ntor; n++) { /* "n"-th Torsion */ L# 71
s = sin(this_tor = ModRad(now.tor)); L# 72
o = 1. - (c = cos(this_tor)); L# 73
...
....
atmmmm = tlist[0]; L# 79
crdapple = (double)crd[atmnum];
crdapple = (double)crd[atmnum];
crdapple = (double)crd[atmnum];

sv = s * (vni = v); L# 84
k = (ov = o * vni) * vni + c;

sv = s * (vni = v); L# 87
k = (ov = o * vni) * vni + c;

sv = s * (vni = v); L# 90
--

~BR

Melanie_B_Intel · ‎05-15-2009

You're right, the PLT is created at link time rather than by the compiler. There is a cozy relationship between the compiler, assembler, linker, and loader: the contents of the PLT are implied by the kinds of relocations the linker/loader finds in the constituent object modules.

The relocation on the call statements can't be seen in the assembly file, but if you run objdump with --reloc you'll see that the relocation type is R_386_PLT32 (R_X86_64_PLT32 on x86_64). The linker gathers all relocations of that kind together and puts them into the PLT.

On the other hand, I'd be careful about putting "call" statements inside inline assembly. The compiler can do special optimizations on "leaf" procedures; if the compiler doesn't look inside the inline assembly, it might think that the function that uses the inline assembly is a leaf procedure. I recommend you learn a lot about how inline assembly works in your compiler.
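
To make the leaf-procedure caveat concrete, here is a minimal sketch of what a "call" inside GCC-style inline asm has to take care of on x86_64 Linux (System V ABI). The names call_unary and negate_demo are hypothetical, and the function pointer operand is an assumption made for illustration: it lets the compiler and linker supply the target address, so no address is hard-coded in the asm.

```c
/* Hypothetical stand-in for a library routine such as sin(). */
static double negate_demo(double v) { return -v; }

/* Hypothetical helper: call fn(x) from inline asm.
 * Assumes x86_64, System V ABI, GCC extended asm syntax. */
static double call_unary(double (*fn)(double), double x)
{
    /* First double argument and the double return value both live in xmm0. */
    register double arg __asm__("xmm0") = x;

    __asm__ volatile(
        "mov  %%rsp, %%rbx\n\t"  /* rbx is callee-saved, so it survives the call */
        "sub  $128, %%rsp\n\t"   /* step over the red zone a leaf function may use */
        "and  $-16, %%rsp\n\t"   /* ABI: rsp must be 16-byte aligned at a call */
        "call *%[fn]\n\t"
        "mov  %%rbx, %%rsp\n\t"  /* restore the original stack pointer */
        : "+x"(arg)
        : [fn] "r"(fn)
        /* Everything the callee is allowed to overwrite must be declared. */
        : "rbx", "rax", "rcx", "rdx", "rsi", "rdi",
          "r8", "r9", "r10", "r11",
          "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13",
          "xmm14", "xmm15", "memory", "cc");

    return arg;
}
```

With this sketch, call_unary(negate_demo, 2.0) returns -2.0, and passing sin instead (with <math.h> and -lm) works the same way. The symbolic form "call sin@PLT" is what gcc -S -fpic emits for an ordinary C call. The x87/MMX registers are not listed in the clobbers, so the sketch assumes the surrounding code does not use them. The amount of bookkeeping needed is a good argument for leaving the call in C.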

Here are some experiments I did

#include <math.h>
double a, b, c, d, e, f;

void test(){

int n;
for (n=0; n<10; n++) {
a = sin( fmod(e, b ));
d = 1.0 - cos( f);
}}

gcc -c -fpic test.c
gcc -S -fpic test.c

objdump -D --reloc test.o > test.objdump

Now you can look at the assembly file (test.s) and the relocation types (test.objdump). I used the -fpic switch so you could see some of the assembly syntax that can be used to generate certain kinds of relocations.

I personally am running out of time so this will be my last post for awhile. I hope some of my response has been helpful. Good luck!

BR

P.S. The so-called "Intel Intrinsics" are available on gcc and icc compilers. I hope you investigate those as well, to see if they could solve your problems--instead of writing your own inline asm. e.g. #include
You can find the documentation at

Intel C++ Compiler for Linux
Intrinsics Reference

which is available for public download
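
As a small sketch of what the intrinsics route looks like, the following computes 1.0 - c for two doubles at once, mirroring the "1. - cos(...)" step from the loop discussed earlier. The helper name one_minus_pd is hypothetical, and <emmintrin.h> (the SSE2 intrinsics header) is chosen here as an assumption. SSE has no sin or cos instruction, so those remain library calls.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: out[i] = 1.0 - c[i] for i = 0, 1. */
static void one_minus_pd(const double c[2], double out[2])
{
    __m128d ones = _mm_set1_pd(1.0);   /* {1.0, 1.0} */
    __m128d v    = _mm_loadu_pd(c);    /* unaligned load of two doubles */
    _mm_storeu_pd(out, _mm_sub_pd(ones, v));
}
```

The compiler handles register allocation and scheduling around the intrinsics, which is the main advantage over hand-written inline asm.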