Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Address Optimization - Nehalem

srimks
Hello,

I have the piece of code below for understanding the ICC v11.0 "Address Optimization" on Nehalem -

The original code is -
--
int a;
int b;
int c;

void f (void)
{
    a = 3;
    b = 5;
    c = 7;
    return;
}

int main(void)
{
    f();
    return 0;
}

--

The code above has been transformed as below, so that the global variables are referenced through a pointer rather than by constant address -

--
int __t1[3];            /* global pool */
int *__t2 = &__t1[0];   /* pointer to global pool */

void f (void)
{
    *__t2 = 3;          /* was: a = 3; */
    *(__t2 + 1) = 5;    /* was: b = 5; */
    *(__t2 + 2) = 7;    /* was: c = 7; */
    return;
}

int main(void)
{
    f();
    return 0;
}
--

The executable sizes (in bytes) of the unmodified (constant address) and modified (pointer based) code, on an Intel Xeon CPU 5160 @ 3.00GHz processor and on Nehalem (Intel Xeon CPU X5560 @ 2.80GHz), are -

(a) Intel Xeon CPU 5160 @ 3.00GHz
8023 (unmodified code) and 8076 (modified code)

(b) Nehalem (Intel Xeon CPU X5560 @ 2.80GHz)
7972 (unmodified code) and 8025 (modified code)

I checked the assembly for both processors using ICC v11.0 on a Linux x86_64 machine for the code above, and it is 100% the same.

Query: Why is the size about 50 bytes smaller on Nehalem than on the Intel Xeon 5160 processor for the above address optimization code?
Is a specific register being used as a pointer to the global variable pool, or has loading a pointer been made less expensive?

~BR
RamaKishan_M_Intel

I checked your examples and found the difference in size of the executables generated for the Intel Xeon (Harpertown) platform vs. the Intel Nehalem platform. Though the generated assembly is the same, if you look at the final code generated after linking with the dependency libraries, you will find differences in the assembly listing on Harpertown as compared to Nehalem. Please try the "objdump" utility with the "-D" option to get the disassembly dump. This difference could be dependent on the compiler code gen aspects for different platforms and the associated dependency libraries. Hope this helps clarify the difference.

thank you.

regards,

rama

srimks

Regarding the quoted point, "This difference could be dependent on the compiler code gen aspects for different platforms and the associated dependency libraries.": I don't think this example code calls any external libraries. The library dependencies of the unmodified and modified executables are as follows -

A. Nehalem (Intel Xeon CPU X5560 @ 2.80GHz) processor -

(a) Unmodified code -

$ ldd address-optimization

libm.so.6 => /lib64/libm.so.6 (0x0000003816800000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000000381a800000)
libc.so.6 => /lib64/libc.so.6 (0x0000003816400000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003816c00000)
/lib64/ld-linux-x86-64.so.2 (0x0000003815400000)

(b) Modified code -

$ ldd address-optimization-modified

libm.so.6 => /lib64/libm.so.6 (0x0000003816800000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000000381a800000)
libc.so.6 => /lib64/libc.so.6 (0x0000003816400000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003816c00000)
/lib64/ld-linux-x86-64.so.2 (0x0000003815400000)

B. Intel Xeon CPU X5355 @ 2.66GHz processor -

(c) Unmodified code (size 8023 bytes) -

libm.so.6 => /lib64/tls/libm.so.6 (0x000000399e600000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000039a0400000)
libc.so.6 => /lib64/tls/libc.so.6 (0x000000399e100000)
libdl.so.2 => /lib64/libdl.so.2 (0x000000399e400000)
/lib64/ld-linux-x86-64.so.2 (0x000000399df00000)

(d) Modified code (size 8076 bytes) -

libm.so.6 => /lib64/tls/libm.so.6 (0x000000399e600000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000039a0400000)
libc.so.6 => /lib64/tls/libc.so.6 (0x000000399e100000)
libdl.so.2 => /lib64/libdl.so.2 (0x000000399e400000)
/lib64/ld-linux-x86-64.so.2 (0x000000399df00000)

Comparing (a) with (c) and (b) with (d), the same libraries are listed in both cases.

I would appreciate a further explanation of which additional hardware instructions are used on Nehalem (Intel Xeon CPU X5560 @ 2.80GHz) that make the executable smaller (by 51 bytes) than on the Intel Xeon CPU X5355 @ 2.66GHz processor for the above piece of code.

Objdump: comparing the objdump output of the unmodified code on the two processors, I see these differences on NHM: "nop" has been replaced with "nopl" in "<__libc_start_main@plt-0x10>"; "<__do_global_dtors_aux>" is called differently; <__libc_csu_init> has no data16 references but more nop instructions; and <__libc_csu_fini> has fewer instructions.

Finally, going through the objdump output for the code on both processors, I don't see any extra instructions that would let NHM save 51 bytes. Could you please clarify?

~BR
RamaKishan_M_Intel
By dependency linking, I meant static linking, not dynamic linking dependencies. So please don't look at .so dependencies when comparing the executable sizes.

On your second note, the differences in the objdump output are what must be contributing to the difference in executable size on Harpertown vs. Nehalem.

thanks
-rama
