Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

ICC code much slower than GCC code

peter_silie
Beginner
1,991 Views

Hi, I compiled the following program with ICC 10.1 (20080801) using options '-O3 -axT' and with GCC 4.3.1 using options '-O3 -mtune=generic'. Surprisingly, the GCC code runs much faster (6.6s) than the ICC code (16.3s) on a Xeon @ 2.8 GHz. Any ideas where this difference comes from? Thanks a lot.

#include <algorithm>

int main() {
    int src[XRES][YRES], dst[XRES][YRES];

    for(int i = 0; i < 100; i++)
        for(int x = 0; x < XRES; x++)
            for(int y = 0; y < YRES; y++) {
                int sum(0), count(0);
                for(int xb = std::max(x-(MASK/2),0); xb <= std::min(x+(MASK/2),XRES-1); xb++)
                    for(int yb = std::max(y-(MASK/2),0); yb <= std::min(y+(MASK/2),YRES-1); yb++) {
                        sum += src[xb][yb];
                        count++;
                    }
                dst[x][y] = sum/count;
            }

    return(dst[0][0]);
}

0 Kudos
21 Replies
TimP
Honored Contributor III
1,881 Views
You could spare us some guessing if you would tell us your compile command. Assuming I guessed your intention well enough, setting -vec-report2 gives me the remark "unsupported loop structure." After moving the evaluation of std::min() into your initialization section and cutting back to icpc -O1, I get similar timings for icpc and g++. Note that g++ would not attempt to vectorize a sum reduction, even with float or double operands, until you set -ffast-math (or -fassociative-math, if that works in your version).
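The restructuring mentioned here, evaluating std::min() once in the loop's initialization section instead of in its termination test, might look like the sketch below. The helper name window_sum and its parameters are hypothetical and not from the original program; whether a particular compiler version then vectorizes the loop still has to be checked with its own vectorization report.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: sums row[y-half .. y+half], clamped to [0, n-1].
int window_sum(const int* row, int n, int y, int half) {
    assert(n > 0 && y >= 0 && y < n && half >= 0);
    int sum = 0;
    // Both bounds are computed once, in the initialization section,
    // so the termination test is a plain comparison of two ints.
    for (int yb = std::max(y - half, 0), ye = std::min(y + half, n - 1);
         yb <= ye; ++yb) {
        sum += row[yb];
    }
    return sum;
}
```

The loop body and result are unchanged; only the place where the upper bound is evaluated moves.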
0 Kudos
peter_silie
Beginner
1,881 Views
Here are the compile commands and the corresponding output (using -ffast-math/-fassociative-math doesn't make a significant difference in the run time of the GCC code). Are there any other options I should use to make the ICC code as fast as the GCC code?

g++ -Wall -O3 -mtune=generic -o test.gcc test.cpp
icpc -w=1 -O3 -axT -o test.icc test.cpp

time ./test.gcc

real 0m6.658s
user 0m6.648s
sys 0m0.008s

time ./test.icc

real 0m16.318s
user 0m16.305s
sys 0m0.012s

0 Kudos
John_O_Intel
Employee
1,881 Views


Hi,

Changing the loop logic allows the loop to be vectorized by icc 10.1.018, which could improve run time (see below). When trying to optimize a loop, it's always a good idea to simplify the loop as much as possible. The original loop logic has a function call in the loop condition, so the compiler needs to figure out that the call doesn't cause side effects in the loop, and icc isn't able to do that currently. Compiling with -vec-report2 gives a one-line summary of whether each loop was vectorized; -vec-report3 can give additional details. Also, you compiled with -axT: are you running on a Core 2 processor? If not, you will run the generic code path. I'd recommend building with -xT and running on a Core 2 processor (or using whatever processor-specific switch fits your machine).

What are the values of XRES and YRES that you use? What compiler versions are you running (icc -V, gcc -v), and what type of machine (/proc/cpuinfo)?

int main() {
    int src[XRES][YRES], dst[XRES][YRES];

    for(int i = 0; i < 100; i++)
        for(int x = 0; x < XRES; x++)
            for(int y = 0; y < YRES; y++) {
                int sum(0), count(0);
                for(int xb = std::max(x-(MASK/2),0); xb <= std::min(x+(MASK/2),XRES-1); xb++) {
                    // Inner-loop bounds are evaluated once, outside the loop condition.
                    int y0=std::max(y-(MASK/2),0);
                    int ym=std::min(y+(MASK/2),YRES-1);
                    for(int yb = y0; yb <= ym; yb++) {
                    // for(int yb = std::max(y-(MASK/2),0); yb <= std::min(y+(MASK/2),YRES-1); yb++) {
                        sum += src[xb][yb];
                        count++;
                    }
                }
                dst[x][y] = sum/count;
            }

    return(dst[0][0]);
}
Regards,

JohnO
0 Kudos
peter_silie
Beginner
1,881 Views

Hi,

thanks for the information and sorry, I forgot to mention the values of the constants: XRES=1024, YRES=768, MASK=9. Changing the loop logic reduces the run time from 16 to 14 seconds (GCC code: approx. 7s). Compiling with -xT instead of -axT doesn't affect the run time on a quad-core Xeon processor @ 2.8 GHz (see below).

Regards

Output for icc -V and gcc -v:

Intel C Compiler for applications running on Intel 64, Version 10.1 Build 20080801 Package ID: l_cc_p_10.1.018

Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.3.1-9' --with-bugurl=file:///usr/share/doc/gcc-4.3/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.3 --program-suffix=-4.3 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --enable-cld --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.3.1 (Debian 4.3.1-9)

/proc/cpuinfo (only first core):

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel Xeon CPU E5440 @ 2.83GHz
stepping : 6
cpu MHz : 2826.256
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm
bogomips : 5656.55
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
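
With the constants given in this post (XRES=1024, YRES=768, MASK=9), a self-contained variant of the kernel could be sketched as follows. This is not the original test program: it assumes heap-allocated, row-major std::vector storage instead of stack arrays (the two int arrays together take about 6 MB), hoists all window bounds out of the loop conditions, and the function name box_filter is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

const int XRES = 1024, YRES = 768, MASK = 9;  // values reported in the thread

// Mean filter over a mask x mask window, clamped at the image borders.
// src and dst are row-major: xres rows of yres ints each.
void box_filter(const std::vector<int>& src, std::vector<int>& dst,
                int xres, int yres, int mask) {
    assert(src.size() == static_cast<std::size_t>(xres) * yres);
    dst.resize(src.size());
    const int half = mask / 2;
    for (int x = 0; x < xres; x++) {
        const int x0 = std::max(x - half, 0);
        const int x1 = std::min(x + half, xres - 1);
        for (int y = 0; y < yres; y++) {
            const int y0 = std::max(y - half, 0);
            const int y1 = std::min(y + half, yres - 1);
            int sum = 0, count = 0;
            for (int xb = x0; xb <= x1; xb++)
                for (int yb = y0; yb <= y1; yb++) {
                    sum += src[xb * yres + yb];
                    count++;
                }
            dst[x * yres + y] = sum / count;
        }
    }
}
```

To reproduce the timing experiment, one would fill src with known values (the original reads uninitialized memory) and call box_filter(src, dst, XRES, YRES, MASK) 100 times.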

0 Kudos
John_O_Intel
Employee
1,881 Views
Thanks for the info. Running on a 45nm Harpertown, I see behavior similar to yours: gcc 4.3 generates faster code than icc 10.1, while icc is faster than gcc 4.2. I suggest you submit a performance issue to http://premier.intel.com

I can improve icc performance by using profile-guided optimization (build with -prof-gen, run the app, then rebuild with -prof-use):

[jjoneill@dpd22 forum]$ icpc -O3 -xT 60714.cpp -ipo -prof-use -V
Intel C++ Compiler for applications running on Intel 64, Version 10.1 Build 20080801
$ time ./a.out

real 0m9.053s
user 0m9.045s
sys 0m0.005s

gcc version 4.3.0 (GCC)
$ g++ -O3 60714.cpp
$ time ./a.out
real 0m6.271s
user 0m6.258s
sys 0m0.008s

I see gcc 4.2 is slower:

$ /opt/spdtools/compiler/ia32e/gcc-4.2.1/bin/g++ -O3 60714.cpp
$ time ./a.out

real 0m12.686s
user 0m12.677s
sys 0m0.003s

0 Kudos
gordan
Beginner
1,881 Views

Just out of interest, have you tried with the 9.x compiler? I've seen a number of cases a while back where 10.x didn't vectorize a lot of things that 9.x did.

0 Kudos
TimP
Honored Contributor III
1,881 Views

11.0 restored some useful vectorization which was removed in 10.0. The one exception I know of is vectorization by peeling off 1 or 2 iterations from the end of a loop, an optimization which gcc performs. If you have other cases which you consider important, why not document them?
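
As one possible reading of "peeling off 1 or 2 iterations from the end of a loop", the sketch below peels a special last iteration by hand so that the remaining loop body is uniform. The functions and the wrap-around access pattern are hypothetical illustrations, not code from this thread, and no claim is made about which compiler version performs this transformation automatically.

```cpp
#include <cassert>
#include <vector>

// Straightforward form: the wrap-around index makes the last iteration special.
void neighbor_sum(const std::vector<int>& b, std::vector<int>& a) {
    const int n = static_cast<int>(b.size());
    a.resize(n);
    for (int i = 0; i < n; i++)
        a[i] = b[i] + b[(i + 1) % n];
}

// Hand-peeled form: the last iteration is handled separately, so the
// main loop has unit-stride accesses and no modulo.
void neighbor_sum_peeled(const std::vector<int>& b, std::vector<int>& a) {
    const int n = static_cast<int>(b.size());
    a.resize(n);
    for (int i = 0; i < n - 1; i++)
        a[i] = b[i] + b[i + 1];
    if (n > 0)
        a[n - 1] = b[n - 1] + b[0];  // peeled final iteration
    assert(a.size() == b.size());
}
```

Both functions compute the same result; the peeled version simply exposes a loop body that is easier for a vectorizer to analyze.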

0 Kudos
TimP
Honored Contributor III
1,881 Views
Source unrolled loops were vectorized for SSE2 after re-rolling in 9.1, where more recent compilers require newer instruction set options to vectorize without re-rolling.
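
For readers unfamiliar with the term, a "source unrolled" loop and its re-rolled equivalent might look like the following sketch. The function names and the scaling operation are made up for illustration; the point is only that both forms compute the same thing, and a compiler that re-rolls can treat the first like the second.

```cpp
#include <cassert>
#include <cstddef>

// Unrolled by 4 in the source, as was often done by hand for older compilers.
// Assumes n is a multiple of 4.
void scale_unrolled(float* a, const float* b, float k, std::size_t n) {
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4) {
        a[i]     = k * b[i];
        a[i + 1] = k * b[i + 1];
        a[i + 2] = k * b[i + 2];
        a[i + 3] = k * b[i + 3];
    }
}

// The re-rolled equivalent: one statement per iteration, unit stride.
void scale_rolled(float* a, const float* b, float k, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        a[i] = k * b[i];
}
```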

0 Kudos
TimP
Honored Contributor III
1,881 Views
Re-rolling and peeling vectorization, such as was working in 9.1, appears to be under consideration for restoration after 11.0.

0 Kudos
peter_silie
Beginner
1,881 Views
Quoting - John O (Intel)

I suggest you submit a performance issue to http://premier.intel.com

I submitted an issue (519682) to http://premier.intel.com

0 Kudos
Feilong_H_Intel
Employee
1,881 Views
The engineering team looked into this issue. Here are the measurements they got:
9.41s for GCC
13.55s for ICC 11.0
10.13s for ICC mainline
However, if we compile the program at -O1 with ICC 11.0, the loop is not vectorized, and that gives the best time of 7.47 seconds. Hence, this is a vectorization tuning issue.
The engineering team decided not to fix this issue in 11.0 and 10.x because the optimization improvements in the mainline compiler are small and scattered. I'm afraid that you will have to wait until the next major version of the compiler is available for download. Please let me know if you have further questions.

Feilong
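
Since the -O1 build is fastest precisely because the loop is not vectorized, one experiment (not something proposed in this thread) would be to keep the higher optimization level and switch off vectorization for the inner loop only. The Intel compiler documents a novector pragma for this purpose. The sketch below guards it so other compilers never see it; the function name is hypothetical, and whether this recovers the -O1 timing is an untested assumption.

```cpp
#include <cassert>

// Hypothetical inner-loop helper: sums row[y0..y1] without vectorization
// when built with the Intel compiler.
int window_sum_novec(const int* row, int y0, int y1) {
    assert(y0 <= y1);
    int sum = 0;
    // Intel-specific hint; the guard keeps other compilers from warning
    // about an unknown pragma.
#ifdef __INTEL_COMPILER
#pragma novector
#endif
    for (int yb = y0; yb <= y1; ++yb)
        sum += row[yb];
    return sum;
}
```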

0 Kudos
TimP
Honored Contributor III
1,881 Views

A case of over-aggressive optimization of a sum reduction might not be noticed, as neither gcc nor icc attempts to vectorize those cases unless gcc -ffast-math (icc -fp-model fast) is set, and those options are not standard-compliant. The main point is that this case showed the need to optimize the source by moving the expression in the loop termination test out of the loop, an optimization which neither gcc nor icc performs consistently, although gcc did, and icc 11.x may do, in this case.
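
The reason floating-point sum reductions need such options can be shown with a small sketch. A vectorized reduction keeps several partial sums and combines them at the end, which changes the order of additions. The function names below are hypothetical, and the two-lane version only imitates what a vectorizer would do.

```cpp
#include <cassert>
#include <cstddef>

// Sums the elements strictly in source order.
template <typename T>
T sum_in_order(const T* v, std::size_t n) {
    T s = T();
    for (std::size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

// Keeps two independent partial sums, as a 2-lane vectorized reduction
// would, and combines them at the end. Assumes n is even.
template <typename T>
T sum_two_lanes(const T* v, std::size_t n) {
    assert(n % 2 == 0);
    T lane0 = T(), lane1 = T();
    for (std::size_t i = 0; i < n; i += 2) {
        lane0 += v[i];
        lane1 += v[i + 1];
    }
    return lane0 + lane1;
}
```

With IEEE single-precision evaluation, the array {1e8f, 1.0f, -1e8f, 1.0f} sums to 1 in source order but to 2 with two lanes, because 1e8f + 1.0f rounds back to 1e8f. The same values as int give 2 either way, so an integer reduction like the one in this thread can be reordered without changing the result (barring overflow).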

0 Kudos
aazue
New Contributor I
1,881 Views
Hi to all,
This kind of discussion is fine, but the subject can raise doubts about the product's performance.
I have compiled exactly the same source (web management tools) twice, once with ICC and once with GCC.
The code is demanding, high-level programming: some tasks take half the time with ICC, and I have observed other tasks where GCC is better.
I added two access links in a previous discussion
(BUILD APACHE 2.2.10, POSTGRESQL 8.2.10, CGI PROGRAMMING COMPILER INTEL ICC (11) O/S LINUX OPENSUSE 11 AND DEBIAN 4.0 R2);
with them you can really evaluate the benefits of GCC and the benefits of ICC.
You cannot evaluate a product or form an opinion of it from a single loop or from printing 'hello world' to the screen.
Also, using -O3 with g++ is not practical, I think: you would have to wait about two days to compile a big project.
I like the GNU compiler above all others. On the cross-compilation side you can build binaries for AIX PPC, Sun SPARC, and so on,
and it is a very good product.
I think that setting up a true and fair challenge benefits both products.
I want to show whether the product can boost results through options for specific processor types, such as Core 2 Quad and newer.
I have written several multithreaded programs in which I have not really seen an extraordinary benefit.
I hope that Intel, with its advantage as a processor manufacturer, can now speak openly so that the true results can be understood.

I put two links in my previous post so that both compilers could be assessed,
but I see that nobody has tried them. Fear?
Or must I understand that, by default, you consider web-oriented work to be of no interest?
The Unix penguin side?
I think I am the one who should be afraid of opening such public access.

http://82.127.82.195:8082/kalachniweb.html is built with the Intel ICC compiler (APACHE 2, POSTGRESQL, WARTHREAD);
to avoid confusion, "ICC Intel" is written in the background (8082).

http://82.127.82.195:8080/kalachniweb.html is built with the GNU GCC compiler (APACHE 2, POSTGRESQL, WARTHREAD).


I also work on the Microsoft side, but currently with Vista: the best product, but no success, the result of bad management.
Users are not at fault; they have no choice and must buy a new machine that has a 64-bit processor with a 32-bit operating system
installed by default. I do not understand what the arguments are for replacing an existing XP. Because of this wrongly oriented
product management, you cannot sell your updated 64-bit products or update all the machines on a network.
You have no choice but to share in Microsoft's mistake.

It would be wrong to think that these links are a standard, conventional site. You can observe the level of 25+
years of programming experience. (You can also compare
a small part of Warthread, written in C/C++ with the PostgreSQL database engine, against Oracle Application Express, which is Java-based (on the interface side only). I also have back ends using the OCCI library for
Oracle and for IBM DB2, just at this gateway access point.)
PostgreSQL works well with several databases as sub-instances
(software for a small computer).

C/C++ has a great part to play in the new asynchronous intranet programming. C/C++ should be popular and not reserved for expert-level work only.

I think that opening this link can help
attract some programmers to the C/C++ language.
To sell upgrades for customers' computers, I must show the benefit of changing to a Core 2 Quad or newer. You cannot make new deals with a timing loop.
I have thought of adding several types of computer in the future (Core 2 Quad, new Pentium Pro, Atom, etc.), also distinguishing 32 and 64 bits.
Remark:
about the Atom (U100) that I bought, I was furious to discover that it is a 32-bit type, but it is a good product and very satisfying;
I just need to buy new eyeglasses as well.

With the same result, no accesses at all, I must conclude that wanting to share experience is stupid, and get on with the several other tasks I have in my project.

This message is not meant as criticism; it is just meant to promote C/C++ as a popular, accessible language and
to share communication with the Intel group, in the hope that it results in new deals to make money.

You answer with help on source-code problems, which is good. I thank you and congratulate you on your work.
Best regards to all. (... Tomorrow I will use ICC to compile GCC, so that there are never again questions about the performance of the two products ...)
Seriously, are there people here who compile Firefox or SeaMonkey with ICC?


(For the Intel web site management:) the score points display is written over part of the question (Gecko rendering engine only).

0 Kudos
srimks
New Contributor II
1,881 Views
I hope that by now you have figured out the problem, how to decide which compiler supports the level of vectorization you need, and when vectorization becomes a limitation for a compiler.

Try using GCC v4.4 and check the same with Intel v11.0.

Could you check the linked libraries (ldd [executable]) of the executables built with both GCC and the Intel compiler?

I also saw the Intel compiler generating a bigger executable (process map) than GCC (v4.1) for a simple C hello world program. Hopefully, I'll explore more with the Intel C++ Compilers (both v10.0 and v11.0) and clearly differentiate the areas between both compilers with respect to performance.

Thanks for putting this question to forum.

~BR

0 Kudos
aazue
New Contributor I
1,881 Views
Hi


(In the PostgreSQL bin directory, compiled with ICC) (LISTEN :8082) OPENSUSE 11

linux-de4c:/usr/local/pgsql/bin # ldd /usr/local/pgsql/bin/clusterdb

linux-gate.so.1 => (0xffffe000)
libimf.so => /usr/local/pgsql/lib/libimf.so (0xb7db3000)
libm.so.6 => /lib/libm.so.6 (0xb7d79000)
libpq.so.5 => /usr/local/pgsql/lib/libpq.so.5 (0xb7d5c000)
libz.so.1 => /lib/libz.so.1 (0xb7d48000)
libreadline.so.5 => /lib/libreadline.so.5 (0xb7d15000)
libcrypt.so.1 => /lib/libcrypt.so.1 (0xb7cde000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb7cd0000)
libc.so.6 => /lib/libc.so.6 (0xb7b8d000)
libdl.so.2 => /lib/libdl.so.2 (0xb7b89000)
/lib/ld-linux.so.2 (0xb7fd3000)
libsvml.so => /usr/local/pgsql/lib/libsvml.so (0xb7ab5000)
libintlc.so.5 => /usr/local/pgsql/lib/libintlc.so.5 (0xb7a72000)
libncurses.so.5 => /lib/libncurses.so.5 (0xb7a3a000)

(In the PostgreSQL bin directory, compiled with GCC) (LISTEN :8080) DEBIAN 4.00 r2

debian:/usr/local/pgsql/bin# ldd /usr/local/pgsql/bin/clusterdb
linux-gate.so.1 => (0xffffe000)
libpq.so.5 => /usr/local/pgsql/lib/libpq.so.5 (0xb7f25000)
libz.so.1 => /usr/lib/libz.so.1 (0xb7f03000)
libreadline.so.5 => /lib/libreadline.so.5 (0xb7ed3000)
libcrypt.so.1 => /lib/tls/i686/cmov/libcrypt.so.1 (0xb7ea5000)
libdl.so.2 => /lib/tls/i686/cmov/libdl.so.2 (0xb7ea1000)
libm.so.6 => /lib/tls/i686/cmov/libm.so.6 (0xb7e7c000)
libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0xb7d4b000)
libncurses.so.5 => /lib/libncurses.so.5 (0xb7d09000)
/lib/ld-linux.so.2 (0xb7f40000)

best regards

0 Kudos
srimks
New Contributor II
1,881 Views
Hi.

Could you repeat the above observations in two separate consoles, one with environment variables set purely for GCC and the other for the Intel compilers?

Also, could you add "-ffunction-sections" to the GCC compile command, try the same flag with the Intel compiler, and then check the difference?

Sorry, as it is a holiday I couldn't replicate this myself today.

~BR

0 Kudos
aazue
New Contributor I
1,881 Views
Hi
I ran the ldd command on each of the two separate machines, with the original, unchanged environment.
As for GCC 4.4.1:
is there a link where a binary can be downloaded?
These two small machines have only 1 processor, so compiling the 4.4.1 source could take a long time, and I am unable to use the -j flag to run the build in parallel. I do have a machine with 8 processors; the only problem is the noise of its PSU.
It can only be started when my wife is not in the house.
My wife can use the rolling pin easily .. with or without loop vectorization options ....
Best regards

0 Kudos
aazue
New Contributor I
1,881 Views
Hi

I am unable to compile the 4.4.x source
on Debian 4.0 with its original default gcc (gcc version 4.1.2 20061115 (prerelease)).

Perhaps lib6 and some other libraries must be updated first?

I successfully added the mpfr and gmp libraries, but the result is the same failure (libgcc at the link step)??

I will have to install a new machine with openSUSE 11, which has gcc 4.3.1 by default, and add ICC to it as well;

that way the result should be easy to obtain.

Best regards

0 Kudos
aazue
New Contributor I
1,881 Views
Correction: sorry to all friends in the Debian community. The original default gcc-4.1 of Debian 4.0 rc2 can build gcc-4.3.1 and 4.4.0 20081212 with no library problems;
only the Fortran language had to be removed (no time to resolve that problem).

I have built both.

debian:/usr/bin# gcc-4.3.1 -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ./configure -v --enable-languages=c,c++,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.3.1 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --with-tune=i686 --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.3.1 (GCC)


debian:/usr/bin# gcc-4.4-200812 -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ./configure -v --enable-languages=c,c++,objc,obj-c++ --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.4-200812 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --with-tune=i686 --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.4.0 20081212 (experimental) (GCC)


Best regards

0 Kudos
SergeyKostrov
Valued Contributor II
1,566 Views
A note to Intel software engineers: Something went wrong on IDZ and a very old 2008 thread was marked as New; however, there are no 2013 posts other than this one.
0 Kudos