I am trying to compile GSL 1.15 with the Intel compiler 11.1.
I tried the optimization flags suggested in
http://software.intel.com/en-us/forums/showthread.php?t=75549
CFLAGS="-O2 -m64 -march=core2 -mtune=core2 -Wpointer-arith -fno-strict-aliasing "
but the test in specfunc fails:
make[2]: Entering directory `/my/path/gsl-1.15/specfunc'
/bin/sh: line 5: 11112 Illegal instruction (core dumped) ${dir}$tst
FAIL: test
==================
1 of 1 test failed
==================
Any ideas?
1 Solution
There's little point in setting -march or -mtune to core2 for x86_64, even in gcc. The defaults (compatibility with nocona and opteron) are fine. -mtune is redundant once -march is set to the same value. Does icc work when those options are omitted?
For complex numbers, GSL uses local struct definitions, which compilers are unlikely to recognize as optimizable with -msse3.
Perhaps icc misinterprets this unusual combination of options. -march=core2 would imply -mssse3.
It might be safer to add -msse3 to the option list rather than depend on each compiler to set the architecture from -march=core2. Recent production Core 2 CPUs also support -msse4.1, but early ones (like mine) don't. It should be safe to set -xssse3 if you want code that is compatible with all Core 2 CPUs but not with AMD.
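The two suggestions above (drop -march/-mtune, or name the instruction set explicitly) can be sketched as a small helper. This is only an illustration under assumptions: gsl_cflags is a hypothetical function name, the base flags are the ones from the original post, and the commented build commands assume icc is on the PATH and that they are run from the GSL source directory.

```shell
# Build a CFLAGS string for GSL without -march/-mtune.
# $1 is an optional instruction-set option such as -msse3 or -xssse3;
# when it is empty, the compiler's x86_64 defaults apply.
gsl_cflags() {
    base="-O2 -m64 -Wpointer-arith -fno-strict-aliasing"
    if [ -n "${1:-}" ]; then
        printf '%s\n' "$base $1"
    else
        printf '%s\n' "$base"
    fi
}

# Usage from the GSL source directory (assumed layout, not run here):
#   CC=icc CFLAGS="$(gsl_cflags)" ./configure && make && make check
#   CC=icc CFLAGS="$(gsl_cflags -msse3)" ./configure && make && make check
```

Running make check after each variant shows whether the specfunc test still aborts with an illegal instruction.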
7 Replies
Hello,
What processors are you compiling and running the test on? An illegal instruction error suggests the code was optimized for a newer processor generation than the one you are actually using.
Best regards,
Georg Zitzlsberger
I am compiling on
Intel Xeon CPU 3.00GHz
Hello,
I'm sorry, but that's not enough information. We have many generations of Intel Xeon CPUs at 3 GHz. Some are from well before the Core 2 era.
On Linux* you can run
$ cat /proc/cpuinfo
or look up your processor in our products database here:
http://ark.intel.com/
and send us the link.
Best regards,
Georg Zitzlsberger
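Since /proc/cpuinfo repeats the same block for every logical processor, it can help to print only the identifying lines once. A minimal sketch, assuming a Linux system with grep and sort; cpu_summary is a hypothetical helper name, and the optional argument exists only so the function can also be pointed at a saved copy of the file.

```shell
# Print the unique "cpu family", "model", "model name" and "flags" lines.
# $1 is an optional path to a cpuinfo-style file (default: /proc/cpuinfo).
cpu_summary() {
    grep -E '^(model name|cpu family|model|flags)[[:space:]]*:' \
        "${1:-/proc/cpuinfo}" | sort -u
}

# Usage on the machine that runs the failing test:
#   cpu_summary
```

The family, model and flags values are what identify the processor generation; the model name alone can be ambiguous.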
I'm sorry, but that's not enough information. We've lots of generations of Intel Xeon CPUs with 3GHz. Some are far pre-core2 era.
On Linux* you can do a
$ cat /proc/cpuinfo
or look up your processor in our products database here:
http://ark.intel.com/
and send us the link.
Best regards,
Georg Zitzlsberger
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel Xeon CPU 3.00GHz
stepping : 3
cpu MHz : 3000.000
cache size : 2048 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc pebs bts pni dtes64 monitor ds_cpl cid cx16 xtpr
bogomips : 5985.01
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel Xeon CPU 3.00GHz
stepping : 3
cpu MHz : 3000.000
cache size : 2048 KB
physical id : 3
siblings : 1
core id : 0
cpu cores : 1
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc pebs bts pni dtes64 monitor ds_cpl cid cx16 xtpr
bogomips : 5985.25
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual
power management:
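The flags line above can be read mechanically. A minimal sketch, assuming a POSIX shell; highest_sse is a hypothetical helper name, not part of any tool mentioned in this thread. Linux reports SSE3 as pni, so for the CPU shown here the function prints sse3: the flags include pni but not ssse3, which -march=core2 implies, as noted earlier in the thread.

```shell
# Print the highest SSE level advertised in a /proc/cpuinfo "flags" value.
# Each pair maps the kernel's flag name to a conventional level name;
# Linux lists SSE3 as "pni".
highest_sse() {
    flags=" $1 "
    for pair in sse4_2:sse4.2 sse4_1:sse4.1 ssse3:ssse3 pni:sse3 sse2:sse2; do
        case "$flags" in
            *" ${pair%%:*} "*)
                printf '%s\n' "${pair##*:}"
                return 0
                ;;
        esac
    done
    printf '%s\n' "none"
}

# Usage on a live Linux system:
#   highest_sse "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
```

Padding the flag list with spaces keeps ssse3 from being mistaken for sse3 or sse, since each name is matched as a whole word.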
Quoting TimP (Intel)
There's little point in setting -march or -mtune to core2 for x86_64, even in gcc. The defaults (compatibility with nocona and opteron) are fine. -mtune is redundant once -march is set to the same value. Does icc work when those options are omitted?
For complex numbers, GSL uses local struct definitions, which compilers are unlikely to recognize as optimizable with -msse3.
Perhaps icc misinterprets this unusual combination of options. -march=core2 would imply -mssse3.
It might be safer to add -msse3 to the option list rather than depend on each compiler to set the architecture from -march=core2. Recent production Core 2 CPUs also support -msse4.1, but early ones (like mine) don't. It should be safe to set -xssse3 if you want code that is compatible with all Core 2 CPUs but not with AMD.
I removed -mtune and -march and it worked. Now I am compiling it with -xsse3, just to test.
By the way, do you think more aggressive optimization is possible?
---------------------------------------------
EDIT
-xsse3 works
The most obvious more aggressive compiler option would be -O3, which may enable additional optimizations on nested loops. However, you may well find no way to improve performance, even with a meaningful measurement. Your CPU appears to be one that doesn't support SSE4.1; besides, I've run into several cases where compiling for that architecture reduced performance even though the CPU supported it.
