Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Intel ATOM SSE3_ATOM optimization under Windows 32-bit -O3

Basically: I would love to compile several applications for the Atom, which runs Windows XP 32-bit, with SSE3_ATOM and -O3 to test the speed boost on the Intel Atom (normally about 31 percent). :-)

I have never done this, so I would appreciate some advice on how to make it stable with the -O3 switch.

I am going to run the Intel compiler from within MS VS 2008...

And I will probably compile on Win XP Prof 32-bit, running on a P4 HT "E2200" CPU.
Will this compile it OK, or should I rather compile directly on the Atom CPU?

Thanks for any advice.
Filip The Overtonesinger
Black Belt

If you set a specific option such as SSE3 or SSE3_ATOM, you get the same code generation regardless of whether you build on an Atom or on a P4-compatible CPU. You won't be able to test the SSE3_ATOM code on the P4, and some of our expert colleagues have recommended specifying plain SSE3 even when targeting Atom. Recent ICL releases have been designed to schedule as well as possible for Atom even when generating code that can run on any SSE3-compatible "P4".
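To make the first point concrete: with an explicit target option, what matters is whether the machine that runs the binary has the extensions the code may use, not which machine built it. This is a minimal Python sketch under stated assumptions: TARGET_REQUIREMENTS and the host feature sets are hypothetical illustrations, and the SSSE3/MOVBE entries for SSE3_ATOM reflect our understanding that this target may emit those instructions (check the compiler documentation for your version).

```python
# Hypothetical table: extensions that code built for each target option may use.
# Assumption: SSE3_ATOM code may also contain SSSE3 and MOVBE instructions,
# both of which the Atom supports; verify against your compiler's docs.
TARGET_REQUIREMENTS = {
    "SSE3": {"SSE", "SSE2", "SSE3"},
    "SSE3_ATOM": {"SSE", "SSE2", "SSE3", "SSSE3", "MOVBE"},
}


def missing_extensions(target, host_features):
    """Extensions the target's code may use that the host lacks (sorted).

    The build machine does not appear here: with an explicit target option,
    the generated code is the same wherever the compiler runs.
    """
    return sorted(TARGET_REQUIREMENTS[target] - set(host_features))


def can_run(target, host_features):
    """True if the host has every extension the target's code may use."""
    return not missing_extensions(target, host_features)


# Illustrative host feature sets (assumed, not read from real hardware).
HOSTS = {
    "Atom": {"MMX", "SSE", "SSE2", "SSE3", "SSSE3", "MOVBE"},
    "SSE3-capable P4": {"MMX", "SSE", "SSE2", "SSE3"},
}

for host, features in HOSTS.items():
    for target in TARGET_REQUIREMENTS:
        gaps = missing_extensions(target, features)
        status = "runs" if not gaps else "missing " + ", ".join(gaps)
        print(f"{target} code on {host}: {status}")
```

In this sketch the SSE3 build can be tested on the P4, while the SSE3_ATOM build can only be exercised on the Atom itself, which matches the advice above.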
Not knowing specifically what you mean by stability, I would suggest that you set /fp:source, at least to establish a reference, until you determine that you need more aggressive options for performance.
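As background for the /fp:source suggestion: floating-point addition is not associative, so settings that let the compiler reorder a reduction (for example, to vectorize it) can change the result. The Python sketch below emulates a sequential sum and a two-lane "vectorized" sum of the same data. The two-lane split only illustrates reassociation; it is not a claim about the code ICL actually generates, and the function names are hypothetical.

```python
def sequential_sum(values):
    """Left-to-right summation, the order the source code specifies."""
    total = 0.0
    for v in values:
        total += v
    return total


def lane_sum(values, lanes=2):
    """Emulate a vectorized reduction: each lane accumulates every
    `lanes`-th element, then the partial sums are combined at the end.
    This reorders the additions compared with sequential_sum."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return sequential_sum(partial)


# 2**53 + 1.0 rounds back to 2**53 in IEEE double precision, so the
# grouping of the additions decides whether the small terms survive.
data = [2.0**53, 1.0, 1.0, -(2.0**53)]
print(sequential_sum(data))
print(lane_sum(data))
```

The sequential order yields 0.0 and the two-lane order yields 1.0, while the exact sum is 2.0. Neither ordering is "right"; this is why a build with value-safe floating-point settings such as /fp:source is a useful reference before trying more aggressive options.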
If you have questions about reliability of -O3 with specific cases of source code and versions of the compiler, you would more likely get answers on the C++ forum.
I take it you are seeing a performance boost for ICL over VC9; that's good news. The usual source of such a gain is auto-vectorization. At -O3, ICL will attempt loop transformations to improve vectorization; reliability may be improved if you can write such loops so that they reach full performance without the special -O3 transformations.
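Writing loops that vectorize without relying on -O3 transformations (for example, loop interchange) usually means giving the innermost loop unit-stride access. The Python sketch below lists the flat memory offsets at which a row-major (C-style) 2-D array is visited for the two possible loop orders. The function names are hypothetical, and the offsets only model address order, not actual timing.

```python
def visit_offsets(rows, cols, inner_is_column_index=True):
    """Flat offsets (i * cols + j) of a[i][j] in a row-major array,
    in the order a doubly nested loop touches them."""
    if inner_is_column_index:
        # for i: for j: ...  -> inner loop walks along a row
        return [i * cols + j for i in range(rows) for j in range(cols)]
    # for j: for i: ...  -> inner loop walks down a column
    return [i * cols + j for j in range(cols) for i in range(rows)]


def inner_strides(rows, cols, inner_is_column_index=True):
    """Distinct address steps taken within each run of the inner loop."""
    offsets = visit_offsets(rows, cols, inner_is_column_index)
    inner_len = cols if inner_is_column_index else rows
    steps = set()
    for start in range(0, len(offsets), inner_len):
        chunk = offsets[start:start + inner_len]
        steps.update(b - a for a, b in zip(chunk, chunk[1:]))
    return steps


print(inner_strides(3, 4, True))   # contiguous inner loop: easy to vectorize
print(inner_strides(3, 4, False))  # inner loop jumps one full row each step
```

Writing the loop nest in the first order directly means the vectorizer does not depend on an interchange to get contiguous accesses.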

Valued Contributor II
Hi everybody,

Is it possible to use the Intel Software Development Emulator on a computer with a P4 CPU in order to emulate SSE3 instructions?

Best regards,
Black Belt
The Intel64 versions of P4, such as the one referred to in this thread, have SSE3 built in.
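Whether a particular P4 is one of the Intel 64 parts can be read from CPUID: extended leaf 0x80000001 reports long mode (Intel 64) in EDX bit 29, and basic leaf 1 reports SSE3 in ECX bit 0. A minimal Python sketch of both checks; the function and constant names are hypothetical.

```python
# CPUID bit positions (as described for the CPUID instruction in the Intel SDM).
LEAF1_ECX_SSE3 = 0        # basic leaf 1, ECX bit 0
EXT1_EDX_LONG_MODE = 29   # extended leaf 0x80000001, EDX bit 29 (Intel 64)


def has_sse3(leaf1_ecx):
    """True if CPUID leaf 1 ECX reports SSE3."""
    return bool((leaf1_ecx >> LEAF1_ECX_SSE3) & 1)


def is_intel64(ext_leaf1_edx):
    """True if CPUID leaf 0x80000001 EDX reports Intel 64 (long mode)."""
    return bool((ext_leaf1_edx >> EXT1_EDX_LONG_MODE) & 1)


# Register values of the test computer described in the following post.
print("SSE3:", has_sse3(0x00000000), "| Intel 64:", is_intel64(0x00000000))
```

For that test computer both registers are 0x00000000, so both checks return False: it is a 32-bit-only P4, which the remark about Intel 64 P4s does not cover.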
Valued Contributor II

One of my test computers has a P4 CPU that doesn't support SSE3. My question is:

Does the Intel Software Development Emulator support the SSE3 instruction set?

Here are some technical data for the P4 CPU from the test computer:

Part 1:
CPU Vendor: GenuineIntel
HTT and Streaming SIMD Extensions features:
HTT : 1
MMX : 1
SSE2 : 1
SSE3 : 0
SSSE3 : 0
SSE4.1: 0
SSE4.2: 0
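The Part 1 summary follows directly from the leaf 1 register values listed under "Basic CPUID Information 1" (ECX = CPUInfo[2] = 0x00000000, EDX = CPUInfo[3] = 0x3FEBFBFF). A minimal Python sketch; the bit positions are the standard CPUID leaf 1 feature bits, and decode_leaf1_flags is a hypothetical helper name.

```python
# (register, bit) position of each flag in CPUID leaf 1.
FLAGS = {
    "HTT": ("edx", 28),
    "MMX": ("edx", 23),
    "SSE2": ("edx", 26),
    "SSE3": ("ecx", 0),
    "SSSE3": ("ecx", 9),
    "SSE4.1": ("ecx", 19),
    "SSE4.2": ("ecx", 20),
}


def decode_leaf1_flags(ecx, edx):
    """Return {flag name: 0 or 1} for the flags above."""
    regs = {"ecx": ecx, "edx": edx}
    return {name: (regs[reg] >> bit) & 1 for name, (reg, bit) in FLAGS.items()}


# Leaf 1 ECX and EDX of the test computer.
for name, value in decode_leaf1_flags(0x00000000, 0x3FEBFBFF).items():
    print(f"{name:<6}: {value}")
```

With ECX equal to zero, every flag reported in ECX (SSE3 and everything newer) is clear, which is why this P4 cannot run SSE3 code natively.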

Part 2:
Basic CPUID Information 0
CPUInfo[0] = 0x00000002
CPUInfo[1] = 0x756E6547
CPUInfo[2] = 0x6C65746E
CPUInfo[3] = 0x49656E69
Basic CPUID Information 1
CPUInfo[0] = 0x00000F12
CPUInfo[1] = 0x00010808
CPUInfo[2] = 0x00000000
CPUInfo[3] = 0x3FEBFBFF
Basic CPUID Information 2
CPUInfo[0] = 0x665B5001
CPUInfo[1] = 0x00000000
CPUInfo[2] = 0x00000000
CPUInfo[3] = 0x007A7040
Extended Function CPUID Information 80000000
CPUInfo[0] = 0x80000004
CPUInfo[1] = 0x00000000
CPUInfo[2] = 0x00000000
CPUInfo[3] = 0x00000000
Extended Function CPUID Information 80000001
CPUInfo[0] = 0x00000000
CPUInfo[1] = 0x00000000
CPUInfo[2] = 0x00000000
CPUInfo[3] = 0x00000000
Extended Function CPUID Information 80000002
CPUInfo[0] = 0x20202020
CPUInfo[1] = 0x20202020
CPUInfo[2] = 0x20202020
CPUInfo[3] = 0x6E492020
Extended Function CPUID Information 80000003
CPUInfo[0] = 0x286C6574
CPUInfo[1] = 0x50202952
CPUInfo[2] = 0x69746E65
CPUInfo[3] = 0x52286D75
Extended Function CPUID Information 80000004
CPUInfo[0] = 0x20342029
CPUInfo[1] = 0x20555043
CPUInfo[2] = 0x30362E31
CPUInfo[3] = 0x007A4847
CPU Brand String: Intel(R) Pentium(R) 4 CPU 1.60GHz
CPU Vendor : GenuineIntel
Stepping ID = 2
Model = 1
Family = 15
Brand Index = 8
CLFLUSH Cache Line Size = 64
The following features are supported:
FPU - Floating Point Unit On Chip
VME - Virtual 8086 Mode Enhancement
DE - Debugging Extensions
PSE - Page Size Extensions
TSC - Time Stamp Counter
MSR - Model Specific Registers RDMSR and WRMSR Instructions
PAE - Physical Address Extensions
MCE - Machine Check Exception
CX8 - CMPXCHG8B Instruction
MTRR - Memory Type Range Registers
PGE - PTE Global Bit
MCA - Machine Check Architecture
CMOV - Conditional Move Instructions
PAT - Page Attribute Table
PSE36- 36-bit Page Size Extension
CLFSH- CLFLUSH Instruction
DS - Debug Store
ACPI - Thermal Monitor and Software Controlled Clock Facilities
MMX - Intel MMX Technology
FXSR - FXSAVE and FXRSTOR Instructions
SSE - SSE Extensions
SSE2 - SSE2 Extensions
SS - Self Snoop
HTT - Hyper-Threading Technology
TM - Thermal Monitor
Enhanced Intel SpeedStep Technology - Unsupported
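The raw registers in Part 2 can be cross-checked against the decoded fields at the end of the dump. The Python sketch below assumes CPUInfo[0..3] hold EAX, EBX, ECX and EDX in that order (the layout filled by MSVC's __cpuid intrinsic); the helper names are hypothetical.

```python
import struct


def regs_to_text(*regs):
    """Pack 32-bit register values as little-endian bytes, read as ASCII up to any NUL."""
    raw = b"".join(struct.pack("<I", r) for r in regs)
    return raw.split(b"\0", 1)[0].decode("ascii")


def decode_signature(eax):
    """(stepping, model, family) from CPUID leaf 1 EAX, including the
    extended-family/extended-model adjustments."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    family = base_family + ext_family if base_family == 0xF else base_family
    model = (ext_model << 4) + base_model if base_family in (0x6, 0xF) else base_model
    return stepping, model, family


def decode_leaf1_ebx(ebx):
    """Brand index (bits 7-0) and CLFLUSH line size (bits 15-8, in 8-byte units)."""
    return ebx & 0xFF, ((ebx >> 8) & 0xFF) * 8


# Register values copied from the dump above.
vendor = regs_to_text(0x756E6547, 0x49656E69, 0x6C65746E)  # leaf 0: EBX, EDX, ECX
stepping, model, family = decode_signature(0x00000F12)      # leaf 1: EAX
brand_index, clflush = decode_leaf1_ebx(0x00010808)          # leaf 1: EBX
brand = regs_to_text(
    0x20202020, 0x20202020, 0x20202020, 0x6E492020,  # leaf 0x80000002
    0x286C6574, 0x50202952, 0x69746E65, 0x52286D75,  # leaf 0x80000003
    0x20342029, 0x20555043, 0x30362E31, 0x007A4847,  # leaf 0x80000004
).strip()

print(vendor, family, model, stepping, brand_index, clflush)
print(brand)
```

The decoded values agree with the summary lines of the dump (GenuineIntel, family 15, model 1, stepping 2, brand index 8, 64-byte CLFLUSH line), and the brand string bytes spell "Intel(R) Pentium(R) 4 CPU 1.60GHz" once the leading padding spaces are stripped.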