The Intel 64 architecture provides 8 additional registers (XMM8 through XMM15) for Streaming SIMD Extensions (SSE).
Can I use them in assembly code that runs on an Intel 64 processor when the OS is 32-bit Windows (Win32)?
Must a program optimized for the Intel 64 architecture run on a 64-bit OS?
How much faster is software optimized for the Intel 64 architecture compared with the same software optimized for the 32-bit Intel architecture?
To use the additional registers (both general-purpose and SSE), the processor must be executing in 64-bit mode. You therefore need to be running a 64-bit OS, and the application must be compiled to generate 64-bit code.
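As a rough illustration of the "compiled to generate 64-bit code" part, here is a minimal C sketch that reports which target a build was made for. It assumes an x86-family target and relies on the predefined macros `__x86_64__` (GCC, Clang) and `_M_X64` (MSVC); the function names are hypothetical and only serve the example. It checks how the code was compiled, not which processor or OS it happens to be running on.

```c
#include <stdio.h>

/* Returns 1 if this code was compiled as 64-bit x86 code, 0 otherwise.
   Assumption: the compiler predefines __x86_64__ (GCC/Clang) or _M_X64 (MSVC). */
int built_for_x86_64(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    return 1;
#else
    return 0;
#endif
}

/* Number of XMM registers that code in this build can address,
   assuming an x86-family target: XMM0-XMM15 in a 64-bit build,
   XMM0-XMM7 in a 32-bit build. */
int addressable_xmm_registers(void)
{
    return built_for_x86_64() ? 16 : 8;
}

/* Prints a one-line summary of the build target. */
void report_build_target(void)
{
    printf("%s build: %d XMM registers addressable\n",
           built_for_x86_64() ? "64-bit" : "32-bit",
           addressable_xmm_registers());
}
```

Calling `report_build_target()` from `main` in a 32-bit build reports 8 registers even on an Intel 64 processor, because the extra registers are only reachable from 64-bit code.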
I don't quite understand your last question. In general, if you use the Intel C++ Compiler, you will see performance improvements, if for no other reason than that the additional registers allow the compiler to keep more data in registers and go out to memory less often.