Need to vzeroupper if 128-bit operations are used?

andysem · ‎02-15-2013

Hi,

It is known that a runtime penalty is incurred when I switch from AVX instructions to SSE unless I use vzeroupper/vzeroall to clear the upper halves of the ymm registers before the switch. Am I correct in assuming that the cleanup is not needed if I only use the lower halves of the ymm registers in my AVX code (i.e. VEX-encoded SSE code)?

SergeyKostrov · ‎02-15-2013

>>...cleanup is not needed if I only use lower halves of ymm registers in my AVX code...

Since your code is already AVX code, penalties should not occur. However, the forum topic "AVX transition penalties and OS support" (software.intel.com/en-us/forums/topic/364851) has a link to a PDF document, "Avoiding AVX to SSE Transition Penalties", which describes how this can be verified with VTune or with the Intel Software Development Emulator; please take a look. If it is critical for your processing, then a check in a disassembler is needed to confirm that there are no SSE instructions.

andysem · ‎02-15-2013

The application has both AVX and legacy SSE code, some of it in third-party libraries. Thanks for the link to the paper. Judging from it, my assumption looks correct.

Christian_M_2 · ‎02-16-2013

I, for example, always use the Intel Software Development Emulator, as it is free.

You can get a report if any transition penalties occur. This is very helpful. VTune also gives you the code position that is responsible for them.

But you can get good results with the Intel Software Development Emulator and its Visual Studio 2012 integration, which lets you debug an application under Intel SDE. See here: http://software.intel.com/en-us/articles/intel-software-development-emulator#DEBUG-WIN

Specify "-oast <filename.txt>" as a parameter for Intel SDE. After debugging you get a file containing transition penalty information. I found that if you use Intel SDE this way, you also get the name of the function that is responsible for the penalties.

SergeyKostrov · ‎02-16-2013

>>... Intel Software Development Emulator as it is free...

Could you check in the Release Notes whether SDE can be installed on a computer running Windows XP (Professional and Home editions)? Thanks in advance.

perfwise · ‎02-18-2013

I've verified on my SB and IB (Sandy Bridge and Ivy Bridge) that there is no penalty when transitioning from SSE to AVX, or from one to the other, as long as you refrain from using 256-bit instructions. If you use a 256-bit instruction and don't "vzeroupper" beforehand, the penalty is ~150 cycles.

Perfwise

SergeyKostrov · ‎02-18-2013

>>...If you use a 256-bit instruction the penalty is ~150 cycles, if you don't "vzeroupper"...

Thanks for that number and it looks like a real performance "killer". I wonder why these transitions are taking so many cycles? Isn't that some design issue(s) with CPUs that support AVX?

Christian_M_2 · ‎02-18-2013

Sergey Kostrov wrote:

>>...If you use a 256-bit instruction the penalty is ~150 cycles, if you don't "vzeroupper"...

Thanks for that number and it looks like a real performance "killer". I wonder why these transitions are taking so many cycles? Isn't that some design issue(s) with CPUs that support AVX?

Is it really 150 cycles? I thought it would be 75 cycles. The only thing I noticed is that you can get a very bad combination of AVX and SSE where there is a transition penalty immediately before and after a certain instruction, for example if you run AVX code that contains one SSE instruction and repeat this in a loop. Then I get the 150 cycles as the combination of both transition penalties.

Nonetheless, the penalty for saving and restoring all the YMM registers and some CPU state connected to this is quite heavy.
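
The counting above can be sketched as a small state machine. This is only a toy model, not a measurement: it assumes the three-state behaviour (upper halves clean, dirty, or saved) that Intel's paper describes for Sandy Bridge and Ivy Bridge, and the names Upper, count_penalties and the instruction labels are made up for illustration. The 75-cycle cost is the per-transition estimate quoted in this thread, not a verified figure.

```python
from enum import Enum


class Upper(Enum):
    """Assumed state of the upper ymm halves (simplified SB/IB model)."""
    CLEAN = "clean"  # upper halves known to be zero
    DIRTY = "dirty"  # a 256-bit instruction may have written them
    SAVED = "saved"  # dirty uppers were stashed so legacy SSE could run


# Hypothetical per-transition cost, taken from the estimate in this thread.
PENALTY_CYCLES = 75


def count_penalties(stream, state=Upper.CLEAN):
    """Count penalized transitions in a stream of instruction labels.

    Labels (illustrative only): "sse" for legacy non-VEX SSE,
    "avx128" and "avx256" for VEX-encoded code, "vzeroupper".
    Returns (number_of_penalties, final_state).
    """
    penalties = 0
    for kind in stream:
        if kind == "vzeroupper":
            state = Upper.CLEAN            # always returns to the clean state
        elif kind == "avx256":
            if state is Upper.SAVED:
                penalties += 1             # uppers must be restored first
            state = Upper.DIRTY
        elif kind == "avx128":
            if state is Upper.SAVED:
                penalties += 1             # any VEX instruction triggers the restore
                state = Upper.DIRTY
            # from CLEAN or DIRTY the state does not change
        elif kind == "sse":
            if state is Upper.DIRTY:
                penalties += 1             # uppers must be saved first
                state = Upper.SAVED
        else:
            raise ValueError(f"unknown instruction label: {kind!r}")
    return penalties, state


if __name__ == "__main__":
    for name, body in [
        ("256-bit AVX + SSE in a loop", ["avx256", "sse"]),
        ("same, with vzeroupper", ["avx256", "vzeroupper", "sse"]),
        ("128-bit VEX + SSE in a loop", ["avx128", "sse"]),
    ]:
        n, _ = count_penalties(body * 4)
        print(name, "->", n, "transitions, roughly", n * PENALTY_CYCLES, "cycles")
```

In this model a loop body of one 256-bit instruction followed by one legacy SSE instruction pays two transitions per steady-state iteration, which would account for ~150 cycles if each costs about 75. Putting vzeroupper before the SSE instruction, or using only 128-bit VEX instructions, gives zero penalized transitions.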

// EDIT: I cannot test Intel SDE on Windows XP, as I switched to Windows 7 some time ago. The only thing I might test is Intel SDE on XP in VirtualBox, which I still use for some applications. I don't know if running a CPU emulation tool under a virtual machine has a big impact. Please tell me if you want me to do this.

SergeyKostrov · ‎02-18-2013

>>...I can not test Intel SDE on Windows XP, I have switched to Windows 7 some time ago. The only thing I might test is
>>Intel SDE on XP in VirtualBox, which I use still for some applications. Don't know if that has a big impact, running
>>a CPU emulation tool under virtual machine. Please tell me, if you want me to do this.

Yes, please, if it doesn't take too much time (let's say no more than 10-15 minutes or so). Note: I've looked at the SDE Release Notes on its web page and I haven't found anything regarding SDE compatibility with different Windows OSs.

Christian_M_2 · ‎02-21-2013

I did a basic check: VirtualBox with Windows XP Professional SP3, 32-bit.

Then I unpacked Intel SDE to a directory and ran an exe with AVX code from the command line. The results of the normal double code and the AVX code match for different calculations, so everything seems to be fine. The code was built with VS2010, because for the Intel Compiler or VS2012 I would have needed to install other runtime packages, which would have taken some more time.

But in my opinion it is a good thing that Intel SDE runs on XP in a virtualized environment.

SergeyKostrov · ‎02-21-2013

>>...I did a basic check: VirtualBox with Windows XP Professional SP3, 32 bit.
>>...
>>But in my mind, good thing that Intel SDE runs on XP in virtualized environment.

Thank you for the test! Mark (Intel), I wonder if the SDE Release Notes could be updated with a list of all supported OSs? Thanks in advance.