Intel® ISA Extensions

Mixing AVX and MMX code

Elmar
Beginner

Dear all,

hope this hasn't been asked before, but I couldn't find a way to search the forum..?

In high performance code I'm using MMX and SSE together, since this gives me 8 additional very valuable registers. Looking at the AVX docs, this seems no longer possible with AVX code, since all MMX-related SSE instructions have not been promoted with a VEX prefix, and are therefore legacy instructions which I may no longer use (or face the deadly mixing penalty that requires VZEROUPPER etc.).

Is it correct that it's no longer possible to make heavy use of MMX registers in AVX code?

Can I at least continue using MMX registers without performance impact as long as there is no data transfer between MMX and SSE registers?

Thanks,

Elmar

SergeyKostrov
Valued Contributor II
It is not clear whether you're using assembly language (inline in C/C++ code) or Intel intrinsic functions. If you're using the Intel C++ compiler, take a look at the sse2mmx.h header file in the ..\Compiler\Include folder.
bronxzv
New Contributor II

Not a direct answer to your question, sorry, but I strongly suggest porting your MMX code to SSE2. Even though you will have fewer logical registers than with your 8 MMX + 8 XMM combination (or 8 MMX + 16 XMM in 64-bit mode), you should measure good speedups thanks to the doubled throughput, which easily offsets the smaller register count. I experimented with exactly that a long time ago.

Then you'll be able to compile your code (the very same source code, if you use intrinsics) for AVX. If you want the same source code for AVX2 targets (with doubled throughput again), it will be more challenging with intrinsics, though.

Elmar
Beginner

Sergey Kostrov wrote:

It is not clear whether you're using assembly language (inline in C/C++ code) or Intel intrinsic functions. If you're using the Intel C++ compiler, take a look at the sse2mmx.h header file in the ..\Compiler\Include folder.

Many thanks for your quick reply. I'm actually using my own code generator that creates assembly code for NASM. I'm now adding code paths for AVX and AVX2, and I hoped that an Intel insider could tell me what is and is not allowed regarding MMX.

For example, if I read the AVX docs correctly, the instruction MOVQ2DQ XMM0,MM0 is no longer allowed in AVX code because there is no VEX prefix version (and thus a huge penalty for mixing AVX and legacy code).

If this is correct, is it at least allowed to mix MMX-only code with AVX code? (I.e. code that runs entirely in MMX registers and never transfers data to an SSE register, or goes through memory for the transfer) Or are there other hidden pitfalls?

(Please don't suggest that I simply stop using MMX. I'm gaining a lot of performance by using MMX for short vectors up to 8 bytes long, which would otherwise have to be spilled to memory, since my SSE/AVX registers are always full to the limit, especially in 32-bit mode.)

Thanks,

Elmar
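
A rough sketch of the "go through memory" route asked about above, for anyone who wants to try it. Assumptions, none of which come from this thread: it is written as GCC/Clang inline assembly in C (AT&T operand order, not NASM syntax), the function name `mmx_to_xmm_via_memory` is made up, and a plain C fallback runs when AVX is not available. The value leaves the MMX register through a plain MMX `MOVQ` store and enters the XMM register through a VEX-encoded `VMOVQ` load, so no legacy SSE instruction such as `MOVQ2DQ` ever writes an XMM register. It only shows that the sequence is functionally valid; it does not measure performance.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: moves 64 bits MMX -> memory -> XMM and returns the
 * value read back from the XMM register. */
__attribute__((noinline))
uint64_t mmx_to_xmm_via_memory(uint64_t value)
{
    uint64_t in = value, spill = 0, out = 0;
#if defined(__x86_64__) || defined(__i386__)
    /* Any CPU with AVX also has MMX, so one check is enough. */
    if (__builtin_cpu_supports("avx")) {
        __asm__ volatile(
            "movq   %[in], %%mm0\n\t"      /* MMX load                          */
            "movq   %%mm0, %[spill]\n\t"   /* MMX store; no XMM register used   */
            "emms\n\t"                     /* release the x87/MMX state         */
            "vmovq  %[spill], %%xmm0\n\t"  /* VEX.128 load; upper bits of ymm0
                                              are zeroed as a side effect       */
            "vmovq  %%xmm0, %[out]"        /* VEX.128 store of the low 64 bits  */
            : [spill] "=m"(spill), [out] "=m"(out)
            : [in] "m"(in)
            : "memory", "mm0", "xmm0");
        return out;
    }
#endif
    /* Portable fallback: same data flow without any SIMD registers. */
    spill = in;
    out = spill;
    return out;
}
```

Calling `mmx_to_xmm_via_memory(0x0123456789ABCDEFULL)` should hand back the same value on either path. The sketch assumes no live x87 values around the asm block, since MMX writes and `EMMS` both modify the x87 state.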

SergeyKostrov
Valued Contributor II
>>...Please don't suggest to simply stop using MMX, I'm gaining a lot of performance by using MMX for short vectors
>>up to 8 bytes long, which would otherwise have to be spilled to memory, since my SSE/AVX registers are always full to
>>the limit, especially in 32bit mode)...

I support that firm position.

Please take a look at a very good article, Avoiding AVX-SSE Transition Penalties (attached). Even if it is not related to AVX-to-MMX transitions, it has lots of technical details and recommendations.
Bernard
Black Belt

>>>Is it correct that it's no longer possible to make heavy use of MMX registers in AVX code?>>>

I wonder whether it is even possible.

bronxzv
New Contributor II

iliyapolak wrote:

>>>Is it correct that it's no longer possible to make heavy use of MMX registers in AVX code?>>>

I wonder whether it is even possible.

There is no reason it shouldn't be possible, since only the physical registers are shared. The 8 MMX architected registers and the 16 (8 in 32-bit mode) YMM architected registers are fully separate: modifying a register of one kind has no side effect on a register in the other set. The transition does occur for SSE to/from AVX, though, since the XMM architected state aliases the YMM state (much like the MMX state aliases the x87 stack).

Based on this, I don't see why there would be a "transition penalty", since there is actually no transition, just like when you mix x87 code and AVX code. I know from hands-on experience that the latter works with no problem (no issues reported by VTune Amplifier, for example). I suppose it's exactly the same with MMX code, since the MMX state aliases the x87 state but has no intersection with the XMM/YMM state, but here I lack first-hand experience since I don't have MMX code in my code base anymore.
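
The register separation described above can be tried out with a small sketch like the following. Assumptions: GCC or Clang inline assembly in C (AT&T operand order rather than NASM), a made-up `mix_buf` layout and `add_mixed` function, and a scalar fallback when AVX is missing. MMX and VEX-encoded AVX instructions are interleaved, each result is written back through memory, and `EMMS` and `VZEROUPPER` clean up at the end. It demonstrates only that the two register sets do not disturb each other functionally; whether there is any performance penalty would still have to be measured, for example with VTune.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer layout; the asm below hard-codes these offsets. */
typedef struct {
    uint16_t a[4];      /* offset  0 */
    uint16_t b[4];      /* offset  8 */
    uint16_t sum16[4];  /* offset 16 */
    float    x[8];      /* offset 24 */
    float    y[8];      /* offset 56 */
    float    sum32[8];  /* offset 88 */
} mix_buf;

static_assert(offsetof(mix_buf, x) == 24, "layout must match the asm");
static_assert(offsetof(mix_buf, sum32) == 88, "layout must match the asm");

/* Returns 1 if the MMX + AVX path ran, 0 if the scalar fallback ran.
 * noinline keeps VZEROUPPER away from any 256-bit values a caller
 * might otherwise hold live across the asm block. */
__attribute__((noinline))
int add_mixed(mix_buf *p)
{
#if defined(__x86_64__) || defined(__i386__)
    /* Any CPU with AVX also has MMX, so one check is enough. */
    if (__builtin_cpu_supports("avx")) {
        __asm__ volatile(
            "movq     0(%[p]), %%mm0\n\t"           /* MMX: load a             */
            "vmovups 24(%[p]), %%ymm0\n\t"          /* AVX: load x             */
            "paddw    8(%[p]), %%mm0\n\t"           /* MMX: a + b, wraparound  */
            "vaddps  56(%[p]), %%ymm0, %%ymm0\n\t"  /* AVX: x + y              */
            "movq    %%mm0, 16(%[p])\n\t"           /* MMX result to memory    */
            "vmovups %%ymm0, 88(%[p])\n\t"          /* AVX result to memory    */
            "emms\n\t"                              /* release x87/MMX state   */
            "vzeroupper"                            /* clean upper YMM halves  */
            :
            : [p] "r"(p)
            : "memory", "mm0", "xmm0");
        return 1;
    }
#endif
    /* Scalar fallback producing the same results. */
    for (int i = 0; i < 4; i++)
        p->sum16[i] = (uint16_t)(p->a[i] + p->b[i]);
    for (int i = 0; i < 8; i++)
        p->sum32[i] = p->x[i] + p->y[i];
    return 0;
}
```

Fill `a`, `b`, `x` and `y`, call `add_mixed(&buf)`, and compare `sum16` and `sum32` against a scalar computation. The sketch assumes no live x87 values around the asm block, since MMX writes and `EMMS` both modify the x87 state.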

jimdempseyatthecove
Black Belt

Instead of worrying about the loss of 8 MMX registers, think about how best to use the other half of the 16 YMM registers.

Jim Dempsey

Bernard
Black Belt

>>>Instead of worrying about the loss of 8 MMX registers, think about how best to use the other half of the 16 YMM registers.>>>

Very true.
