I see that in (b) alignment haven't been done as "movsd 40(%rsp), %xmm0" has been called. Moreover in (c) none of the SSE2 alignment instructions like movaps/movapd/movdqa or movups/movupd/movdqu are being called. Probably since only three parameters(X, Y, Z)exist here, could be the reason.
Suggestionsneeded: (i) Can call of "movsd 40(%rsp), %xmm0" is correct from optimization point of view or it should be replaced with alignment SSE instructions call?
(ii) Could above patterns for (a), (b)& (c)be more optimized (speed-up) with some other SSE instructions OR replaced by SSE3 or SSSE3 instructions. If YES, can a pattern of SSE3/SSSE3 which instructions be used to replace above SSE2 instructions?
(iii) Since here the algorithm has 3 parameters and asm beingrepresented only for these 3 parameters. Do I need to generate a dummy asm representation of instructions for 4th. parameter (say W) which has void contents to maintain the DP FP alignment and effective vectorization?
In continuation, didhad togenerate asm for algorithmof X, Y, Z parameters since the original C/C++ codehas been writtenin such a way that it fails to add address MCA (multi-core achitecture) design needs which means if I have 4th. parameter as a local scopewithin the file than optimization can be done by taking care of alignment and DP FP 2 or 4 vectorization.
So looking for some suggestions for above (i), (ii) and (iii) queries.