Are there any penalties within Intel SSE4?
I read in some document that accessing partial register data, both from XMM registers and from GPRs, causes some penalty.
Is there any document that would help me better understand the data transfer penalties among the SSE registers?
This seems an overly wide and unspecific topic, so I'm not surprised you didn't get a timely response. Are you referring to the topic discussed in https://software.intel.com/en-us/forums/topic/308004 ?