I am writing a paper about interval arithmetic using SSE2 instructions, which is part of my library for exact real number computations, and while doing so I realized that SSE3 could have been quite helpful had it been designed slightly differently.
My exact question is: why did Intel prefer to include an addsub instruction instead of a multiplication with one of the arguments negated, i.e. something like a mulpn instruction on xmm1 and xmm2,
giving xmm1.1 * xmm2.1, (-xmm1.0) * xmm2.0
Using this the addsubpd instruction would not be needed to compute complex multiplications and divisions.
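To make the idea concrete, here is a rough sketch in C with SSE2 intrinsics. No mulpn instruction exists, so `mulpn_emulated` below is a hypothetical stand-in that flips the sign of the low element with an XOR and then multiplies; `complex_mul` and the layout (real part in the low element, imaginary part in the high element) are likewise my own assumptions for illustration, not Intel's sample code.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical mulpn, emulated with SSE2:
   high element = a.hi * b.hi, low element = (-a.lo) * b.lo. */
static inline __m128d mulpn_emulated(__m128d a, __m128d b) {
    /* _mm_set_pd takes (high, low): only the low element's sign bit is set. */
    const __m128d sign_lo = _mm_set_pd(0.0, -0.0);
    return _mm_mul_pd(_mm_xor_pd(a, sign_lo), b);
}

/* (a + bi) * (c + di) with x = {low: a, high: b}, y = {low: c, high: d}.
   Uses a plain add instead of addsubpd. */
static inline __m128d complex_mul(__m128d x, __m128d y) {
    __m128d yre = _mm_unpacklo_pd(y, y);     /* {c, c} */
    __m128d yim = _mm_unpackhi_pd(y, y);     /* {d, d} */
    __m128d xsw = _mm_shuffle_pd(x, x, 1);   /* {low: b, high: a} */
    __m128d t1  = _mm_mul_pd(x, yre);        /* {a*c, b*c} */
    __m128d t2  = mulpn_emulated(xsw, yim);  /* {(-b)*d, a*d} */
    return _mm_add_pd(t1, t2);               /* {a*c + (-b)*d, b*c + a*d} */
}
```

For example, `complex_mul(_mm_set_pd(2.0, 1.0), _mm_set_pd(4.0, 3.0))` multiplies 1+2i by 3+4i and yields -5 in the low element and 10 in the high element. The emulation costs an extra XOR per multiplication, which is exactly the step a real mulpn would fold into the multiply.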
What I believe to be more important, however, is the behavior of Intel's sample SSE3 code for complex multiplication when the rounding mode is set to something other than round-to-nearest. More specifically, the SSE3 complex multiplication code would not compute upper bounds for the product when rounding toward +inf, nor lower bounds when rounding toward -inf, because the product that is subtracted would be rounded in the wrong direction.
This would not be the case if a mulpn instruction were available instead of addsub, because the negated product would then be rounded in the correct direction. A mulpn would also be very useful for single or double precision interval arithmetic using the SIMD registers.
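The rounding problem can be reproduced with scalar doubles. The sketch below is only an illustration under my own assumptions: the function names are made up, it computes just the real part a*c - b*d, it assumes an x86-64 build where scalar double arithmetic goes through SSE2 (so the MXCSR rounding mode applies), and it uses `volatile` to keep the compiler from folding or reordering the arithmetic around the rounding-mode change.

```c
#include <xmmintrin.h>  /* _MM_GET_ROUNDING_MODE, _MM_SET_ROUNDING_MODE */

typedef double (*real_part_fn)(double, double, double, double);

/* addsub style: round both products in the current mode, then subtract.
   Under rounding toward +inf, b*d is rounded UP before being subtracted,
   which pushes the result DOWN. */
double real_part_addsub_style(double a, double b, double c, double d) {
    volatile double va = a, vb = b, vc = c, vd = d;
    volatile double ac = va * vc;
    volatile double bd = vb * vd;
    volatile double r = ac - bd;
    return r;
}

/* mulpn style: negate one factor first, then multiply and add.
   Every rounding step now goes in the same direction. */
double real_part_mulpn_style(double a, double b, double c, double d) {
    volatile double va = a, vb = b, vc = c, vd = d;
    volatile double ac = va * vc;
    volatile double nbd = (-vb) * vd;
    volatile double r = ac + nbd;
    return r;
}

/* Evaluate f with rounding toward +inf, then restore the previous mode. */
double with_round_up(real_part_fn f, double a, double b, double c, double d) {
    unsigned int saved = _MM_GET_ROUNDING_MODE();
    _MM_SET_ROUNDING_MODE(_MM_ROUND_UP);
    volatile double r = f(a, b, c, d);
    _MM_SET_ROUNDING_MODE(saved);
    return r;
}
```

Take a = c = 1 and b = d = 1 + 2^-52. The exact real part is -(2^-51 + 2^-104). With rounding toward +inf, the addsub-style version returns -3 * 2^-52, which lies below the exact value and so is not an upper bound, while the mulpn-style version returns -2 * 2^-52, which is a valid upper bound.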
Does anyone know why Intel preferred addsub to this?
We are always looking for new instructions and feedback to make our architectures better suited to our customers' needs. If you would like to write up your request with a bit more detail and send it to us here, we would be glad to forward the information to our architects to consider for future architectures. We would also need to know what you want to use it for.
Message Edited by intel.software.network.support on 11-15-2005 11:18 PM
I know this is an old post, but I am curious to hear whether the author has updated his code. SSE4.1 has an instruction, BLENDVPD, that makes conditional selection of double precision values easier.