Intel® ISA Extensions

Weird instruction: extractps

zhangxiuxia
Beginner
In my assembly program, I use SSE instructions for computation.

I need to extract the four single-precision float values in an xmm register separately, for later computation.

I wrote this:
extractps $3, %xmm0,%xmm0

The assembler reported an error: Error: suffix or operands invalid for `extractps'

I looked it up in Volume 2 of the manual and found that I was wrong.

"
EXTRACTPS reg/m32, xmm2, imm8
"
The destination should be a 64-bit or 32-bit general-purpose register (or a 32-bit memory location).

But the 64-bit and 32-bit registers are all for integers, like rax and eax.
How can I do floating-point computation later if I put the result in an integer register?


3 Replies
Maxym_D_Intel
Employee
rax and eax are called general-purpose registers, and can therefore hold different data types.

Have a look at Section 3.4, BASIC PROGRAM EXECUTION REGISTERS,
in http://download.intel.com/products/processor/manual/325462.pdf
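
To illustrate the point that a general-purpose register just holds bits, here is a minimal C sketch that reinterprets a 32-bit integer pattern as a single-precision float and back. The helper names bits_to_float and float_to_bits are made up for this example, and it assumes the platform's float is a 32-bit IEEE-754 value (true on x86).

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 bits of an integer (as they would sit in eax)
   as a single-precision float. No numeric conversion takes place;
   memcpy copies the bit pattern unchanged. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* The reverse direction: obtain the raw bit pattern of a float. */
static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

For instance, bits_to_float(0x3F800000u) yields 1.0f, because 0x3F800000 is the IEEE-754 encoding of 1.0. The catch is that integer instructions such as addl treat the same bits as an integer, so the value has to be moved back to a floating-point register before any float arithmetic.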
zhangxiuxia
Beginner
Yes, you are right. A general-purpose register can hold different data types.

But if I want to do some computation with it, that is difficult.

I have to store the data in eax back to memory, and then load it again.

This is because addl can only do integer addition, while
addss, addps, addpd and addsd do floating-point addition.

fadd can also do floating-point addition, but its data lives on the FP register stack.
I would have to move the data from eax to the FP register stack.
That involves no memory access.

So, to reduce the number of memory accesses, I can only use fadd.

Is what I said right?
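
A note on the question above: the x87 load instructions (fld, fild) take their operand from memory or from another stack register, so a value in eax cannot reach the FP stack without a store. SSE2, on the other hand, has movd, which copies a 32-bit general-purpose register directly into the low lane of an XMM register. The following is a minimal sketch in C intrinsics, assuming an SSE2-capable compiler. The helper name add_bits_to_lane0 is made up for illustration, and whether the compiler really emits movd here depends on its code generation.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE ones */
#include <stdint.h>

/* Add a float whose raw bits sit in an integer (as after extractps into
   eax) to the float in lane 0 of v, using only scalar SSE arithmetic. */
static float add_bits_to_lane0(uint32_t bits, __m128 v)
{
    /* _mm_cvtsi32_si128 corresponds to movd: integer -> low 32 bits of an
       XMM register, upper bits zeroed. The cast only changes the C type. */
    __m128 s = _mm_castsi128_ps(_mm_cvtsi32_si128((int)bits));

    /* addss: lane 0 of v plus lane 0 of s; the other lanes come from v. */
    return _mm_cvtss_f32(_mm_add_ss(v, s));
}
```

With this route the value stays in registers the whole way, so a round trip through memory is not needed just to do a scalar float add.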
bronxzv
New Contributor II
First of all, I suggest you use intrinsics instead of ASM code.

It looks like what you want is to use scalar instructions on packed values. You can do that directly for element 0; for example, use _mm_add_ss on data output by _mm_add_ps. To access the other elements, simply use a shuffle instruction before the scalar instruction. By the way, shuffle is typically faster than extractps. For example, to rotate right an XMM register holding 4 x FP32:

_mm_shuffle_ps(m,m,_MM_SHUFFLE(0,3,2,1))

before: 3.0 | 4.5 | -2.4 | 1.1
after: 1.1 | 3.0 | 4.5 | -2.4

Doing three such rotate-rights in sequence will let you access all the individual elements easily.
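
The rotate-and-read idea can be sketched in C roughly as follows. This is only an illustration, assuming an SSE-capable compiler; the helper name lanes_to_array is made up for the example. It reads element 0 with _mm_cvtss_f32, then rotates right by one element so the next element lands in position 0.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Copy the four floats of m into out[0..3], lowest element first,
   by reading element 0 and then rotating the register right by one. */
static void lanes_to_array(__m128 m, float out[4])
{
    for (int i = 0; i < 4; ++i) {
        out[i] = _mm_cvtss_f32(m);  /* read element 0 as a scalar */
        /* Rotate right: new element 0 = old element 1, and so on;
           the old element 0 wraps around to the top position. */
        m = _mm_shuffle_ps(m, m, _MM_SHUFFLE(0, 3, 2, 1));
    }
}
```

With the register from the example above (written highest element first as 3.0 | 4.5 | -2.4 | 1.1, i.e. _mm_set_ps(3.0f, 4.5f, -2.4f, 1.1f)), the array ends up as 1.1, -2.4, 4.5, 3.0. In real code the scalar instruction (such as _mm_add_ss) would operate on element 0 directly after each rotate instead of copying the value out.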