Intel® ISA Extensions

Haswell RCPPS/RSQRTPS implementation

maratyszcza
Beginner
1,277 Views

Hi,

I work on code which targets AVX2 + FMA3 and depends on the accuracy of VRCPPS/VRSQRTPS. Should I expect the implementation of these instructions on Haswell to be the same as on Ivy Bridge?

Regards,

Marat

0 Kudos
16 Replies
SergeyKostrov
Valued Contributor II
1,277 Views
>>...Should I expect the implementation of these instructions on Haswell to be the same as on Ivy Bridge?

Do you mean the microcode for both CPUs? Please clarify.
0 Kudos
TimP
Honored Contributor III
1,277 Views

I suppose he means: do the instructions maintain the same numerical behavior? I'd expect so.

I haven't heard anything about the once-proposed addition or substitution of corresponding instructions with sufficient accuracy to support double precision with 2 iterations, such as early AMD SSE CPUs had. Current Intel implementations of iterative divide and sqrt methods are specified to maintain 49-bit precision unless the options to accept less are set.

0 Kudos
maratyszcza
Beginner
1,277 Views

Thanks, Tim. However, I'm concerned not only about accuracy, but also about convergence to the correctly rounded result. A reciprocal computed with FMA converges to the correctly rounded result, but this requires that the initial approximation overestimate 1/x when x lies just below a power of two (so that 1/x lies just above a power of two). The RCPPS implementation in Ivy Bridge does not overestimate 1/x in these cases, so a reciprocal computed with FMA from the Ivy Bridge RCPPS result will not be correctly rounded for such x (e.g. rcp(0x1.FFFFFFFFFFFFFp-1) will converge to 0x1.0000000000000p+0 instead of 0x1.0000000000001p+0).

Thus I wonder if RCPPS/RSQRTPS on Haswell produce numerically different results than on Ivy Bridge.
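
The double-precision example above can be checked without Haswell or Ivy Bridge hardware. The following is a minimal sketch, assuming the refinement is the textbook Newton-Raphson step y + y*(1 - x*y) built from two FMAs (the post does not say which iteration is actually used). The FMA is emulated in Python with exact rational arithmetic followed by a single rounding; `fma` and `refine_reciprocal` are illustrative names, not library functions.

```python
from fractions import Fraction


def fma(a, b, c):
    """Emulate a fused multiply-add: exact a*b + c, rounded once to double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def refine_reciprocal(x, y):
    """One Newton-Raphson step toward 1/x (assumed form, built from two FMAs)."""
    r = fma(-x, y, 1.0)   # residual 1 - x*y, rounded once
    return fma(y, r, y)   # y + y*r, rounded once


x = float.fromhex("0x1.FFFFFFFFFFFFFp-1")        # the input from the post
correct = float.fromhex("0x1.0000000000001p+0")  # result the post expects

# Reference: exact 1/x rounded once to double.
reference = float(1 / Fraction(x))

# Start from an underestimate (1.0) and from an overestimate (1.0 + 1 ulp).
from_below = refine_reciprocal(x, 1.0)
from_above = refine_reciprocal(x, correct)

print("reference :", reference.hex())
print("from below:", from_below.hex())
print("from above:", from_above.hex())
```

Under this emulation, the step started from the underestimate 1.0 returns 1.0 again, because y + y*r lands exactly halfway between 1.0 and the next double and ties round to even. The step started from the overestimate keeps the correctly rounded value.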

0 Kudos
TimP
Honored Contributor III
1,277 Views

Interesting point.  I haven't heard of any changes, but others here are more expert on that.

0 Kudos
BRET_T_Intel
Employee
1,277 Views

The VRCPPS and VRSQRTPS instructions on the Haswell family microarchitecture are intended to return identical results and have the same behavior as found on the Ivy Bridge microarchitecture.

0 Kudos
SergeyKostrov
Valued Contributor II
1,277 Views
I could do a verification on an Ivy Bridge system if you provide a test case.
0 Kudos
maratyszcza
Beginner
1,277 Views

Thank you, TimP and BRET T., this is what I was looking for.

Sergey Kostrov, I already have an Ivy Bridge system, and it has the issue I described above.

0 Kudos
zalia64
New Contributor I
1,278 Views

Please help an outsider: what is the instruction VRCPPS? It does not appear in the Intel documentation: Intel® 64 and IA-32 Architectures, Software Developer’s Manual, Volume 2 (2A, 2B & 2C), Instruction Set Reference, A-Z, Jan. 2013.

From the discussion above, it looks like a higher-precision version of RCPPS.

Could you name a reference for that (and probably other) new instructions? I surely would like a high-precision version of RSQRTPS, too.

Thanks

0 Kudos
Bernard
Valued Contributor I
1,278 Views

Hi amos,

glad to see someone from Israel:)

VRCPPS computes approximate reciprocals of eight 32-bit floating-point values. It is the AVX (VEX-encoded) form of RCPPS.
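
On the wish for a higher-precision RSQRTPS: the Intel instruction reference documents a relative error of at most 1.5 * 2^-12 for both RCPPS and RSQRTPS, and the usual way to gain precision is one Newton-Raphson step on top of the estimate. The sketch below only illustrates that step. It does not execute the real instructions: `coarse` is a hypothetical stand-in that rounds the true result to 12 significant bits, and all arithmetic is done in Python doubles rather than 32-bit floats.

```python
import math


def coarse(value, bits=12):
    """Round a positive value to `bits` significant bits.

    Hypothetical stand-in for a low-precision hardware estimate; it is
    NOT the lookup table an actual CPU uses.
    """
    m, e = math.frexp(value)  # value = m * 2**e with 0.5 <= m < 1
    return math.ldexp(round(m * 2**bits) / 2**bits, e)


def refine_rcp(x, y):
    """One Newton-Raphson step toward 1/x."""
    return y * (2.0 - x * y)


def refine_rsqrt(x, y):
    """One Newton-Raphson step toward 1/sqrt(x)."""
    return y * (1.5 - 0.5 * x * y * y)


# Eight lanes, mirroring one 256-bit vector of 32-bit floats.
lanes = [1.0, 3.0, 7.0, 10.0, 0.3, 123.456, 1e-3, 5e4]

# Relative errors of the reciprocal, before and after one refinement step.
rcp_before = [abs(x * coarse(1.0 / x) - 1.0) for x in lanes]
rcp_after = [abs(x * refine_rcp(x, coarse(1.0 / x)) - 1.0) for x in lanes]

# Same for the reciprocal square root.
roots = [math.sqrt(x) for x in lanes]
rsqrt_before = [abs(s * coarse(1.0 / s) - 1.0) for s in roots]
rsqrt_after = [
    abs(s * refine_rsqrt(x, coarse(1.0 / s)) - 1.0)
    for x, s in zip(lanes, roots)
]

for x, before, after in zip(lanes, rsqrt_before, rsqrt_after):
    print(f"x={x:<10g} rsqrt error before={before:.2e} after={after:.2e}")
```

Each step roughly squares the relative error, so a 12-bit estimate ends up near single-precision accuracy after one step. That is close to full float accuracy but not a guarantee of a correctly rounded result.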

0 Kudos
Bernard
Valued Contributor I
1,278 Views
You may consult this compiler documentation ://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/
0 Kudos
zalia64
New Contributor I
1,278 Views

Thanks, iliyapolak. I should have guessed, with the 'V' at the start.

I am an old hand with assembly language, but a new one with 64-bit. My MASM does not support AVX instructions, so I will have to look somewhere else for AVX support (perhaps WASM).

Would you happen to know if 'Visual C 2012 professional' supports AVX in the integrated debugger? That is, when using debug and stepping through the code, will the 'disassembly window' show the AVX instructions? Otherwise, debugging is a big problem...

I have to convert a program from Matlab into C+ASM, for speed. AVX could be a great help, IF and ONLY IF I could find a full-size assembler and debugger. Would you suggest a development environment for native AVX? Native AVX, not C mnemonics.

0 Kudos
Bernard
Valued Contributor I
1,277 Views
Hi amos, I am using MASM with ml.exe copied from VS2010, which according to Wikipedia supports AVX, but I have not tried it. What MASM version are you using? VS2012 supports AVX through the /arch:AVX switch. You can try copying the VS2012 ml.exe; maybe that assembler supports AVX. I suppose that when the AVX switch is enabled you will see the YMM registers in the VS debugger, which would be quite logical. For native AVX you can try setting up the VS2012 IDE; please consult this link ://www.codeproject.com/Articles/271627/Assembly-Programming-with-Visual-Studio-2010-2012
0 Kudos
Bernard
Valued Contributor I
1,277 Views
Sorry, I answered twice and both posts were queued for approval. It is unbelievable. For AVX instruction support in MASM you can check the ml.exe assembler from VS2012. I am not sure whether MS updated its assembler to support the newest instruction set, but you can check it. VS2012 supports the AVX instruction set with the /arch:AVX switch, and it is quite logical that the integrated debugger will display the contents of the YMMx registers. You can use VS2012 as an IDE for programming in assembly; please check this link ://www.codeproject.com/Articles/271627/Assembly-Programming-with-Visual-Studio-2010-2012
0 Kudos
Bernard
Valued Contributor I
1,277 Views

Hi amos,

you mentioned in your post that you are converting a Matlab program into, presumably, a C/inline assembly version, so I would like to ask: do you write scientific software? I have a few projects, mainly a large library of special functions which I am trying to optimize. Are you interested in it?

Thank you in advance

0 Kudos
zalia64
New Contributor I
1,277 Views

Hi iliya,

Yes, I develop scientific software, usually to perform algorithms in Machine Vision, Image Enhancement, On-Line quality control and other problems with noisy and cluttered inputs. Real-time working systems, not mathematical models.

About optimising your library - perhaps I could be of help. Pls contact in private message.

0 Kudos
Bernard
Valued Contributor I
1,277 Views
Thank you amos, I have already contacted you. If you are interested, I had a problem with the Horner scheme in a cosine function implemented in an inline asm block; here is the link ://software.intel.com/en-us/forums/topic/347470
0 Kudos