Haswell RCPPS/RSQRTPS implementation

maratyszcza · ‎05-01-2013

Hi,

I work on code which targets AVX2 + FMA3 and depends on the accuracy of VRCPPS/VRSQRTPS. Should I expect the implementation of these instructions on Haswell to be the same as on Ivy Bridge?

Regards,

Marat

SergeyKostrov · ‎05-02-2013

>>...Should I expect the implementation of these instructions on Haswell to be the same as on Ivy Bridge?

Do you mean the microcode for both CPUs? Please clarify.

TimP · ‎05-02-2013

I suppose he is asking whether the instructions maintain the same numerical behavior, which is what I'd expect.

I haven't heard anything about the once-proposed addition or substitution of corresponding instructions with sufficient accuracy to support double precision with 2 iterations, such as early AMD SSE CPUs had. Current Intel implementations of iterative divide and sqrt methods are specified to maintain 49-bit precision unless the options to accept less are set.

maratyszcza · ‎05-02-2013

Thanks, Tim. However, I'm concerned not only about accuracy, but also about convergence to the correctly rounded result. A reciprocal computed with FMA converges to the correctly rounded result, but this requires that the initial approximation overestimates 1/x when x is a power of two. The RCPPS implementation in Ivy Bridge does not overestimate 1/x in these cases, so if we compute the reciprocal with FMA starting from the Ivy Bridge RCPPS result, it will not produce the correctly rounded result when x is a power of two (e.g. rcp(0x1.FFFFFFFFFFFFFp-1) will converge to 0x1.0000000000000p+0 instead of 0x1.0000000000001p+0).

Thus I wonder if RCPPS/RSQRTPS on Haswell produce numerically different results than on Ivy Bridge.

TimP · ‎05-02-2013

Interesting point. I haven't heard of any changes, but others here are more expert on that.

BRET_T_Intel · ‎05-02-2013

The VRCPPS and VRSQRTPS instructions on the Haswell family microarchitecture are intended to return identical results and have the same behavior as found on the Ivy Bridge microarchitecture.

SergeyKostrov · ‎05-02-2013

I could do a verification on an Ivy Bridge system if you provide a test case.

maratyszcza · ‎05-03-2013

Thank you, TimP and BRET T., this is what I was looking for.

Sergey Kostrov, I already have an Ivy Bridge system, and it has the issue I described above.

zalia64 · ‎05-07-2013

Please help an outsider: what is the instruction VRCPPS? It does not appear in the Intel documentation: Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2 (2A, 2B & 2C), Instruction Set Reference, A-Z, Jan. 2013.

From the discussion above, it looks like a higher-precision version of RCPPS.

Could you name a reference for that (and probably other) new instructions? I surely would like a high-precision version of RSQRTPS, too.

Thanks

Bernard · ‎05-08-2013

Hi amos,

glad to see someone from Israel:)

VRCPPS computes the approximate reciprocals of eight 32-bit floating-point values. It is an AVX instruction.

Bernard · ‎05-08-2013

You may consult this compiler documentation ://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/

zalia64 · ‎05-08-2013

Thanks, iliyapolak. I should have guessed, with the 'V' at the start.

I am an old hand with assembly language, but a new one with 64-bit. My MASM does not support AVX instructions, so I will have to look somewhere else for AVX support (perhaps WASM).

Would you happen to know if 'Visual C 2012 professional' supports AVX in the integrated debugger? That is, when debugging and stepping through the code, will the 'disassembly window' show the AVX instructions? Otherwise, debugging is a big problem...

I have to convert a program from Matlab into C+ASM, for speed. AVX could be a great help, IF and ONLY IF I could find a full-size assembler and debugger. Would you suggest a development environment for native AVX? Native AVX, not C mnemonics.

Bernard · ‎05-08-2013

Hi amos, I am using MASM with ml.exe copied from VS2010, which according to Wikipedia supports AVX, but I have not tried it. Which MASM version are you using? VS2012 supports AVX through the /arch:AVX switch. You could try copying ML.exe from VS2012; maybe that assembler supports AVX. I suppose that with the AVX switch enabled you will see the YMM registers in the VS debugger, which would be quite logical. For native AVX you could try setting up the VS2012 IDE; please consult this link ://www.codeproject.com/Articles/271627/Assembly-Programming-with-Visual-Studio-2010-2012

Bernard · ‎05-08-2013

Sorry, I answered twice and both posts were queued for approval. It is unbelievable. For AVX instruction support in MASM you can check the ML.exe assembler from VS2012. I am not sure whether MS updated its assembler to support the newest instruction set, but you can check. VS2012 supports the AVX instruction set with the /arch:AVX switch, and it is quite logical that the integrated debugger will display the contents of the YMMx registers. You can use VS2012 as an IDE for programming in assembly; please check this link ://www.codeproject.com/Articles/271627/Assembly-Programming-with-Visual-Studio-2010-2012

Bernard · ‎05-08-2013

Hi amos,

You mentioned in your post that you are converting a Matlab program into what is presumably a C/inline assembly version, so I would like to ask: do you write scientific software? I have a few projects, mainly a large library of special functions, which I am trying to optimize. Are you interested in it?

Thank you in advance

zalia64 · ‎05-08-2013

Hi iliya,

Yes, I develop scientific software, usually to run algorithms in Machine Vision, Image Enhancement, On-Line quality control and other problems with noisy and cluttered inputs. Real-time working systems, not mathematical models.

About optimising your library - perhaps I could be of help. Please contact me by private message.

Bernard · ‎05-08-2013

Thank you, amos, I have already contacted you. If you are interested, I had a problem with the Horner scheme in a cosine function implemented in an inline asm block. Here is the link ://software.intel.com/en-us/forums/topic/347470