I work on code which targets AVX2 + FMA3 and depends on the accuracy of VRCPPS/VRSQRTPS. Should I expect the implementation of these instructions on Haswell to be the same as on Ivy Bridge?
I suppose he means: do the instructions keep the same numerical behavior? That is what I'd expect.
I haven't heard anything about the once-proposed addition (or replacement) of these instructions with versions accurate enough to support double precision with 2 iterations, as early AMD SSE CPUs had. Current Intel implementations of iterative divide and sqrt methods are specified to maintain 49-bit precision unless options that accept less are set.
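As a rough illustration of why the starting accuracy decides how many iterations double precision needs, here is the textbook Newton-Raphson error recurrence for the reciprocal in exact arithmetic (the rounding in each step is ignored). The bound 1.5·2^-12 is the maximum relative error Intel documents for RCPPS; the ~14-bit threshold in the last line is only what this idealized recurrence implies, not a statement about any particular CPU.

```latex
\begin{aligned}
y_{n+1} &= y_n\,(2 - x\,y_n), \qquad \varepsilon_n = 1 - x\,y_n \\
\varepsilon_{n+1} &= 1 - x\,y_n\,(2 - x\,y_n) = (1 - x\,y_n)^2 = \varepsilon_n^2
  \;\Rightarrow\; \varepsilon_2 = \varepsilon_0^4 \\
|\varepsilon_0| \le 1.5\cdot 2^{-12} &\;\Rightarrow\; |\varepsilon_2| \le 1.5^4\cdot 2^{-48} \approx 2^{-45.7} \\
|\varepsilon_2| \le 2^{-53} &\;\Leftarrow\; |\varepsilon_0| \le 2^{-53/4} \approx 2^{-13.25}
\end{aligned}
```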
Thanks, Tim. However, I'm concerned not only about accuracy but also about convergence to the correctly rounded result. A reciprocal computed with FMA converges to the correctly rounded result, but this requires that the initial approximation overestimate 1/x when x is a power of two. The Ivy Bridge RCPPS implementation does not overestimate 1/x in these cases, so if we compute the reciprocal with FMA starting from the Ivy Bridge RCPPS result, we do not get the correctly rounded result when x is a power of two (e.g. for x = 0x1.FFFFFFFFFFFFFp-1, which rounds to 1.0f before RCPPS is applied, rcp(x) converges to 0x1.0000000000000p+0 instead of 0x1.0000000000001p+0).
So I wonder whether RCPPS/RSQRTPS on Haswell produce numerically different results from those on Ivy Bridge.
The VRCPPS and VRSQRTPS instructions on the Haswell family microarchitecture are intended to return identical results and have the same behavior as found on the Ivy Bridge microarchitecture.
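One way to check this on real machines is to fingerprint the RCPPS results: run the same inputs through the instruction on each CPU and compare a hash of the raw result bits. The sketch below uses the 128-bit SSE intrinsic _mm_rcp_ps so it compiles on any x86-64 compiler without -mavx; it assumes that, on a given CPU, the VEX-encoded VRCPPS computes the same per-element values as legacy RCPPS, which this code does not check. The function name and the choice of FNV-1a as the hash are arbitrary.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>   /* SSE: _mm_rcp_ps (x86 only) */

/* Hypothetical helper: apply RCPPS to every float whose bit pattern lies in
 * [first, last] and fold the raw result bits into a 64-bit FNV-1a hash.
 * Equal hashes on two CPUs mean their RCPPS results agree on that range
 * (barring an astronomically unlikely hash collision). */
static uint64_t rcpps_fingerprint(uint32_t first, uint32_t last)
{
    uint64_t h = 14695981039346656037ULL;            /* FNV-1a offset basis */
    for (uint64_t b = first; b <= last; ++b) {       /* 64-bit counter: no wrap at 0xFFFFFFFF */
        uint32_t in = (uint32_t)b, out;
        float f, r;
        memcpy(&f, &in, sizeof f);                   /* reinterpret bits as float */
        r = _mm_cvtss_f32(_mm_rcp_ps(_mm_set1_ps(f)));
        memcpy(&out, &r, sizeof out);                /* raw result bits */
        for (int k = 0; k < 4; ++k) {                /* hash the 4 result bytes */
            h ^= (out >> (8 * k)) & 0xFFu;
            h *= 1099511628211ULL;                   /* FNV-1a prime */
        }
    }
    return h;
}
```

For example, compare rcpps_fingerprint(0x3F800000u, 0x3FFFFFFFu), which covers every float in [1, 2), between an Ivy Bridge and a Haswell machine. The same approach works for RSQRTPS by swapping in _mm_rsqrt_ps.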
Please help an outsider: what is the VRCPPS instruction? It does not appear in the Intel documentation: Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2 (2A, 2B & 2C): Instruction Set Reference, A-Z, Jan. 2013.
From the discussion above, it looks like a higher-precision version of RCPPS.
Could you point me to a reference for that (and probably other) new instructions? I would certainly like a high-precision version of RSQRTPS, too.
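On the wish for a higher-precision RSQRTPS: the usual software approach (not a separate instruction) is to refine the ~12-bit estimate with one Newton-Raphson step, y1 = y0 * (1.5 - 0.5 * x * y0 * y0), which roughly doubles the number of correct bits. A minimal SSE sketch (x86 only), assuming positive, finite, normal inputs; the name rsqrt_nr_ps is hypothetical.

```c
#include <xmmintrin.h>   /* SSE: _mm_rsqrt_ps (x86 only) */

/* Hypothetical helper: one Newton-Raphson step on the RSQRTPS estimate of
 * 1/sqrt(x) for four floats:  y1 = y0 * (1.5 - 0.5 * x * y0 * y0).
 * Assumes every lane of x is positive, finite and normal. */
static __m128 rsqrt_nr_ps(__m128 x)
{
    const __m128 half = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);
    __m128 y = _mm_rsqrt_ps(x);                          /* ~12-bit estimate */
    __m128 hxyy = _mm_mul_ps(_mm_mul_ps(half, x),        /* 0.5 * x * y * y */
                             _mm_mul_ps(y, y));
    return _mm_mul_ps(y, _mm_sub_ps(three_halves, hxyy)); /* refined estimate */
}
```

With FMA3 available, the multiply and subtract can be fused (for instance with _mm_fnmadd_ps), but the plain SSE form keeps the sketch portable.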
Thanks, iliyapolak. I should have guessed, with the 'V' at the start.
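For the record, the leading 'V' marks the VEX-encoded (AVX) form. The Intel manual lists VRCPPS in the RCPPS entry, with the same 1.5·2^-12 relative-error bound, so it is not more precise; it can simply process eight floats in a 256-bit register. A sketch of calling it from C through the _mm256_rcp_ps intrinsic; the target attribute and __builtin_cpu_supports are GCC/Clang extensions (not available in Visual C), and the helper names are hypothetical.

```c
#include <immintrin.h>   /* AVX: _mm256_rcp_ps (x86 only) */

/* Hypothetical helper: reciprocal estimates for eight floats using the
 * 256-bit VEX form (VRCPPS ymm). The target attribute lets this one function
 * use AVX without compiling the whole file with -mavx; call it only after
 * have_avx() returns nonzero. */
__attribute__((target("avx")))
static void rcp8(const float *in, float *out)
{
    __m256 v = _mm256_loadu_ps(in);
    _mm256_storeu_ps(out, _mm256_rcp_ps(v));   /* compiles to VRCPPS */
}

/* Hypothetical helper: runtime check that the CPU (and OS) support AVX. */
static int have_avx(void)
{
    return __builtin_cpu_supports("avx");
}
```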
I am an old hand at assembly language, but new to 64-bit. My MASM does not support AVX instructions, so I will have to look elsewhere for AVX support (perhaps WASM).
Would you happen to know whether 'Visual C 2012 Professional' supports AVX in the integrated debugger? That is, when debugging and stepping through the code, will the disassembly window show the AVX instructions? Otherwise, debugging is a big problem...
I have to convert a program from Matlab into C+ASM, for speed. AVX could be a great help IF and ONLY IF I can find a full-size assembler and debugger. Could you suggest a development environment for native AVX? Native AVX, that is, not C intrinsics.
You mentioned in your post that you are converting a Matlab program into, presumably, a C/inline-assembly version, so I would like to ask: do you write scientific software? I have a few projects, mainly a large library of special functions that I am trying to optimize. Are you interested?
Thank you in advance
Yes, I develop scientific software, usually implementing algorithms for Machine Vision, Image Enhancement, On-Line quality control and other problems with noisy and cluttered inputs. These are real-time working systems, not mathematical models.
As for optimising your library, perhaps I could help. Please contact me by private message.