Hi,

I work on code which targets AVX2 + FMA3 and depends on the accuracy of VRCPPS/VRSQRTPS. Should I expect the implementation of these instructions on Haswell to be the same as on Ivy Bridge?

Regards,

Marat

I suppose he is asking whether the instructions maintain the same numerical behavior, which is what I'd expect.

I haven't heard anything about the once-proposed addition or substitution of corresponding instructions accurate enough to support double precision with 2 iterations, such as early AMD SSE CPUs had. Current Intel implementations of iterative divide and sqrt methods are specified to maintain 49-bit precision unless the options to accept less are set.

Thanks, Tim. However, I'm concerned not only about accuracy but also about convergence to the correctly rounded result. A reciprocal computed with FMA converges to the correctly rounded result, but only if the initial approximation overestimates 1/x when x is a power of two. The RCPPS implementation on Ivy Bridge does not overestimate 1/x in these cases, so a reciprocal computed with FMA from the Ivy Bridge RCPPS result is not correctly rounded when x is a power of two (e.g. rcp(0x1.FFFFFFFFFFFFFp-1) converges to 0x1.0000000000000p+0 instead of 0x1.0000000000001p+0).

Thus I wonder if RCPPS/RSQRTPS on Haswell produce numerically different results than on Ivy Bridge.

Interesting point. I haven't heard of any changes, but others here are more expert on that.

The VRCPPS and VRSQRTPS instructions on the Haswell family microarchitecture are intended to return identical results and have the same behavior as found on the Ivy Bridge microarchitecture.

Thank you, TimP and BRET T., this is what I was looking for.

Sergey Kostrov, I already have an Ivy Bridge system, and it has the issue I described above.

Please help an outsider: what is the instruction VRCPPS? It does not appear in the Intel documentation: *Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2 (2A, 2B & 2C), Instruction Set Reference, A-Z, Jan. 2013*.

From the discussion above, it looks like a higher-precision version of RCPPS.

Could you name a reference for that (and probably other) new instructions? I would surely like a high-precision version of RSQRTPS, too.

Thanks

Hi amos,

Glad to see someone from Israel :)

VRCPPS is the AVX (VEX-encoded) form of RCPPS. Its 256-bit form computes approximate reciprocals of eight 32-bit floating-point values.

Thanks, iliyapolak. I should have guessed, with the 'V' at the start.

I am an old hand with assembly language, but a new one with 64-bit. My MASM does not support AVX instructions, so I will have to look somewhere else for AVX support (perhaps WASM).

Would you happen to know if 'Visual C 2012 professional' supports AVX in the integrated debugger? That is, when debugging and stepping through the code, will the 'disassembly window' show the AVX instructions? Otherwise, debugging is a big problem...

I have to convert a program from Matlab into C+ASM, for speed. AVX could be a great help, IF and ONLY IF I could find a full-size assembler and debugger. Would you suggest a development environment for native AVX? Native AVX, not C mnemonics.

Hi amos,

You mentioned in your post that you are converting a Matlab program into, presumably, a C/inline-assembly version, so I would like to ask: do you write scientific software? I have a few projects, mainly a large library of special functions, which I am trying to optimize. Are you interested in it?

Thank you in advance

Hi iliya,

Yes, I develop scientific software, usually to implement algorithms for machine vision, image enhancement, on-line quality control and other problems with noisy and cluttered inputs. These are real-time working systems, not mathematical models.

About optimising your library: perhaps I could be of help. Please contact me by private message.
