
- Intel Community
- Software Development Technologies
- Intel® ISA Extensions
- Quad precision floating point arithmetic with SSE/AVX?


Does the latest x86 architecture offer native support for quad-precision (QP) floating-point (FP) arithmetic?

If not, can QP be emulated in XMM and YMM registers with small overhead (< 2X slowdown) compared to double-precision FP arithmetic?

Thanks,

Nick

Mikalai_K_Intel

Employee

05-26-2011 07:07 PM


39 Replies


No. The purpose of YMM register support is to increase SIMD parallelism for the IEEE 32- and 64-bit data types, so it is the performance of those types that receives a further boost. Software quad-precision floating point should speed up slightly, but it is still implemented in scalar integer arithmetic.

TimP

Black Belt

05-26-2011 10:01 PM


Tim,

Did you mean a "boost over the double-precision floating point" instead of a "boost over the quad-precision floating point"?

Nick

Mikalai_K_Intel

Employee

05-27-2011 04:14 PM


See if I made myself any clearer when I edited the post. I meant that the native types get a bigger performance boost than the software quad-precision floating point does.

Thanks,

TimP

Black Belt

05-28-2011 09:03 AM


Tim,

Are there any plans to incorporate native hardware-based support for quad precision, at least into Xeon processors? The performance of the pure software implementation is generally too slow for our purposes, which are mostly large, regular financial summation tasks. If not, is there a forum of sorts where one can register interest in hardware support for quad precision in Xeon processors?

Thanks,

Anders

akirkeby

Beginner

06-18-2012 01:43 AM


I'm not aware of any serious move toward hardware quad precision support, or even of any marketing analysis of the extent of financial use of quad precision. I imagine this would require working through an Intel customer support account.

As this would be a long term project (years), I hope you are working with the current implementations of parallelism.

TimP

Black Belt

06-18-2012 06:56 AM


SHIH_K_Intel

Employee

06-19-2012 05:10 PM

If by quad precision you have 128-bit Binary Integer Decimal (BID) in mind, or as a candidate for consideration (BID encoding can deal with rounding and precision-propagation issues better than binary FP encoding), Intel's DFP library is a great place to start.

You might want to contact the leader of the Intel DFP library; he may be able to brief you on the future release plan for that library:

http://software.intel.com/en-us/articles/intel-decimal-floating-point-math-library/?wapkw=decimal+fl...

You can also contact me offline to explore potential performance headroom on second- and third-generation Intel Core processors, or Intel Xeon E3 and E5 processors.

Shihjong


*...The performance of the pure software implementation is generally too slow for our purposes - mostly large regular financial summation tasks...*

Could you explain why you need 128-bit precision in that case? Rounding problems create real trouble in the case of exchange operations, and it would be interesting to understand what your problem is.

Best regards,

Sergey

SKost

Valued Contributor II

06-19-2012 07:06 PM


For example, the value of Pi, a transcendental number with infinite precision: calculations could benefit from wider FP registers, so that range-reduction algorithms could map large arguments to the suitable range for the sine calculation more accurately.

Bernard

Black Belt

06-27-2012 10:18 PM

*Could you explain why you need 128-bit precision in that case?*

Sometimes it could be useful: when you are trading off speed of execution against precision and you do not want an arbitrary-precision implementation, which is slower than working in hardware registers.


*For example the value of Pi which is transcendental number with infinite precision and it could benefit from the wider fp registers...*

This range reduction has been a subject of extensive research, and practical solutions have been implemented which do not rely on extra hardware precision. In any case, to justify the investment in a higher precision, among other things a corresponding math function library is required, which in turn requires a still-higher-precision algorithm for range reduction.

You can find plenty of references on the limitations of simply relying on extra precision for range reduction, as the x87 firmware does.

TimP

Black Belt

06-28-2012 04:32 AM


So you think there is no need for quad-precision FP hardware registers to speed up increased-precision calculation. Arbitrary precision (up to a point) could also benefit, albeit partly, from increased-precision hardware registers. I think it won't beat memory-array-based arbitrary precision in terms of the precision needed to represent some numbers very accurately, but in some cases it would speed up the calculation.

Bernard

Black Belt

06-28-2012 05:10 AM


I didn't say there is no need for quad precision. All widely used Fortran compilers have it, for example, with a software implementation. The performance deficiency of current quad precision is due as much to a lack of vectorizability as to the lack of a single-instruction hardware implementation.

My point was that no matter how much hardware precision you have, you still need a higher precision range reduction algorithm to support trig functions on your new high precision.

If the market demand were seen, no doubt someone would study the feasibility of vector quad precision on future 256- and 512-bit register platforms.

TimP

Black Belt

06-28-2012 06:27 AM


yuriisig

Beginner

06-28-2012 06:32 AM

The fastest algorithms use IEEE 754.


**>>you still need a higher precision range reduction algorithm to support trig functions on your new high precision**

It is a catch-22 situation.

Bernard

Black Belt

06-28-2012 07:24 AM

I agree with you on this. We must also ask for what purpose the hardware and ISA should be modified to implement quad precision or even more. I suppose there are not many mainstream math or engineering applications that need to calculate quad-precision values of transcendental functions. And for those esoteric applications, or highly sophisticated math packages (Mathematica, Matlab) which calculate trig functions with arbitrary precision, the memory-array model will be the best implementation, albeit at the price of execution speed.


*I didn't say there is no need for quad precision...*

The Borland C++ compiler v5.x includes a **BCD Number Library**, and it allows working with numbers of up to 5,000 digits. A question is: should I wait for hardware support of 256-bit or 512-bit precision if some workaround could be used?

Also, having worked in the financial industry for many years, I can say that accuracy of calculations is more important than speed.

SKost

Valued Contributor II

06-28-2012 05:10 PM


The question is what kind of applications, besides some esoteric pure-math software that calculates Pi to thousands of digits and sophisticated math packages like Mathematica, need such precision.

Bernard

Black Belt

06-28-2012 09:06 PM

Java also has two arbitrary-precision classes: BigInteger and BigDecimal. But it is unintuitive to work with these classes, because numerical primitives like float or int are represented by objects, and simple arithmetic operations are done on objects. You therefore have a large memory overhead to store them, and calculation is very slow, even hundreds of times slower than arithmetic on primitive types.


The GNU multiple-precision libraries (GMP, MPC, MPFR) are used in the GNU compilers. These libraries are presumably more efficient than high-precision decimal libraries, such as GNU libbid, which are favored in many monetary applications. libbid was designed to support compilation targeting either a firmware or a software-library decimal implementation, including the decimal-mode support for C. These libraries are sufficiently important that they get some consideration in CPU hardware design, and they are unlikely to be replaced by complete firmware/hardware implementations.

TimP

Black Belt

06-29-2012 05:14 AM


If speed really matters you should not program in Java.

In e.g. C++ you have structs/classes without necessarily needing heap space, and you have operator overloading. Also, some C(++) compilers allow for 128-bit integers (e.g. gcc's __int128). Intel's C compiler also knows about some kind of 128-bit float which is emulated quite efficiently.

sirrida

Beginner

06-29-2012 05:59 AM


I am now porting my special-functions library from Java to C++. As my tests have shown, native code fully optimized by the Intel compiler is two or even three times faster than the same code written in Java. In my previous post I wrote about the arbitrary-precision Java classes and the problems of using objects to perform simple arithmetic operations on arbitrary-precision numbers. I have not tested or seen the C++ implementations of arbitrary-precision classes, but if an implementation is likewise based on objects representing primitive types, and uses a counterintuitive approach to perform simple arithmetic on objects, it will also be very slow; maybe not as slow as Java, but not as fast as hardware-based arithmetic.

Bernard

Black Belt

06-29-2012 06:51 AM


Thanks all for your comments so far. I've been away for a bit, so let me try to answer all comments and questions, and add context, in one big post:

akirkeby

Beginner

07-17-2012 08:24 AM

The business problem is that 14-15 significant digits are not enough to retain sufficient precision for amounts of money in bookkeeping applications where large transaction volumes are totaled up on a regular basis. The problem is particularly apparent when adding low-unit-value currencies such as VND or IDR.

The numbers in play are not typically integers; although for simple summation they could be shifted a few digits, this would only solve some of the applications and thus adds to the overall complexity.

Our current solution under investigation is a software implementation of a 128-bit decimal type based on IEEE 754-2008. The performance so far is 1-2 orders of magnitude slower than that of the corresponding 64-bit data types currently used.

Since our software is deployed in Windows environments, the only alternative to a software implementation I currently see is the FPGA route. But that is not particularly attractive, as the FPGA hardware would have to be installed in bulk on servers in outsourced data centres at substantial cost.

I am aware that asking for 128-bit precision support at the CPU level is a long-term request. However, with the current performance penalty we see from the software implementation, it is clear that while it may work in limited areas for a while, it will never be something we or our clients will be happy with.

Thanks
