If no,canQP be emulatedon XMM and YMM registers with small overhead (< 2X slowdown)compared to the double precision FP arithmetic?
are there any plans to incorporate native hardware-based support for quad precision, at least into Xeon processors? The performance ofthe pure software implementation is generally too slow for our purposes - mostly large regular financial summation tasks. If not, is there a forum of sorts where one can register interest in hardware support for quad precision in Xeon processors?
As this would be a long term project (years), I hope you are working with the current implementations of parallelism.
If by quad-precision you have 128-bit Binary Integer Decimal in mind or as candidate for consideration (BID encoding can deal with rounding and precision propagation issues better than binary FP encoding, Intel's DFP library is a great place to start.
You might want to contact the leader of Intel DFP library (he may be able to brief you of future release plan for that library.
You can also contact me offline to explore potential performance headroom on second andthird generation intel core processorsor Intel Xeon E3 and E5 processors.
Could you explain why do you need a 128-bit precision in that case?
Rounding problemscreatereal troubles in case of exchange operations and it would beinteresting to understand
what your problem is.
Sometimes it could be useful.When youdeal with the speed of execution vs precision and you do not want the arbitrary precision implementation which is slower than hardware registers.
Could you explain why do you need a 128-bit precision in that case
For example the value of Pi which is transcendental number with infinite precision and it could benefit from the wider fp registers so range-reduction algorithms could provide more accurate mapping of the large arguments to the suitable range of sine calcualtion.
You could find plenty of references on the limitations of simply relying on extra precision for range reduction, as the x87 firmware does.
My point was that no matter how much hardware precision you have, you still need a higher precision range reduction algorithm to support trig functions on your new high precision.
If the market demand were seen, no doubt someone would study the feasibility of vector quad precision on future 256- and 512-bit register platforms.
I agree with you on this.We must also ask for what purpose should the hardware and ISA be modified to implement quadprecision or even more.I suppose that thereare not many mainstream math or engineeringapplicationthat need to calculatequad precisiontranscend. functions values.And for those esoteric application or highly sofisticated math packages(Mathematica ,Matlab)which calculates trig function with arbitrary precisionthe memory array model will be the best implementation albeit at theprice of speed of execution.
I didn't say there is no need for quad precision. All widely used Fortran compilers have it, for example, with software implementation. Performance deficiency of current quad precision is due as much to lack of vectorizability as lack of single hardware instruction
>>you still need a higher precision range reduction algorithm to support trig functions on your new high precision
It is catch-22 situation.
Borland C++ compiler v5.xincludes a BCD Number Library and it allows to work with numbers up to 5,000 digits. A question is:
ShouldI wait for a hardware support of 256-bit or 512-bit precisionsif some workaround could be used?
Also, having workedin financial industry for many years I could say thataccuracy of calculations ismore important than speed.
Java also has two arbitrary precision classes: Big Integer and Big Decimal.But it is unintuitive to work with these classes because numerical primitives like float or int are represented by objects and so simple arithmketic operations are done on objects so you have a large overhead of memory space needed to store them and time when you are doing calculation is very slow even hundreds times slower than in the case of arithmetics done on primitive types.
Borland C++ compiler v5.xincludes a BCD Number Library and it allows to work with numbers up to 5,000 digits.
The question is what kind of applications beside some esoteric pure mathematical soft which calculates Pi untill thousands of digits and sophisticated math packages like Mathematica needs such a precision.
In e.g. C++ you have structs/classes without necessarily needing heap space, and you have operator overloading. Also, some C(++) compilers (e.g. gcc: __int128) allow for 128 bit integers. Intel's C compiler also knows about some kind of 128 bit floats which are emulated quite efficiently.
The business problem is that 14-15 significant digits is not enough to retain sufficient precision for amounts of moeny in bookkeeping applications where large transaction volumes are totaled up on a regular basis. The problem is particularly apparent when adding low unit value currencies such as VND or IDR.
The numbers in playare not typically integers, although for simple summation they could be shifted a few digits but this would only solve some of the applications and thus adds to the overall complexity.
Our current solution under investigation is a software implementation of a 128-bit decimal type based on IEEE 754-2008. The performance so far is 1-2 orders of maginitude slower than the corresponding 64-bit data types currently used.
Since our software is deployed in Windows environments the only alternative to a software implementation I currently see is the FPGA route. But that's not particularly attractive as FPGA hardware it would have to be installed in bulk on servers is outsourced data centres at substantial cost.
I'm aware that asking for 128-bit precision support at CPU level is a request for the long-term. However, with the current performance penalty we see from the software implementation it is clear that while it may work in limited areas for a whileit will never be something we or our clients will be happy with.