Quad precision would be implemented on Xeon by combinations of x87 "REAL*10" operations, so at least two non-vectorizable instructions would be required to implement each floating point operation. For most operations, you should get 48 bits additional beyond the x87 precision, thus about 33 decimal digits.
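As a quick sanity check on those figures, here is the arithmetic behind "about 33 decimal": the x87 extended format carries a 64-bit significand, and adding the 48 extra bits mentioned above gives 112 significand bits, which converts to decimal digits via log10(2). (The helper name here is just for illustration.)

```python
import math

def decimal_digits(bits):
    # Decimal digits representable by a binary significand of `bits` bits.
    return bits * math.log10(2)

x87_bits = 64     # x87 extended-precision ("REAL*10") significand width
extra_bits = 48   # additional bits gained by combining x87 operations

total = x87_bits + extra_bits
print(f"{total} significand bits ~ {decimal_digits(total):.1f} decimal digits")
# 112 bits works out to roughly 33.7 decimal digits, matching "about 33 decimal"
```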
Comparing Linpack performance doesn't make much sense, except to emphasize that each floating point add or multiply requires roughly 5 operations, plus packing and unpacking time, as well as losing a factor of perhaps 2 to the lack of vectorization.
I haven't seen any documentation indicating that Power6 changed the floating point format from the one which previous IBM and MIPS architectures used, which supports approximately 107 bits of significand, or 31 decimal digits, with exponent range reduced in comparison with REAL*8. In effect, 11 bits are wasted, due to carrying 2 copies of the exponent, differing by a constant. Of course, those implementations should penalize performance by only a factor of 3 or so, compared with non-vector REAL*8.
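The format in question represents each value as an unevaluated sum of two doubles (hence the duplicated exponent), which is why the usable precision is roughly 2 × 53 bits rather than a true 113-bit significand. A minimal sketch of the core building block, Knuth's error-free two-sum, shows how the second double captures the rounding error the first one drops:

```python
def two_sum(a, b):
    # Knuth's TwoSum: returns (s, e) such that s + e == a + b exactly,
    # where s = fl(a + b) and e is the rounding error of that addition.
    s = a + b
    bv = s - a
    e = (a - (s - bv)) + (b - bv)
    return s, e

# A "double-double" value is the pair (hi, lo): two ordinary doubles whose
# unevaluated sum carries roughly 106-107 significand bits (~31 decimal digits).
hi, lo = two_sum(1.0, 1e-20)
print(hi, lo)
assert 1.0 + 1e-20 == 1.0   # a single double loses the small term entirely
assert lo == 1e-20          # the tail of the pair retains it exactly
```

Each double-double add or multiply expands into a handful of such ordinary operations, which is consistent with the factor-of-3-or-so penalty mentioned above.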
One of the design parameters for Itanium was full instruction-level support for quad precision, possibly making it a superior platform for that purpose. Needless to say, that advantage hasn't proven decisive in the marketplace.