Questions on limitations

vibrantcascade · 02-09-2013

What is the Phi optimized for as far as calculations go: single or double precision? (I see floats mentioned in the literature, and that every core has 2 vector units, but it never says whether they're 32- or 64-bit units.)

Is it capable of doing quad precision math?

What kind of peak performance can I expect when running in double precision and quad precision?

TimP · 02-10-2013

As the advertising indicates, the Intel® Xeon Phi™ Vector Processing Units are directly analogous to the progression from SSE2 to AVX on the main CPUs: both single and double precision are supported, with higher performance for single precision as measured in floating-point operations per second. If the advertising claims don't answer your question about peak performance, look up the Top500 listings or prominent news posts such as http://www.theregister.co.uk/2012/11/12/intel_xeon_phi_coprocessor_launch/
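One way to read "higher performance for single precision": peak throughput scales with how many values fit in one vector register. The Python sketch below does that arithmetic under assumed parameters (60 cores at 1.053 GHz, 512-bit vector units, one fused multiply-add per lane per cycle, roughly a Xeon Phi 5110P). These figures are illustrative assumptions, so substitute the numbers from your own card's datasheet.

```python
# Back-of-envelope peak throughput for one coprocessor card.
# ASSUMED example configuration (roughly a Xeon Phi 5110P); check your datasheet.
CORES = 60            # assumed core count
CLOCK_GHZ = 1.053     # assumed clock frequency in GHz
VECTOR_BITS = 512     # assumed vector unit width in bits
FLOPS_PER_LANE = 2    # a fused multiply-add counts as 2 flops per lane per cycle


def peak_gflops(element_bits: int) -> float:
    """Peak GFLOP/s if every core issues one full-width FMA every cycle."""
    lanes = VECTOR_BITS // element_bits  # 16 lanes for 32-bit, 8 for 64-bit
    return CORES * CLOCK_GHZ * lanes * FLOPS_PER_LANE


sp = peak_gflops(32)
dp = peak_gflops(64)
print(f"single precision: {sp:.1f} GFLOP/s")  # prints 2021.8
print(f"double precision: {dp:.1f} GFLOP/s")  # prints 1010.9
```

Because the lane count halves going from 32-bit to 64-bit elements, the double precision peak is exactly half the single precision peak. Quad precision has no vector lanes at all, so this formula says nothing useful about it; quad throughput depends on the software emulation and on how many threads are kept busy.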

vibrantcascade · 02-10-2013

So with the Phi being basically a bunch of x86 CPUs, if I have Fortran code involving quad precision numbers that I compiled with Intel Fortran Composer 2013 for Linux, will the Phi be able to handle the calculations, or will the quad precision force the code to execute on the host system? (Currently the biggest limitation of GPU systems for scientific computing is the lack of quad precision math, and I was wondering whether the Phi has the same limitation.)
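To make the precision problem concrete, here is a minimal Python sketch of the kind of cancellation that pushes some codes into quad precision. It uses Python's `decimal` module set to 34 significant digits as an assumed stand-in for binary128 (REAL(16)); it is decimal rather than binary arithmetic and is not how ifort implements quad precision, so it only illustrates the precision gap.

```python
from decimal import Decimal, getcontext

# ASSUMED stand-in for quad precision: 34 significant decimal digits, comparable
# to the ~33-34 digits of an IEEE binary128 value. Illustration only.
getcontext().prec = 34

big = 1e16
small = 1.0

# In double precision (53-bit significand) the spacing between values near 1e16
# is 2, so adding 1 rounds back to 1e16 and the small term is lost.
double_result = (big + small) - big

# With 34 significant digits, 10**16 + 1 is exact and the small term survives.
quad_like_result = (Decimal(10) ** 16 + Decimal(1)) - Decimal(10) ** 16

print("double:   ", double_result)     # prints 0.0
print("34 digits:", quad_like_result)  # prints 1
```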

TimP · 02-10-2013

You could try ifort's quad precision implementation, but it is done in software and doesn't take advantage of vectorization. If your application can make use of at least 60, and preferably 118 or more, threads, this may be worthwhile. Similarly, open source multiple precision libraries might be worth a look if the application itself can be threaded, e.g. with OpenMP, Cilk+, or TBB.
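The thread counts above can be reproduced with simple arithmetic. The Python sketch below is one plausible reading of them, not a statement of how they were derived: it assumes a 60-core card, one core left free for the card's own OS and offload services, and at least 2 threads per core, because an in-order core of this kind cannot issue instructions from the same hardware thread on back-to-back cycles. The helper name `suggested_threads` is hypothetical.

```python
# Hypothetical helper: how many software threads to request on the coprocessor.
# ASSUMPTIONS: 60 cores, 4 hardware threads per core, one core reserved for the
# card's OS/offload services, and >= 2 threads per core so an in-order core can
# issue every cycle.


def suggested_threads(cores: int = 60, reserved_cores: int = 1,
                      threads_per_core: int = 2,
                      hw_threads_per_core: int = 4) -> int:
    """Total threads = usable cores x threads placed on each core."""
    if not 1 <= threads_per_core <= hw_threads_per_core:
        raise ValueError("threads_per_core must be in 1..hw_threads_per_core")
    usable_cores = cores - reserved_cores
    return usable_cores * threads_per_core


print(suggested_threads(reserved_cores=0, threads_per_core=1))  # 60: one per core
print(suggested_threads())                                      # 118: two per usable core
print(suggested_threads(threads_per_core=4))                    # 236: all hardware threads
```

Under these assumptions, 60 is the floor of one thread per core, 118 keeps every usable core issuing every cycle, and anything above 236 only oversubscribes the hardware threads.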

vibrantcascade · 02-10-2013

I have an OpenMP version of my code, compiled with ifort in quad precision, which I currently run on 2 - 16 core systems. It is fully scalable, using two levels of nested OpenMP loops to keep the processors at 99% usage at all times. Because of the way I nest my OpenMP loops, I could easily support 100,000+ independent threads if I had a system capable of handling them. (I designed it this way for GPU processing, but some problems require quad precision.)
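A two-level loop nest of independent iterations can be viewed as one flat index space and split evenly across however many threads the hardware offers. The Python sketch below shows that index bookkeeping, similar in spirit to OpenMP's `collapse(2)` clause with a static schedule; it is not the poster's actual code, and the loop sizes (400 x 250 = 100,000 items) and worker count (236) are assumed values chosen only for illustration.

```python
from __future__ import annotations

# Hypothetical sketch: flatten an (outer x inner) loop nest into one index space
# of independent work items, then split it statically across n_workers threads.


def flat_to_pair(k: int, n_inner: int) -> tuple[int, int]:
    """Map flat index k back to its (outer, inner) loop indices."""
    return divmod(k, n_inner)


def static_chunks(n_items: int, n_workers: int) -> list[range]:
    """Contiguous, nearly equal chunks; the first n_items % n_workers get one extra."""
    base, extra = divmod(n_items, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


n_outer, n_inner = 400, 250  # ASSUMED sizes: 100,000 independent items
n_workers = 236              # ASSUMED number of threads on the card
chunks = static_chunks(n_outer * n_inner, n_workers)

# Each worker would process its own chunk; here we just list the (outer, inner)
# pairs in chunk order to check that every iteration is covered exactly once.
work = [flat_to_pair(k, n_inner) for chunk in chunks for k in chunk]
print(len(work), "items over", len(chunks), "workers")
```

With a static split like this, no two workers touch the same (outer, inner) pair, and chunk sizes differ by at most one item, which is what keeps all threads busy when the iterations cost about the same.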

So it sounds like you think the Phi can support quad precision but you're not entirely sure?

Is there any sort of online test resource I could use to try it out and see how beneficial the Phi would be for me, using a test that completes in 5 minutes or less on a single quad core? I don't think TeraGrid has any clusters with the Phi yet.