Gen 8 and fp64

MSimm2 · 09-24-2014

The document GVCS001-The Compute Architecture of Intel Processor Graphics Gen8.pdf states:

"Finally, one of the FPUs provides extended math capability to support high-throughput transcendental math functions and double precision 64-bit floating-point."

Does this mean it's possible for Intel GPU OpenCL to one day fully support cl_khr_fp64? :)
Does the FPU also do high-throughput double precision transcendental math functions?

Before someone mentions Xeons and AVX: there is no reason Intel cannot offer both options and let the market decide.
Perhaps you could release a pro part (a Xeon with an IGP) in which both FPUs support double precision...

Jeffrey_M_Intel1 · 09-30-2014

Sorry for the delayed reply. I agree with you that fp64 would be very nice to have. At this point I don't have any details, but let me see if I can find any updates.

Robert_I_Intel · 10-08-2014

Our compute architects are having internal discussions about enabling cl_khr_fp64. If there are enough customer requests for this feature, we will do it.

allanmac1 · 10-08-2014

That would be pretty exciting as the latest discrete GPUs are reporting FP32:FP64 ratios of 32:1 (or worse).

MSimm2 · 10-08-2014

@Robert Ioffe, thank you. It's great to hear it's being looked at.

@allanmac I think Nvidia targets its fp64 performance to match Intel CPU fp64 performance, i.e. so that you're not at a disadvantage versus the CPU when targeting CUDA/OpenCL for your fp64 application. AMD doesn't do this; they try to give as much as they can without significantly compromising general graphics (game) performance. My AMD 290X gives me 700 GFLOPS (double precision). Even my old AMD HD 7770 was faster than my Haswell (i7) running AVX OpenCL (my app uses fp64 rsqrt).

For what it's worth, this is my app: https://sourceforge.net/projects/openclsolarsyst/. It needs a lot of double precision grunt, because I need double precision for long-term accuracy (and stability). All of the computation and graphics runs entirely on the GPU. Data is only copied back to CPU memory when close encounter detection is turned on (so that it can be dumped to a log).
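
To illustrate why long-running simulations tend to want fp64, here is a minimal sketch (not code from the app above). It adds the same small step many times, once with every partial sum rounded to binary32 and once in plain binary64. The step size, iteration count, and helper names `to_f32` and `accumulate` are all hypothetical choices made for the example.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (IEEE 754 binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, n: int, single: bool) -> float:
    """Add `step` to a running total `n` times.

    With single=True the step and every partial sum are rounded to binary32,
    emulating an fp32 accumulator; otherwise plain binary64 is used.
    """
    if single:
        step = to_f32(step)
    total = 0.0
    for _ in range(n):
        total += step
        if single:
            total = to_f32(total)
    return total


# Hypothetical values chosen only for illustration: 100,000 steps of 0.01.
STEP, N = 0.01, 100_000
EXACT = 1000.0  # the mathematically exact sum of N steps of 0.01

err32 = abs(accumulate(STEP, N, single=True) - EXACT)
err64 = abs(accumulate(STEP, N, single=False) - EXACT)
print(f"fp32 accumulated error: {err32:.3e}")
print(f"fp64 accumulated error: {err64:.3e}")
```

The fp32 accumulator ends up with a larger error than the fp64 one, and in an integrator that kind of drift compounds over every time step.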

MSimm2 · 10-09-2014

Something else to consider:

From http://en.libreofficeforum.org/node/9119 in the section "Is there a minimum OpenCL version required by LibreOffice?"

"According to /core/sc/source/core/opencl/opencl_device.cxx the criteria appears based on the presence of these double floating-point (64bit) precision extensions:

Official Khronos cl_khr_fp64, which is OpenCL v1.0 compliant.
AMD subset cl_amd_fp64."
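
The criterion described in that quote amounts to looking for either extension name in the device's extension list. A minimal sketch, assuming the space-separated string that OpenCL returns for `CL_DEVICE_EXTENSIONS` (via `clGetDeviceInfo`) has already been fetched; this is not LibreOffice's actual code, and `supports_fp64` is a hypothetical helper name.

```python
# Extension names taken from the quote above.
FP64_EXTENSIONS = ("cl_khr_fp64", "cl_amd_fp64")


def supports_fp64(extensions: str) -> bool:
    """Return True if a space-separated OpenCL extension string
    lists one of the double precision extensions."""
    # Split into whole tokens so that a longer name which merely
    # contains "cl_khr_fp64" as a substring does not count as a match.
    names = set(extensions.split())
    return any(ext in names for ext in FP64_EXTENSIONS)


# Made-up extension strings for illustration, not real device output.
print(supports_fp64("cl_khr_icd cl_khr_fp16 cl_khr_fp64"))
print(supports_fp64("cl_khr_icd cl_khr_fp16"))
```

A device that does not list either name would fail this check, which is why the missing cl_khr_fp64 matters for applications like this.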

And from https://archive.fosdem.org/2014/schedule/event/calc_gpu_enabling_a_spreadsheet/attachments/slides/453/export/events/attachments/calc_gpu_enabling_a_spreadsheet/slides/453/libreoffice_gpu.pdf

"And Precision is nonnegotiable for spreadsheets IEEE 754 required"

I believe that in future AMD/Intel "APUs" the IGP section will grow in size much faster than the number of CPU cores and/or the AVX bit width.