Intel® Fortran Compiler

GPU programming

YertleTheTurtle
Beginner
It seems that it's been over a year since someone asked a question along the following lines, so let me try again:

Now that I've got my program running under OpenMP, I'd like to try programming the GPU.

It seems that only Portland Group provides a Fortran compiler for this purpose; they use CUDA.

Intel - do you have any plans to provide GPU programming capability in the Intel compiler?

If so, what is the projected release date?
Steven_L_Intel1
Employee
Read up on Intel Many Integrated Core (MIC). This is our solution to "GPU programming" and it will be available for use from the Intel C++ and Fortran compilers. You might also look at press coverage of this week's Intel Developer Forum (IDF) for presentations and demonstrations of MIC.
YertleTheTurtle
Beginner
Thanks, but if I understood properly, MIC is no help to me - few of my customers will be using Xeon-based supercomputers and they aren't likely to go out and buy one either.

But they all have either an AMD or Nvidia GPU.

So, do I understand correctly that Intel's plan for GPU programming for the foreseeable future is aimed at specialized hardware for specialized applications, rather than a general-purpose, mass-market approach?

Steven_L_Intel1
Employee
That's what's been announced so far.
Andrew_Smith
New Contributor III
But if a MIC card for your PC were in a similar price range to a graphics card, you wouldn't need much software specialization to justify using one. Any hints here, Intel?
TimP
Honored Contributor III
More publicity went out last week. The product announcement remains more than a year in the future.
JohnNichols
Valued Contributor III
There was an announcement recently that the Chinese now had the world's fastest computer, but someone chimed in that it was all GPUs and that no really useful programs could be written for it.

I think the Chinese just want to beat the USA no matter what.

JMN
Steven_L_Intel1
Employee
That Chinese computer is a mixture of Intel Xeon CPUs and NVidia GPUs. Today, that seems the best approach, but most everyone I talk to says that programming for CUDA is very difficult and time-consuming. This is what Intel is trying to address with MIC.
jimdempseyatthecove
Honored Contributor III
I've explored using GPGPUs integrated into a FORTRAN simulation program consisting of 13 projects, 750 files, .gt. 600,000 lines of code. At the time I did the investigation (2008) I was using an ATI FireStream with the RV670 chipset. The GPGPU programming was done using Brook+ as opposed to CUDA. This was before the nVidia Tesla, and my choice of FireStream was principally due to its support for double-precision FP (64 bits). The problem I had was that this required writing two sets of code: one in FORTRAN for the host (4-core Q6600) and one in C/C++/Brook+, and then linking them together with a threading toolkit which I wrote expressly for my simulation program.

The threading toolkit (QuickThread, www.quickthreadprogramming.com) has evolved some since 2008 and is principally targeted at C++ applications run on multi-processor (socket), preferably NUMA, architectures. The code for performing the heterogeneous scheduling is still in the threading toolkit but hasn't been tested since 2008. My experience at that time was that it was possible to write a single multi-threaded application whereby you could perform, say, a matrix multiplication where some of the rows and columns were computed multi-threaded on the host and other rows/columns were computed in parallel inside the GPGPU.
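
The shape of that split, in a minimal sketch: OpenMP on the host, with hypothetical gpu_dgemm_rows/gpu_wait calls standing in for the Brook+ side (those names are illustrative only, not a real API):

    subroutine hybrid_matmul(a, b, c, n, nhost)
      ! C = A*B with rows 1..nhost computed on the host (OpenMP) and
      ! rows nhost+1..n handed to a hypothetical GPGPU routine.
      implicit none
      integer, intent(in)  :: n, nhost
      real(8), intent(in)  :: a(n,n), b(n,n)
      real(8), intent(out) :: c(n,n)
      integer :: i, j, k

      ! Hypothetical asynchronous offload of the remaining rows
      call gpu_dgemm_rows(a, b, c, n, nhost+1, n)

      ! Host threads cover rows 1..nhost while the GPGPU works
      !$omp parallel do private(i, k)
      do j = 1, n
        do i = 1, nhost
          c(i,j) = 0.0d0
          do k = 1, n
            c(i,j) = c(i,j) + a(i,k)*b(k,j)
          end do
        end do
      end do
      !$omp end parallel do

      call gpu_wait()   ! hypothetical: block until the GPGPU rows are done
    end subroutine hybrid_matmul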

The test benchmarks proved that heterogeneous programming within a single multi-threaded application was possible; however, for the technology available at that time, it was not suitable for my simulation studies. The principal problems (for that technology) were:

a) Double-precision division used an invert-then-multiply (x/y == 1/y * x). While this may yield a performance boost, some precision is lost (a small test illustrating this follows after the list). The simulations I run cannot incur this loss in precision.

b) lack of good DP trig function libraries

c) The app (kernels) running in the GPGPU was not set up to share (virtual) memory with the app running on the host. Therefore this required block transfers of data into and out of the GPGPU. To mitigate this to some extent I created encapsulation types that knew where the current set of data for each array was located, to avoid unnecessary data copying and to identify thread-scheduling opportunities.
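
To illustrate point (a), here is a small self-contained host-side test of the difference between a true divide and the invert-then-multiply form (build with value-safe floating-point settings so the compiler does not rewrite the divide itself):

    program recip_div_demo
      ! Compare a correctly rounded double-precision divide against
      ! the invert-then-multiply form used by the GPGPU hardware.
      implicit none
      real(8) :: x, y, exact, approx
      integer :: i, ndiff
      y = 3.0d0
      ndiff = 0
      do i = 1, 1000
        x = real(i, 8)
        exact  = x / y             ! correctly rounded IEEE divide
        approx = (1.0d0 / y) * x   ! invert then multiply: rounds twice
        if (exact /= approx) ndiff = ndiff + 1
      end do
      print *, 'results differing in the last bit:', ndiff, 'of 1000'
    end program recip_div_demo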

Setting aside the loss-in-precision issue, the testing indicated (to me) that doubling the cores and/or sockets with NUMA would yield better performance than a hybrid heterogeneous system built with 1P + GPGPU. Combine this with having to maintain dual code bases and it was a no-brainer to set aside the hybrid coding of my simulation program. I kept the bulk of the hybrid code structure inside the FORTRAN files in the anticipation that at some time in the future Intel/AMD/nVidia/ATI would resolve this issue with an add-on/add-in solution. It appears now that the Intel MIC may be the best solution, or at least a better solution, although I will have to say I do not know what AMD has up its sleeve.

At the Intel Developer Forum in San Francisco last week, the attendees had an opportunity to discuss roadmaps and run attendee-written sample programs on a MIC. IOW, verify for themselves that MIC is for real (at least as a functional prototype). The coding paradigms they exhibited were of two general types:

1) Using Cilk and/or Array Building Blocks (ArBB) running in a context similar to CUDA/Brook+, whereby you have kernels that port into and run inside the MIC. The principal difference is that the source code for the kernels is the same as for the host; the compiled code is slightly different, principally due to the wider AVX registers. (A rough sketch of this idea follows after the list.)

2) Using PuTTY or another terminal emulator to run an application inside a "remote" system with a variation of Unix running inside it. IOW, similar to using Intel's Manycore Testing Lab (MTL).
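
A rough sketch of the "same source for host and kernel" idea in item 1, written here with OpenMP-style target directives rather than the Cilk/ArBB code shown at IDF (illustrative only; the loop body is ordinary Fortran, and only the directive says where it runs):

    program offload_sketch
      implicit none
      integer, parameter :: n = 1000000
      real(8), allocatable :: a(:), b(:), c(:)
      integer :: i

      allocate(a(n), b(n), c(n))
      a = 1.0d0
      b = 2.0d0

      ! Same loop body as host code; the directive requests offload
      ! to an attached coprocessor when one is present.
      !$omp target teams distribute parallel do map(to: a, b) map(from: c)
      do i = 1, n
        c(i) = 2.0d0*a(i) + b(i)
      end do
      !$omp end target teams distribute parallel do

      print *, 'c(1) =', c(1)
    end program offload_sketch

On a machine without a coprocessor the same code simply runs on the host, which is exactly the single-source property described above.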

Although these are a good start for how to use the MIC, my preference is to tightly couple the MIC to the user application. Ideally I would like to get my hands on a MIC and, with the cooperation of engineers at Intel, work out the details of how to seamlessly integrate (multiple) multi-threaded apps so they are efficiently distributed amongst the CPUs and MICs installed in an SMP system, thus producing an SHMP (Symmetric/Heterogeneous Multi-Processor) system.

Jim Dempsey
JohnNichols
Valued Contributor III
I was in China when the story of the Chinese supercomputer broke. I heard one report that a fellow computer scientist in China made some remarks about the Chinese supercomputer being less than user-friendly.

The reply from the Chinese developer of the supercomputer was polite, somewhat bemused, and gave a very friendly Chinese form of the Australian for "go soak your head in a bucket". An Australian would have been somewhat more terse than that, but the Intel censor program would not like the Aussie version.

If you have the fastest, biggest, or meanest, who cares about anything else?

JMN

Steven_L_Intel1
Employee
Here's some interesting news on this general topic:

Dell, Intel rope Texas-sized 10 petaflopper

jimdempseyatthecove
Honored Contributor III
Clay might be able to use the article to conjure up what a replacement system might look like for Blue Waters.

Jim Dempsey
JohnNichols
Valued Contributor III
If I remember correctly, the Maximum PC Dream Machine was a dual-Xeon motherboard.

Even the PCs are getting mind-blowingly fast.

JMN
durisinm
Novice
Isn't Intel afraid of ceding this area to nVIDIA and its CUDA tools for GPU programming? nVIDIA offers several GPU products and free programming tools, and the GPUs span the range from inexpensive, entry-level to big and beefy. I get the impression that many people and companies--such as PGI with its C and Fortran compilers with CUDA capability--have jumped on the CUDA bandwagon and are hard at work using it to solve their problems today. Isn't there the risk that many people wanting to use this technology won't wait for MIC if Intel's public product announcement for it is more than a year in the future?

Mike D.