Software Archive
Read-only legacy content

Cluster: Xeon Phi + Xeon + Tesla GPU

Enrique_H_
Beginner

Hello, I'm from Peru. My question is the following: in the research area where we are working, we are now going to build a cluster of 6 NODES, for which we want to buy your products in the coming months, when the first disbursement comes through. MY QUESTION IS: can a cluster be built using an Intel XEON, a XEON PHI, and a Tesla GPU in each NODE? I await your prompt reply. Greetings from Peru, and we will be in touch soon to purchase your products.

7 Replies
JJK
New Contributor III

I'm not sure how many Spanish-speaking people respond on this list, so here's Google Translate:

Very good I am from Peru, my question is this in the research area we are working now going to build a cluster of six nodes which want to buy your products, in these months when it finds the first disbursement My question is this, you can make a cluster using Intel Xeon, Xeon and Tesla GPU PHI? For each node, I hope your answers greetings from Peru and soon will be in touch soon to acquire its products

 

The short answer to your question is: yes, this is possible. I have a server with a Xeon Phi, a Tesla, a GTX 580, and an AMD Radeon card in it, all at once. The software drivers are tricky to get right (mostly due to the AMD driver software), but the system is stable and works like a charm. The biggest problem will be the power consumption in the cluster nodes. My multi-GPU box is a Tyan server with a mega power supply. YMMV.

My question is: why would you want to mix a Phi and a Tesla in one box?

 

Jose_F_1
Beginner

Hi, I am a colleague of Enrique and am interested in continuing this topic of hybrid architectures.

Well, we are building a small cluster and will buy 6 Xeon Phi + 6 Xeon + 6 Tesla. We want to run N-body simulations on all the co-processors by using MPI, and on all the GPUs by using CUDA. So far, we have tried this on GPUs only. We want to find out how efficient it would be to use Xeon Phi + GPUs at the same time with our hybrid code, compared to GPUs only.
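To make the plan concrete, here is a minimal sketch of the MPI + CUDA structure we have in mind (this is not our actual code: it assumes one MPI rank per node driving one Tesla, a brute-force O(N^2) kernel, and toy sizes and initial conditions):

    /* Hybrid MPI + CUDA N-body sketch: each rank owns N_TOTAL/size bodies,
     * computes their forces against all bodies on its GPU, then the updated
     * positions are exchanged with MPI_Allgather every step. */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdlib.h>

    #define N_TOTAL 65536   /* total bodies, assumed divisible by the rank count */
    #define DT      0.001f  /* integration time step */

    /* Brute-force O(N^2) kernel: one thread per local body, softened gravity. */
    __global__ void forces(const float4 *all, float4 *vel, float4 *mine,
                           int n_local, int n_total, int offset)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n_local) return;
        float4 p = all[offset + i];
        float ax = 0.f, ay = 0.f, az = 0.f;
        for (int j = 0; j < n_total; ++j) {
            float dx = all[j].x - p.x, dy = all[j].y - p.y, dz = all[j].z - p.z;
            float r2 = dx*dx + dy*dy + dz*dz + 1e-4f;          /* softening     */
            float inv_r3 = rsqrtf(r2 * r2 * r2);
            ax += all[j].w * dx * inv_r3;                      /* .w holds mass */
            ay += all[j].w * dy * inv_r3;
            az += all[j].w * dz * inv_r3;
        }
        vel[i].x += DT * ax;  vel[i].y += DT * ay;  vel[i].z += DT * az;
        mine[i].x = p.x + DT * vel[i].x;
        mine[i].y = p.y + DT * vel[i].y;
        mine[i].z = p.z + DT * vel[i].z;
        mine[i].w = p.w;
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int n_local = N_TOTAL / size;
        float4 *h_all = (float4 *)malloc(N_TOTAL * sizeof(float4));
        for (int i = 0; i < N_TOTAL; ++i) {                    /* toy initial data */
            h_all[i].x = (float)(i % 101); h_all[i].y = (float)(i % 97);
            h_all[i].z = (float)(i % 89);  h_all[i].w = 1.0f;  /* unit mass        */
        }

        float4 *d_all, *d_vel, *d_mine;
        cudaMalloc(&d_all,  N_TOTAL * sizeof(float4));
        cudaMalloc(&d_vel,  n_local * sizeof(float4));
        cudaMalloc(&d_mine, n_local * sizeof(float4));
        cudaMemset(d_vel, 0, n_local * sizeof(float4));

        for (int step = 0; step < 100; ++step) {
            /* host -> GPU: the full position array (this PCIe traffic is the cost) */
            cudaMemcpy(d_all, h_all, N_TOTAL * sizeof(float4), cudaMemcpyHostToDevice);
            forces<<<(n_local + 255) / 256, 256>>>(d_all, d_vel, d_mine,
                                                   n_local, N_TOTAL, rank * n_local);
            /* GPU -> host: only this rank's updated slice */
            cudaMemcpy(h_all + rank * n_local, d_mine, n_local * sizeof(float4),
                       cudaMemcpyDeviceToHost);
            /* every rank ends up with the full, updated position array */
            MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                          h_all, n_local * 4, MPI_FLOAT, MPI_COMM_WORLD);
        }

        cudaFree(d_all); cudaFree(d_vel); cudaFree(d_mine); free(h_all);
        MPI_Finalize();
        return 0;
    }

The Xeon Phi side would be a second code path (or extra ranks) on each node; that is exactly the part we are unsure how to organize.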

When we check the top supercomputers, we notice that they use one architecture or the other, but not both together. Is that true? Might it be because they are not more efficient together? Is it possible at all to run Xeon Phi + GPU simultaneously?

Thanks for any comment! 

Jose

 

 

JJK
New Contributor III

is it possible? yes

is it efficient? no

Any calculation that is done on a GPU or on a Xeon Phi needs its data transferred from host memory to the GPU or Phi. This is usually the bottleneck when doing many-GPU calculations. Transferring data between a GPU and a Xeon Phi is not very well optimized (the way it is on some systems between two CUDA GPUs) and will require host CPU involvement; this will likely undo any speed gain you achieve by spreading your code out over GPUs and Phis.
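Just to illustrate the staging (the buffer names are made up; this assumes the Intel compiler's offload model on the Phi side and the CUDA runtime on the host): every GPU-to-Phi exchange pays two PCIe hops plus host CPU time, whereas two CUDA GPUs on the same PCIe root can often use cudaMemcpyPeer and leave the host out of it.

    /* Moving data from a CUDA GPU to a Xeon Phi has to be staged through host
     * memory -- there is no direct peer path between the two cards. */
    #include <stddef.h>
    #include <cuda_runtime.h>

    double gpu_to_phi(const float *d_gpu_buf, float *host_buf, size_t n)
    {
        double s = 0.0;

        /* hop 1: GPU -> host over PCIe; the host CPU drives this copy */
        cudaMemcpy(host_buf, d_gpu_buf, n * sizeof(float), cudaMemcpyDeviceToHost);

        /* hop 2: host -> Phi over PCIe, via the Intel offload pragma */
        #pragma offload target(mic:0) in(host_buf : length(n))
        {
            size_t i;
            for (i = 0; i < n; ++i)          /* placeholder work on the MIC */
                s += host_buf[i];
        }
        return s;    /* two transfers + host involvement for every exchange */
    }

That round trip through the host is exactly the involvement I mean.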

 

 

Jose_F_1
Beginner

Hi,

Thanks for the comment!

Are you saying that distributed parallelism (GPUs, or MICs, or GPUs + MICs) is not efficient? It is clear that communication is the bottleneck in all distributed systems, but great performance can be achieved anyway. Using MPI for the Intel processors and CUDA for the GPUs, my main concern is which of these is more efficient:

- 6 Xeon processors + 6 NVIDIA Tesla GPUs working in parallel

- 6 Xeon processors + 6 Xeon Phi co-processors working in parallel

- 6 Xeon processors + 6 NVIDIA Tesla GPUs + 6 Xeon Phi co-processors all working in parallel

Intuitively, the third should be more efficient for the same load, while all three suffer from the same communication problems (true, the last one has more communication, but let us say we use efficient enough algorithms).
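To put rough numbers on that intuition (the throughput figures below are invented, purely for illustration): if the three device types work on the same O(N^2) step concurrently, each one has to get a share of the bodies proportional to its sustained interaction rate, otherwise the slowest partition sets the step time and the extra hardware buys nothing.

    /* Toy static load split for one time step across CPU, GPU and Phi.
     * The rates are assumptions; in practice we would measure them first. */
    #include <stdio.h>

    int main(void)
    {
        const long   n_bodies = 1000000;
        const double rate_cpu = 1.0;    /* assumed relative throughputs       */
        const double rate_gpu = 10.0;   /* (billions of interactions/s, say)  */
        const double rate_phi = 6.0;
        const double total    = rate_cpu + rate_gpu + rate_phi;

        long n_cpu = (long)(n_bodies * rate_cpu / total);
        long n_gpu = (long)(n_bodies * rate_gpu / total);
        long n_phi = n_bodies - n_cpu - n_gpu;   /* remainder avoids rounding loss */

        printf("CPU slice: %ld  GPU slice: %ld  Phi slice: %ld\n",
               n_cpu, n_gpu, n_phi);
        return 0;
    }

With those made-up rates the GPU and Phi carry most of the work; the real question is whether the extra PCIe traffic needed to keep both of them fed eats up the gain.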

Would you agree with that?

JJK
New Contributor III

In theory, you are correct: with just the right algorithm and just the right workflow, a system with 6 CPUs, 6 GPUs, and 6 Phis will be the fastest. The tricky part is "just the right algorithm and just the right workflow": it all depends on your (current) code and your coding skills.

My advice would still be to NOT attempt this - it simply is not worth the hassle to get good performance out of a system like this. I'd invest more in better CPUs and/or better or more GPUs, and more RAM instead of mixing GPUs with Phis - unless your existing code already is tuned for such a setup.

 

Jose_F_1
Beginner

Thanks for the valuable advice. Actually, we have a good N-body code for GPUs or MICs, but we are writing a new one for another project and need to decide on the algorithm. We are buying the 6 GPUs and 6 MICs anyway, so we need to decide the best way to use them; probably alternating their use, but we need to evaluate this.

Maybe you know of a documented case of what you said (a link or some literature) that shows that both architectures together are not very efficient?

Best

JJK
New Contributor III

I'm not aware of anyone who has attempted to spread code across GPUs and MICs simultaneously, but from the way both architectures are designed I would expect it not to work very efficiently. It also depends on how much interaction is needed between the threads of either the MIC algorithm or the GPU algorithm: going from GPU to MIC and vice versa is going to be your biggest bottleneck.

 
