Software Archive
Read-only legacy content

Cluster: Xeon Phi + Xeon + Tesla GPU

Enrique_H_
Beginner

Hello, I'm from Peru. My question is the following: in the research area where we are working, we are now going to build a cluster of 6 NODES, for which we want to buy your products in the coming months, when the first disbursement comes through. MY QUESTION IS: can a cluster be built using an Intel XEON, a XEON PHI, and a Tesla GPU in each NODE? I await your prompt reply. Greetings from Peru, and we will be in touch soon to purchase your products.

7 Replies
JJK
New Contributor III

I'm not sure how many Spanish-speaking people respond on this list, so here's Google Translate:

Very good I am from Peru, my question is this in the research area we are working now going to build a cluster of six nodes which want to buy your products, in these months when it finds the first disbursement My question is this, you can make a cluster using Intel Xeon, Xeon and Tesla GPU PHI? For each node, I hope your answers greetings from Peru and soon will be in touch soon to acquire its products

 

The short answer to your question is: yes, this is possible. I have a server with a Xeon Phi, a Tesla, a GTX 580, and an AMD Radeon card in it, all at once. The software drivers are tricky to get right (mostly due to the AMD driver software), but the system is stable and works like a charm. The biggest problem will be the power consumption in the cluster nodes. My multi-GPU box is a Tyan server with a mega power supply. YMMV.

My question is: why would you want to mix a Phi and a Tesla in one box?

 

Jose_F_1
Beginner

Hi, I am a colleague of Enrique and am interested in continuing this topic of hybrid architectures.

Well, we are building a small cluster and will buy 6 Xeon Phi + 6 Xeon + 6 Tesla. We want to run N-body simulations on all the co-processors by using MPI, and on all the GPUs by using CUDA. So far, we have tried this on GPUs only. We want to find out how efficient it would be to use Xeon Phi + GPUs at the same time with our hybrid code, compared to GPUs only.
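To make the plan concrete, here is a minimal sketch of the MPI + CUDA structure we have in mind (this is not our actual code: it assumes one MPI rank per node driving one Tesla, a brute-force O(N^2) kernel, and toy sizes and initial conditions):

    /* Hybrid MPI + CUDA N-body sketch: each rank owns N_TOTAL/size bodies,
     * computes their forces against all bodies on its GPU, then the updated
     * positions are exchanged with MPI_Allgather every step. */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdlib.h>

    #define N_TOTAL 65536   /* total bodies, assumed divisible by the rank count */
    #define DT      0.001f  /* integration time step */

    /* Brute-force O(N^2) kernel: one thread per local body, softened gravity. */
    __global__ void forces(const float4 *all, float4 *vel, float4 *mine,
                           int n_local, int n_total, int offset)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n_local) return;
        float4 p = all[offset + i];
        float ax = 0.f, ay = 0.f, az = 0.f;
        for (int j = 0; j < n_total; ++j) {
            float dx = all[j].x - p.x, dy = all[j].y - p.y, dz = all[j].z - p.z;
            float r2 = dx*dx + dy*dy + dz*dz + 1e-4f;          /* softening     */
            float inv_r3 = rsqrtf(r2 * r2 * r2);
            ax += all[j].w * dx * inv_r3;                      /* .w holds mass */
            ay += all[j].w * dy * inv_r3;
            az += all[j].w * dz * inv_r3;
        }
        vel[i].x += DT * ax;  vel[i].y += DT * ay;  vel[i].z += DT * az;
        mine[i].x = p.x + DT * vel[i].x;
        mine[i].y = p.y + DT * vel[i].y;
        mine[i].z = p.z + DT * vel[i].z;
        mine[i].w = p.w;
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int n_local = N_TOTAL / size;
        float4 *h_all = (float4 *)malloc(N_TOTAL * sizeof(float4));
        for (int i = 0; i < N_TOTAL; ++i) {                    /* toy initial data */
            h_all[i].x = (float)(i % 101); h_all[i].y = (float)(i % 97);
            h_all[i].z = (float)(i % 89);  h_all[i].w = 1.0f;  /* unit mass        */
        }

        float4 *d_all, *d_vel, *d_mine;
        cudaMalloc(&d_all,  N_TOTAL * sizeof(float4));
        cudaMalloc(&d_vel,  n_local * sizeof(float4));
        cudaMalloc(&d_mine, n_local * sizeof(float4));
        cudaMemset(d_vel, 0, n_local * sizeof(float4));

        for (int step = 0; step < 100; ++step) {
            /* host -> GPU: the full position array (this PCIe traffic is the cost) */
            cudaMemcpy(d_all, h_all, N_TOTAL * sizeof(float4), cudaMemcpyHostToDevice);
            forces<<<(n_local + 255) / 256, 256>>>(d_all, d_vel, d_mine,
                                                   n_local, N_TOTAL, rank * n_local);
            /* GPU -> host: only this rank's updated slice */
            cudaMemcpy(h_all + rank * n_local, d_mine, n_local * sizeof(float4),
                       cudaMemcpyDeviceToHost);
            /* every rank ends up with the full, updated position array */
            MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                          h_all, n_local * 4, MPI_FLOAT, MPI_COMM_WORLD);
        }

        cudaFree(d_all); cudaFree(d_vel); cudaFree(d_mine); free(h_all);
        MPI_Finalize();
        return 0;
    }

The Xeon Phi side would be a second code path (or extra ranks) on each node; that is exactly the part we are unsure how to organize.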

When we check the top supercomputers, we notice that they use one architecture or the other, but not both together. Is that true? Might it be because they are not more efficient together? Is it possible at all to run Xeon Phi + GPU simultaneously?

Thanks for any comment! 

Jose

 

 

JJK
New Contributor III

is it possible? yes

is it efficient? no

Any calculation that is done on a GPU or on a Xeon Phi needs its data transferred from host memory to the GPU or Phi. This is usually the bottleneck when doing many-GPU calculations. Transferring data between a GPU and a Xeon Phi is not very well optimized (the way it is on some systems between two CUDA GPUs) and will require host CPU involvement; this will likely undo any speed gain you achieve by spreading your code out over GPUs and Phis.
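Just to illustrate the staging (the buffer names are made up; this assumes the Intel compiler's offload model on the Phi side and the CUDA runtime on the host): every GPU-to-Phi exchange pays two PCIe hops plus host CPU time, whereas two CUDA GPUs on the same PCIe root can often use cudaMemcpyPeer and leave the host out of it.

    /* Moving data from a CUDA GPU to a Xeon Phi has to be staged through host
     * memory -- there is no direct peer path between the two cards. */
    #include <stddef.h>
    #include <cuda_runtime.h>

    double gpu_to_phi(const float *d_gpu_buf, float *host_buf, size_t n)
    {
        double s = 0.0;

        /* hop 1: GPU -> host over PCIe; the host CPU drives this copy */
        cudaMemcpy(host_buf, d_gpu_buf, n * sizeof(float), cudaMemcpyDeviceToHost);

        /* hop 2: host -> Phi over PCIe, via the Intel offload pragma */
        #pragma offload target(mic:0) in(host_buf : length(n))
        {
            size_t i;
            for (i = 0; i < n; ++i)          /* placeholder work on the MIC */
                s += host_buf[i];
        }
        return s;    /* two transfers + host involvement for every exchange */
    }

That round trip through the host is exactly the involvement I mean.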

 

 

Jose_F_1
Beginner

Hi,

Thanks for the comment!

Are you saying that distributed parallelism (GPUs, or MICs, or GPUs + MICs) is not efficient? It is clear that communication is the bottleneck in all distributed systems, but great performance can be achieved anyway. Using MPI for the Intel processors and CUDA for the GPUs, my main concern is which of these is more efficient:

- 6 Xeon processors + 6 NVIDIA Tesla GPUs working in parallel

- 6 Xeon processors + 6 Xeon Phi co-processors working in parallel

- 6 Xeon processors + 6 NVIDIA Tesla GPUs + 6 Xeon Phi co-processors all working in parallel

Intuitively, the third should be more efficient for the same load, while all three suffer from the same communication problems (true, the last one has more communication, but let us say we use efficient enough algorithms).
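To put rough numbers on that intuition (the throughput figures below are invented, purely for illustration): if the three device types work on the same O(N^2) step concurrently, each one has to get a share of the bodies proportional to its sustained interaction rate, otherwise the slowest partition sets the step time and the extra hardware buys nothing.

    /* Toy static load split for one time step across CPU, GPU and Phi.
     * The rates are assumptions; in practice we would measure them first. */
    #include <stdio.h>

    int main(void)
    {
        const long   n_bodies = 1000000;
        const double rate_cpu = 1.0;    /* assumed relative throughputs       */
        const double rate_gpu = 10.0;   /* (billions of interactions/s, say)  */
        const double rate_phi = 6.0;
        const double total    = rate_cpu + rate_gpu + rate_phi;

        long n_cpu = (long)(n_bodies * rate_cpu / total);
        long n_gpu = (long)(n_bodies * rate_gpu / total);
        long n_phi = n_bodies - n_cpu - n_gpu;   /* remainder avoids rounding loss */

        printf("CPU slice: %ld  GPU slice: %ld  Phi slice: %ld\n",
               n_cpu, n_gpu, n_phi);
        return 0;
    }

With those made-up rates the GPU and Phi carry most of the work; the real question is whether the extra PCIe traffic needed to keep both of them fed eats up the gain.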

Would you agree with that?

JJK
New Contributor III

In theory, you are correct: with just the right algorithm and just the right workflow, a system with 6 CPUs, 6 GPUs, and 6 Phis will be the fastest. The tricky part is "just the right algorithm and just the right workflow": it all depends on your (current) code and your coding skills.

My advice would still be to NOT attempt this - it simply is not worth the hassle to get good performance out of a system like this. I'd invest more in better CPUs and/or better or more GPUs, and more RAM instead of mixing GPUs with Phis - unless your existing code already is tuned for such a setup.

 

Jose_F_1
Beginner

Thanks for the valuable advice. Actually, we have a good N-body code for GPUs or MICs, but we are writing a new one for another project and need to decide on the algorithm. We are buying the 6 GPUs and 6 MICs anyway, so we need to decide the best way to use them; probably alternating their use, but we need to evaluate this.

Maybe you know of a documented case of what you said (a link or some literature) that shows that both architectures together are not very efficient?

Best

JJK
New Contributor III

I'm not aware of anyone who has attempted to spread code across GPUs and MICs simultaneously, but from the way both architectures are designed I would expect it not to work very efficiently. It also depends on how much interaction is needed between the threads of either the MIC algorithm or the GPU algorithm: going from GPU to MIC and vice versa is going to be your biggest bottleneck.

 
