Software Archive

Bandwidth and Latency figures to and from Host DRAM

drMikeT
New Contributor I

Hello,

I was wondering if there are any bandwidth and latency figures for transferring data from host DRAM to Phi GDRAM and vice versa.

Is the PCIe gen2 interface that is currently used by the Phi operating at its full rated bandwidth when data flows in either direction? PCIe gen2 can send and receive 8 GB/s + 8 GB/s simultaneously. How much of these ideal figures can actually be attained when we transfer data from host DRAM to Phi GDRAM and vice versa?

When will a PCIe gen3 Phi come out? I am just curious why it shipped with PCIe gen2 only, even though the high-end hosts (Sandy Bridge / Ivy Bridge) natively support PCIe gen3.

thanks

Michael

Sumedh_N_Intel
Employee

Hi Michael, 

The Intel Xeon Phi coprocessor PCIe capabilities are as below:

                    SE10 (61 cores, 1.093 GHz,     5100P (60 cores, 1.053 GHz,
                          8 GB, 5.5 GT/s)                8 GB, 5.0 GT/s)
Host to device            6.88 GB/s                      6.91 GB/s
Device to host            6.98 GB/s                      6.95 GB/s

You are correct in understanding that the coprocessor currently supports only PCIe gen2 and does not support gen3. I don't quite know the reason for this decision; maybe the experts can answer that. 

drMikeT
New Contributor I

Hi Sumedh, thanks for the performance numbers.

How did you measure them?

If I use #pragma offload directives in code, can I obtain similar performance?
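
Something along these lines is what I have in mind for the offload case -- only a rough sketch; the 64 MB buffer, the mic:0 target and timing with omp_get_wtime() are just my own choices:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define NBYTES (64L * 1024 * 1024)   /* 64 MB test buffer, arbitrary size */
#define REPS   20

int main(void)
{
    char *buf = malloc(NBYTES);
    if (!buf) return 1;

    /* First transfer allocates the buffer on the coprocessor; not timed. */
    #pragma offload_transfer target(mic:0) in(buf : length(NBYTES) alloc_if(1) free_if(0))

    double t0 = omp_get_wtime();
    for (int i = 0; i < REPS; i++) {
        /* Re-send the same buffer, reusing the coprocessor-side allocation. */
        #pragma offload_transfer target(mic:0) in(buf : length(NBYTES) alloc_if(0) free_if(0))
    }
    double t1 = omp_get_wtime();

    printf("host -> device: %.2f GB/s\n",
           (double)NBYTES * REPS / (t1 - t0) / 1e9);
    free(buf);
    return 0;
}

(The offload pragmas need the Intel compiler; omp_get_wtime() is just there to keep the timing code short.)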

How is performance using MPI messaging from host to mic and from mic to host?
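
And on the MPI side I was planning a plain one-way test between a host rank and a rank running natively on the card in symmetric mode; the launch line in the comment and the host name mic0 are only placeholders for however the system is actually set up:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NBYTES (64 * 1024 * 1024)   /* 64 MB message, arbitrary size */
#define REPS   20

/* Example launch (placeholder host/binary names):
 *   mpirun -host localhost -n 1 ./bw.host : -host mic0 -n 1 ./bw.mic   */
int main(int argc, char **argv)
{
    int rank;
    char *buf = malloc(NBYTES);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0)        /* host rank pushes data ... */
            MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)   /* ... coprocessor rank receives it */
            MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("host -> mic: %.2f GB/s\n", (double)NBYTES * REPS / (t1 - t0) / 1e9);

    free(buf);
    MPI_Finalize();
    return 0;
}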

Thanks ...

michael

Charles_C_Intel1
Employee

Those look like maximum numbers, possibly only attainable using SCIF? Offload will be, by necessity, a bit slower. MPI, depending on the flavor you choose, can use anything from something as slow as TCP/IP on up to faster solutions.

Sumedh_N_Intel
Employee

I am not quite sure how these performance numbers were obtained. I have seen similar peak performance numbers using the offload pragma. I agree with Charles; the numbers with MPI can vary depending on the flavor you choose.

You could try out an alpha version of the SHOC benchmark suite, which contains the PCIe Download and PCIe Readback workloads. You can find more about the benchmark suite at: http://software.intel.com/en-us/blogs/2013/03/20/the-scalable-heterogeneous-computing-benchmark-suite-shoc-for-intelr-xeon-phitm

drMikeT
New Contributor I

Sumedh, SHOC on Phi is actually a great benchmark to utilize. I will follow the link and see how it goes.

Thanks again

Michael

drMikeT
New Contributor I

Charles Congdon (Intel) wrote:

Those look like maximum numbers, possibly only attainable using SCIF? Offload will be, by necessity, a bit slower. MPI, depending on the flavor you choose, can use anything from something as slow as TCP/IP on up to faster solutions.

I think the PCIe gen2 maximum throughput per direction (x16) should be: 8 GB/s = 5 GT/s x 16 lanes x (4/5 encoding) / 8 bits per byte, and for the amended 5.5 GT/s PCIe gen2: 8.8 GB/s = 5.5 GT/s x 16 x (4/5) / 8.
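
Spelling my arithmetic out per direction over x16 lanes, just to check my own numbers:

#include <stdio.h>

/* Theoretical per-direction PCIe gen2 x16 bandwidth:
 * per-lane rate (GT/s) * 16 lanes * 8b/10b encoding efficiency / 8 bits per byte */
int main(void)
{
    printf("gen2 5.0 GT/s x16: %.2f GB/s\n", 5.0 * 16 * 0.8 / 8);  /* 8.00 */
    printf("gen2 5.5 GT/s x16: %.2f GB/s\n", 5.5 * 16 * 0.8 / 8);  /* 8.80 */
    return 0;
}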

I sure hope the cited numbers can be attained from user code.

Michael

DubitoCogito
Novice

My guess would be it uses a rev 2.x interface because it is really a rebranded Larrabee part which was under development before rev 3.0 was finalized. Of course, there could also be other reasons.

drMikeT
New Contributor I

DubitoCogito wrote:

My guess would be it uses a rev 2.x interface because it is really a rebranded Larrabee part which was under development before rev 3.0 was finalized. Of course, there could also be other reasons.

Given that PCIe gen3 is almost 2X as fast as gen2 and is natively supported on Sandy/Ivy Bridge EP/EN, it makes me wonder why this obviously good link to the Phi was ignored.

PCIe gen3 ideal: x8: 7.87 GB/s; x16: 15.7375 GB/s
