Software Archive

Bandwidth and Latency figures to and from Host DRAM

drMikeT
New Contributor I

Hello,

I was wondering if there are any bandwidth and latency figures for transferring data from host DRAM to Phi GDRAM and vice versa.

Is the PCIe gen2 interface that is currently used by the Phi operating at its full rated bandwidth when data flows in either direction? PCIe gen2 can send and receive 8 GB/s + 8 GB/s simultaneously. How much of these ideal figures can actually be attained when we transfer data from host DRAM to Phi GDRAM and vice versa?

When will a PCIe gen3 Phi come out? I am just curious why it shipped with PCIe gen2 only, even though the high-end hosts (Sandy Bridge / Ivy Bridge) natively support PCIe gen3.

thanks

Michael

Sumedh_N_Intel
Employee

Hi Michael, 

The Intel Xeon Phi coprocessor PCIe capabilities are as below:

                    SE10 (61 cores, 1.093 GHz,     5100P (60 cores, 1.053 GHz,
                          8 GB, 5.5 GT/s)                8 GB, 5.0 GT/s)
Host to device            6.88 GB/s                      6.91 GB/s
Device to host            6.98 GB/s                      6.95 GB/s

You are correct in understanding that the coprocessor currently supports only PCIe gen2 and does not support gen3. I don't quite know the reason for this decision; maybe the experts can answer that. 

drMikeT
New Contributor I

Hi Sumedh, thanks for the performance numbers.

How did you measure them?

If I use #pragma offload directives in code, can I obtain similar performance?
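
Something along these lines is what I have in mind for the offload case -- only a rough sketch; the 64 MB buffer, the mic:0 target and timing with omp_get_wtime() are just my own choices:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define NBYTES (64L * 1024 * 1024)   /* 64 MB test buffer, arbitrary size */
#define REPS   20

int main(void)
{
    char *buf = malloc(NBYTES);
    if (!buf) return 1;

    /* First transfer allocates the buffer on the coprocessor; not timed. */
    #pragma offload_transfer target(mic:0) in(buf : length(NBYTES) alloc_if(1) free_if(0))

    double t0 = omp_get_wtime();
    for (int i = 0; i < REPS; i++) {
        /* Re-send the same buffer, reusing the coprocessor-side allocation. */
        #pragma offload_transfer target(mic:0) in(buf : length(NBYTES) alloc_if(0) free_if(0))
    }
    double t1 = omp_get_wtime();

    printf("host -> device: %.2f GB/s\n",
           (double)NBYTES * REPS / (t1 - t0) / 1e9);
    free(buf);
    return 0;
}

(The offload pragmas need the Intel compiler; omp_get_wtime() is just there to keep the timing code short.)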

How is performance using MPI messaging from host to mic and from mic to host?
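
And on the MPI side I was planning a plain one-way test between a host rank and a rank running natively on the card in symmetric mode; the launch line in the comment and the host name mic0 are only placeholders for however the system is actually set up:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NBYTES (64 * 1024 * 1024)   /* 64 MB message, arbitrary size */
#define REPS   20

/* Example launch (placeholder host/binary names):
 *   mpirun -host localhost -n 1 ./bw.host : -host mic0 -n 1 ./bw.mic   */
int main(int argc, char **argv)
{
    int rank;
    char *buf = malloc(NBYTES);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0)        /* host rank pushes data ... */
            MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)   /* ... coprocessor rank receives it */
            MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("host -> mic: %.2f GB/s\n", (double)NBYTES * REPS / (t1 - t0) / 1e9);

    free(buf);
    MPI_Finalize();
    return 0;
}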

Thanks ...

michael

Charles_C_Intel1
Employee

Those look like maximum numbers, possibly only attainable using SCIF? Offload will be, by necessity, a bit slower. MPI, depending on the flavor you choose, can use anything from something as slow as TCP/IP on up to faster solutions.

Sumedh_N_Intel
Employee

I am not quite sure how these performance numbers were obtained. I have seen similar peak performance numbers using the offload pragma. I agree with Charles; the numbers with MPI can vary depending on the flavor you choose.

You could try out an alpha version of the SHOC benchmark suite, which contains the PCIe Download and PCIe Readback workloads. You can find more about the benchmark suite at: http://software.intel.com/en-us/blogs/2013/03/20/the-scalable-heterogeneous-computing-benchmark-suite-shoc-for-intelr-xeon-phitm

drMikeT
New Contributor I

Sumedh, SHOC on Phi is actually a great benchmark to utilize. I will follow the link and see how it goes.

Thanks again

Michael

drMikeT
New Contributor I

Charles Congdon (Intel) wrote:

Those look like maximum numbers, possibly only attainable using SCIF? Offload will be, by necessity, a bit slower. MPI, depending on the flavor you choose, can use anything from something as slow as TCP/IP on up to faster solutions.

I think the PCIe gen2 maximum throughput per direction (x16) should be: 8 GB/s = 5 GT/s x 16 lanes x (4/5 encoding) / 8 bits per byte, and for the amended 5.5 GT/s PCIe gen2: 8.8 GB/s = 5.5 GT/s x 16 x (4/5) / 8.
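
Spelling my arithmetic out per direction over x16 lanes, just to check my own numbers:

#include <stdio.h>

/* Theoretical per-direction PCIe gen2 x16 bandwidth:
 * per-lane rate (GT/s) * 16 lanes * 8b/10b encoding efficiency / 8 bits per byte */
int main(void)
{
    printf("gen2 5.0 GT/s x16: %.2f GB/s\n", 5.0 * 16 * 0.8 / 8);  /* 8.00 */
    printf("gen2 5.5 GT/s x16: %.2f GB/s\n", 5.5 * 16 * 0.8 / 8);  /* 8.80 */
    return 0;
}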

I sure hope the cited numbers can be attained from user code.

Michael

DubitoCogito
Novice

My guess would be it uses a rev 2.x interface because it is really a rebranded Larrabee part which was under development before rev 3.0 was finalized. Of course, there could also be other reasons.

drMikeT
New Contributor I

DubitoCogito wrote:

My guess would be it uses a rev 2.x interface because it is really a rebranded Larrabee part which was under development before rev 3.0 was finalized. Of course, there could also be other reasons.

Given that PCIe gen3 is almost 2X as fast as gen2 and is natively supported on Sandy/Ivy Bridge EP/EN, it makes me wonder why this obviously good link to the Phi was ignored.

PCIe gen3 ideal: x8: 7.87 GB/s; x16: 15.7375 GB/s
