Hello,
I was wondering if there are any bandwidth and latency figures for transferring data from host DRAM to Phi GDRAM and vice versa.
Is the PCIe gen2 interface currently used by the Phi operating at its full rated bandwidth when data flows in either direction? A PCIe gen2 x16 link can send and receive simultaneously, 8 GB/s + 8 GB/s. How much of this ideal figure can actually be attained when we transfer data from host DRAM to Phi GDRAM and vice versa?
When will a PCIe gen3 Phi come out? I am curious why it shipped with PCIe gen2 only, even though high-end hosts (Sandy Bridge/Ivy Bridge) natively support PCIe gen3.
thanks
Michael
Hi Michael,
The Intel Xeon Phi coprocessor PCIe capabilities are as follows:

| | SE10 (61 cores, 1.093 GHz, 8 GB, 5.5 GT/s) | 5110P (60 cores, 1.053 GHz, 8 GB, 5.0 GT/s) |
| --- | --- | --- |
| Host to device | 6.88 GB/s | 6.91 GB/s |
| Device to host | 6.98 GB/s | 6.95 GB/s |
You are correct in understanding that the coprocessor currently supports only PCIe gen2, not gen3. I don't know the reason for this decision; maybe the experts can answer that.
Hi Sumedh, thanks for the performance numbers.
How did you measure them?
If I use #pragma offload directives in my code, can I obtain similar performance?
How does MPI messaging perform, from host to mic and from mic to host?
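To be concrete about the offload case, this is the kind of naive timing loop I have in mind (just a sketch; the buffer size, repetition count, and the mic:0 target are arbitrary placeholders):

```c
/* Sketch: estimate host-to-coprocessor transfer bandwidth with the
   Intel offload pragmas (build with the Intel compiler). */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N    (64 * 1024 * 1024)   /* 64M floats = 256 MB per transfer */
#define REPS 10

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main(void)
{
    float *data = (float *)malloc(N * sizeof(float));

    /* Allocate the device buffer once, outside the timed loop, so we
       measure only the PCIe transfer, not allocation. */
    #pragma offload_transfer target(mic:0) \
            in(data : length(N) alloc_if(1) free_if(0))

    double t0 = now();
    for (int i = 0; i < REPS; i++) {
        #pragma offload_transfer target(mic:0) \
                in(data : length(N) alloc_if(0) free_if(0))
    }
    double t1 = now();

    double gbytes = (double)REPS * N * sizeof(float) / 1e9;
    printf("host -> device: %.2f GB/s\n", gbytes / (t1 - t0));

    /* Release the device-side buffer without copying anything back. */
    #pragma offload_transfer target(mic:0) \
            nocopy(data : length(N) alloc_if(0) free_if(1))
    free(data);
    return 0;
}
```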
Thanks ...
michael
Those look like maximum numbers, possibly only attainable using SCIF? Offload will be, by necessity, a bit slower. MPI, depending on the flavor you choose, can use anything from something as slow as TCP/IP up to faster transports.
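If you want to see what a given MPI flavor delivers, a plain ping-pong between a host rank and a coprocessor rank is a quick test. A minimal sketch (it assumes rank 0 is launched on the host and rank 1 on the coprocessor; how the ranks are placed depends on your MPI launch configuration):

```c
/* Generic MPI bandwidth ping-pong between two ranks. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MSG_BYTES (64 * 1024 * 1024)   /* 64 MB per message */
#define REPS 20

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = malloc(MSG_BYTES);
    double t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 1, 1, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, 0, 1, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        /* Two transfers per iteration, one in each direction. */
        double gbytes = 2.0 * REPS * (double)MSG_BYTES / 1e9;
        printf("effective bandwidth: %.2f GB/s\n", gbytes / (t1 - t0));
    }
    free(buf);
    MPI_Finalize();
    return 0;
}
```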
I am not quite sure how these performance numbers were obtained. I have seen similar peak numbers using the offload pragma. I agree with Charles: the numbers with MPI can vary depending on the flavor you choose.
You could try out an alpha version of the SHOC benchmark suite, which contains the PCIe Download and PCIe Readback workloads. You can find more about the benchmark suite at: http://software.intel.com/en-us/blogs/2013/03/20/the-scalable-heterogeneous-computing-benchmark-suite-shoc-for-intelr-xeon-phitm
Sumedh, SHOC on Phi is actually a great benchmark to utilize. I will follow the link and see how it goes.
Thanks again
Michael
Charles Congdon (Intel) wrote:
Those look like maximum numbers, possibly only attainable using SCIF? Offload will be, by necessity, a bit slower. MPI, depending on the flavor you choose, can use anything from something as slow as TCP/IP up to faster transports.
I think the PCIe gen2 maximum throughput on a x16 link should be: 8 GB/s = 5 GT/s × 2 bytes per transfer (16 lanes / 8 bits per byte) × 4/5 (8b/10b encoding), and for the amended 5.5 GT/s PCIe gen2 rate: 8.8 GB/s = 5.5 GT/s × 2 × 4/5.
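As a sanity check on that arithmetic, here is a toy calculation of the theoretical per-direction throughput (it ignores packet headers and other protocol overhead, which is presumably where the gap to the measured ~6.9 GB/s comes from):

```c
/* Theoretical PCIe throughput per direction, in GB/s.
   gen1/gen2 use 8b/10b encoding (4/5); gen3 uses 128b/130b. */
#include <stdio.h>

static double pcie_gbs(double gt_per_s, int lanes, double encoding)
{
    /* One bit per lane per transfer; divide by 8 to get bytes. */
    return gt_per_s * lanes / 8.0 * encoding;
}

int main(void)
{
    printf("gen2 x16 @ 5.0 GT/s: %.2f GB/s\n", pcie_gbs(5.0, 16, 4.0/5.0));
    printf("gen2 x16 @ 5.5 GT/s: %.2f GB/s\n", pcie_gbs(5.5, 16, 4.0/5.0));
    printf("gen3 x8  @ 8.0 GT/s: %.2f GB/s\n", pcie_gbs(8.0,  8, 128.0/130.0));
    printf("gen3 x16 @ 8.0 GT/s: %.2f GB/s\n", pcie_gbs(8.0, 16, 128.0/130.0));
    return 0;
}
```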
I sure hope the cited numbers can be attained from user code.
Michael
My guess would be it uses a rev 2.x interface because it is really a rebranded Larrabee part which was under development before rev 3.0 was finalized. Of course, there could also be other reasons.
DubitoCogito wrote:
My guess would be it uses a rev 2.x interface because it is really a rebranded Larrabee part which was under development before rev 3.0 was finalized. Of course, there could also be other reasons.
Given that PCIe gen3 is almost 2x as fast as gen2, and that it is natively supported on Sandy Bridge/Ivy Bridge EP/EN, I wonder why this obviously better link to the Phi was passed over.
PCIe gen3 ideal: x8: 7.88 GB/s; x16: 15.75 GB/s (8 GT/s with 128b/130b encoding).