Do you have questions that our documentation does not answer? Do you need more training or source code examples, and on what specifically? Help us understand what's missing so that we can develop the documentation you care about (what is important, and what is nice to have). Thank you.
Joe wrote:
Hi, I would find it useful to see some examples that combine MPI processes with OpenMP (or similar) thread generation, and in Fortran. On HPC clusters I've used in the past, I just matched the number of cores to the number of processes, but it appears the Phi benefits from running fewer processes with many threads. Perhaps some of these examples are out there already, as I've only just started looking.
Thanks!
Depending on what you have in mind, this may be an interesting topic, but it is not one on which there is much interest in making committee decisions about documentation.
I'm not sure how examples would help you on this. Assuming that your application scales reasonably well under both OpenMP and MPI, it's mainly a question of trying the combinations. As you're likely to be running in what is somewhat illogically termed "symmetric" mode (ranks on both host and coprocessor), you need to balance the work so that all ranks reach MPI barriers at about the same time.
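One way to approach the balancing described above is to time a short calibration run on each rank and then hand out work in proportion to the measured throughput. The Python sketch below only illustrates that arithmetic; it is not taken from any Intel documentation, the function names `split_work` and `estimated_times` are hypothetical, and the throughput figures are made-up calibration numbers.

```python
def split_work(total_items, throughputs):
    """Split total_items across ranks in proportion to measured throughput.

    throughputs: items per second measured for each rank (hypothetical
    calibration numbers, e.g. from timing a short trial run on each rank).
    Returns a list of item counts, one per rank, summing to total_items.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if not throughputs or any(t <= 0 for t in throughputs):
        raise ValueError("every rank needs a positive throughput")

    total_rate = sum(throughputs)
    # Ideal (fractional) share for each rank.
    ideal = [total_items * t / total_rate for t in throughputs]
    counts = [int(x) for x in ideal]  # round down first

    # Give the leftover items to the ranks with the largest remainders so
    # that the counts add up exactly to total_items.
    leftover = total_items - sum(counts)
    by_remainder = sorted(range(len(ideal)),
                          key=lambda i: ideal[i] - counts[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


def estimated_times(counts, throughputs):
    """Estimated seconds each rank needs before it reaches the next barrier."""
    return [c / t for c, t in zip(counts, throughputs)]


if __name__ == "__main__":
    # Two host ranks and one faster coprocessor rank (assumed figures).
    rates = [100.0, 100.0, 250.0]
    counts = split_work(900, rates)
    print(counts, estimated_times(counts, rates))
```

If the estimated times come out close to each other, the ranks should arrive at the barrier together; in practice the calibration would need repeating whenever the rank and thread layout changes.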
As far as the MIC side of this is concerned, you are limited by the increased memory consumption of additional MPI ranks and by the way VPU resources are shared by alternating among the threads on each core, which makes it unlikely that you want more than 1 rank on a core. Useful applications, unlike simple examples, will run out of RAM, depending on your coprocessor model, with a lot fewer than 1 rank per core. On the other hand, real OpenMP applications that use private arrays will become starved for stack even with stack set to "unlimited" (which some Intel experts advise against doing), so the number of useful threads per rank is limited. Of course, simple OpenMP examples will scale to at least 116 threads with the normal "unlimited" stack.
I have worked with applications which showed a pronounced performance peak (running on MIC alone) at 6 ranks of 30 threads (setting KMP_AFFINITY=balanced so as to spread the threads evenly across the cores assigned to each rank). If your application runs best on host with 1 rank per core, this may pose a problem in that you already have an excessive number of cores passing messages between host and coprocessor even before you engage multiple nodes, and, for MIC to be useful, the performance of a rank with 30 threads ought to exceed performance of individual host cores. Further, now that host platforms are available with up to 24 cores, adding the coprocessor capability to support 6 more ranks is not so interesting, if you can't take advantage of those coprocessor ranks being more powerful than host cores.
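Trying the combinations can be partly scripted. The Python sketch below lists rank and thread layouts that fit a coprocessor and builds the environment for each rank. `OMP_NUM_THREADS` is the standard OpenMP variable and `KMP_AFFINITY=balanced` is the setting mentioned above; everything else is an assumption for illustration: the function names are hypothetical, the 60 cores with 3 threads per core are chosen only so that 6 ranks of 30 threads appears among the results, and the memory figures are placeholders.

```python
def candidate_layouts(cores, threads_per_core, mem_total_gb, mem_per_rank_gb):
    """List (ranks, threads_per_rank) pairs that fit the coprocessor.

    A layout is kept when the ranks divide the cores evenly (each rank owns
    whole cores, so there is never more than 1 rank per core) and the
    combined memory footprint of all ranks fits in mem_total_gb.
    """
    layouts = []
    for ranks in range(1, cores + 1):
        if cores % ranks != 0:
            continue  # ranks would not get an equal number of whole cores
        if ranks * mem_per_rank_gb > mem_total_gb:
            continue  # too many ranks for the available RAM
        cores_per_rank = cores // ranks
        layouts.append((ranks, cores_per_rank * threads_per_core))
    return layouts


def rank_environment(threads_per_rank):
    """Environment variables to export for each rank of a given layout."""
    return {
        "OMP_NUM_THREADS": str(threads_per_rank),
        "KMP_AFFINITY": "balanced",  # spread threads evenly across the cores
    }


if __name__ == "__main__":
    # Assumed hardware: 60 usable cores, 3 threads per core, 8 GB of RAM,
    # and an application that needs about 1 GB per rank.
    for ranks, threads in candidate_layouts(60, 3, 8.0, 1.0):
        print(ranks, "ranks x", threads, "threads:", rank_environment(threads))
```

Each layout still has to be timed with the real application; the script only narrows down which combinations are worth a run.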
If you are talking about future MIC products which improve cluster performance and don't depend on matching coprocessor and host performance, that's a different story, but not one on which we will have any details in the near future.
Hi Harry,
You are right, OpenCL applications can use all the cores available in Intel(R) Xeon Phi(TM) coprocessors. On the Windows host you see only the host cores, but MPSS provides APIs and tools that allow you to monitor and use all the cores in the coprocessors. Thank you.
I found out that build scripts aren't provided for MPSS 3.1 (I talked to the Yocto guys, who confirm it). Shouldn't build scripts be included in the source release for the GPL components?
@rhn: are you trying to build host components or the entire microprocessor OS?
@Belinda: I'm trying to build a portion of the coprocessor OS (namely, the kernel).
@rhn: I believe this forum thread contains what you are looking for (scroll down to the answer that was given today, March 4 2014)
http://software.intel.com/comment/1781756
Thanks for the kernel info.
I'm still interested in more documentation regarding DMA. My troubles with it are two-pronged: the DMA-related registers are undocumented, and the relation of MIC state changes (booting, resetting) to DMA state is unknown.
So far, I have been able to use DMA by inferring register functions from open-source code, but it only works until the first device boot.
The registers I found that have some relation to DMA and aren't fully described in the System Programmer's Manual are: DCAR_*, DAUX_LO_*, DAUX_HI_*, DMA_DTSTAT_*, DMA_DSTATWB_HI_*, DMA_DSTATWB_LO_*, DCHERR_*, DCHERRMSK_*, DCR, MarkerMessage_Send. They are mostly the same as the registers listed in /proc/mic_dma_registers_* and /proc/mic_dma_ring_* (sadly, this debug interface is barely useful without knowing the function of each register).
The state changes that modify DMA state even without kmod involvement are: boot ELF (it stops the DMA engine; what initialization is required on the device side?) and reset (sometimes it resets the DMA registers, sometimes not?).
This information would save me (and possibly others) a lot of effort analyzing DMA behaviour.
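Until such documentation exists, one way to study which state changes touch the DMA engine is to snapshot the debug files named above before and after a boot or reset and compare the two. The Python sketch below does only that; the two /proc patterns are the ones quoted in this post, the layout of the files is not assumed to be known (they are compared as raw text), and the function names are hypothetical.

```python
import glob


def read_debug_files(patterns):
    """Return {path: text} for every file matching the given glob patterns."""
    dumps = {}
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            try:
                with open(path, "r", errors="replace") as handle:
                    dumps[path] = handle.read()
            except OSError as exc:
                # Keep a marker so an unreadable file still shows up in diffs.
                dumps[path] = "<unreadable: %s>" % exc
    return dumps


def changed_files(before, after):
    """Sorted paths whose contents differ between two snapshots."""
    paths = set(before) | set(after)
    return sorted(p for p in paths if before.get(p) != after.get(p))


if __name__ == "__main__":
    patterns = ["/proc/mic_dma_registers_*", "/proc/mic_dma_ring_*"]
    before = read_debug_files(patterns)
    input("Boot or reset the card now, then press Enter...")
    after = read_debug_files(patterns)
    for path in changed_files(before, after):
        print("changed:", path)
```

This does not explain what any register means; it only shows which dumps differ after a given state change, which may help narrow down the cases where a reset does or does not clear the DMA registers.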
I'd like to see a detailed technical discussion of the vector pipeline and instruction details, to help me better understand where I am using the resources efficiently and where I am not. The 2 books only give a basic idea of what's going on, and the "Xeon Phi Coprocessor Instruction Set Architecture Reference Manual" doesn't seem to include information such as instruction latency, pairing restrictions, or the relation between the pipeline and the hardware "threads". Ultimately, my goal would be to take a sequence of a dozen or so instructions (mostly straight-line intrinsics, maybe a little scalar control) and map out how the vector pipeline is being utilized.
Is there an example available that shows how to offload the MKL DSS solver to Intel(R) Xeon Phi(TM) coprocessors? Several presentations say that it can be offloaded, but I cannot find an example of how to do it.
Arno,
Thanks for your suggestion. We monitor this thread and appreciate your suggestions. We use them in adding to and ranking our collateral priorities.
Regards
--
Taylor
I would also like to see a document on how to get the Phi running with the Linux kernel sources, on kernel 3.13 and above.
I am struggling with this.
Mathias,
Here is a thread where someone ported to Linux Kernel 3.2.14. Does this help?
Regards
--
Taylor
Hi Taylor,
Not really.
I need to use the kernel implementation that is included in kernel 3.13 and above.
So far, I think that the mic_host and mic_card modules alone, along with the sample MPSSD and MICCTRL, are not enough;
the card's uos.img has to be adjusted. I am currently trying to use the same kernel version as the host, but it does not come up.
It would be nice to have documentation on how to use the kernel 3.13 and above implementation with the MIC.
I am told (by people on the MPSS development team) that porting a later kernel to the coprocessor with only SCIF support from the MPSS is easy; porting a kernel with full MPSS support is not. I will look at documenting what is involved.
Hi Frances
I do not need SCIF as I only have one MIC and do not need more communication.
Thanks in advance
SCIF is the primary mechanism for all communication with Xeon Phi--including communication between it and your host's CPU. It and the virtual Ethernet driver are the two things one cannot reasonably do without.
The "full MPSS support" to which Frances was referring means things like the power management implementation, which emulates P- and C-states in kernel code so that the Xeon Phi can enter a low-power state when idle and scale its CPU frequency and voltage in response to load.
i don't know
Hi, everybody,
I still have issues getting processor and coprocessors communicating after some major change...
knox l.
1) Have you looked at the MPSS User's Guide that came with the MPSS 3.5 release? It has been through some major changes. We could still use some additional information on configuring SCIF, COI, IP and InfiniBand and will work on that.
2) Could you start a new forum thread (go to https://software.intel.com/en-us/forums/intel-many-integrated-core and click on the New Topic button) describing, in more detail, what problems you are having and what version of the MPSS you are using? That way you will get more help for your specific problem and we can track it to see if your problem gets resolved.
Frances
I want to know whether the upcoming Knights Landing (host processor type) will support Windows 7, 8, and 8.1 (not Server), even though it has 72 or more cores.
I would also like detailed news about KNL processors.
Sorry for my poor English (I am from South Korea).