What collateral/documentation do you want to see?

BelindaLiviero · ‎02-07-2013

Do you have questions that you are not finding the answers for in our documentation? Do you need more training or source code examples, and on what specifically? Help us understand what's missing so that we can make sure we develop documentation you care about (what is important, and what is nice to have)! Thank you.

TimP · ‎01-19-2014

Joe wrote:

Hi, I would find it useful to see some examples that combine MPI processes with OpenMP (or similar) thread generation (and in Fortran). On HPC clusters I've used in the past, I just matched the number of cores to the number of processes, but it appears the Phi benefits from running fewer processes with many threads. Perhaps some of these examples are out there already, as I've only just started looking.

Thanks!

Depending on what you have in mind, this may be an interesting topic, but one on which there isn't much interest in attempting to make committee decisions about documentation.

I'm not sure how examples would help you on this. Assuming that your application scales reasonably well both under OpenMP and under MPI, it's mainly a question of trying the combinations. As you're likely to be running in what is somewhat illogically termed "symmetric" mode (ranks on both host and coprocessor), you need to balance the work so that MPI barriers are reached at about the same time by all ranks.

As far as the MIC side of this is concerned, you are limited by the increased memory consumption of additional MPI ranks, and by the way in which sharing of VPU resources is done by alternating among threads on each core, making it unlikely that you want more than 1 rank on a core. Useful applications, unlike simple examples, will run out of RAM, depending on your coprocessor model, with a lot fewer than 1 rank per core. On the other hand, real OpenMP applications, if there is any use of private arrays, will become starved for stack even with stack set to "unlimited," which some Intel experts advise against doing, so the number of useful threads per rank is limited. Of course, simple OpenMP examples will scale to at least 116 threads with the normal "unlimited" stack.
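
To put rough numbers on the memory limit described above, here is a minimal sketch. The function name and all figures in the usage note are hypothetical; you would have to measure the per-rank footprint of your own application on your own coprocessor model.

```python
def max_ranks_by_memory(total_mib, reserved_mib, per_rank_mib, cores):
    """Estimate how many MPI ranks fit in coprocessor RAM.

    total_mib    -- RAM on the card (assumption: look it up for your model)
    reserved_mib -- RAM set aside for the coprocessor OS and file system (assumption)
    per_rank_mib -- measured memory footprint of one rank of your application
    cores        -- cores available; the result is capped at 1 rank per core
    """
    if per_rank_mib <= 0:
        raise ValueError("per_rank_mib must be positive")
    usable = total_mib - reserved_mib
    if usable <= 0:
        return 0
    # Integer division: a rank that only partly fits does not count.
    return min(cores, usable // per_rank_mib)
```

For example, with a hypothetical 8192 MiB card, 1024 MiB reserved, and 512 MiB per rank, `max_ranks_by_memory(8192, 1024, 512, 60)` gives 14, well below 1 rank per core.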

I have worked with applications which showed a pronounced performance peak (running on MIC alone) at 6 ranks of 30 threads (setting KMP_AFFINITY=balanced so as to spread the threads evenly across the cores assigned to each rank). If your application runs best on host with 1 rank per core, this may pose a problem in that you already have an excessive number of cores passing messages between host and coprocessor even before you engage multiple nodes, and, for MIC to be useful, the performance of a rank with 30 threads ought to exceed performance of individual host cores. Further, now that host platforms are available with up to 24 cores, adding the coprocessor capability to support 6 more ranks is not so interesting, if you can't take advantage of those coprocessor ranks being more powerful than host cores.
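
Since this comes down to trying the combinations, a small helper can list the candidates to benchmark. This is only a sketch: the function names are made up, and the 60 cores and 3 threads per core in the usage note are assumptions chosen so that the 6 ranks of 30 threads mentioned above appears in the list. `OMP_NUM_THREADS` is the standard OpenMP variable; `KMP_AFFINITY=balanced` is the setting quoted above.

```python
def rank_thread_combos(cores, threads_per_core, max_ranks=None):
    """List (ranks, threads_per_rank) pairs that give each rank a whole,
    equal number of cores, with never more than 1 rank per core."""
    combos = []
    for ranks in range(1, cores + 1):
        if max_ranks is not None and ranks > max_ranks:
            break
        if cores % ranks != 0:
            continue  # skip splits that leave cores unevenly assigned
        cores_per_rank = cores // ranks
        combos.append((ranks, cores_per_rank * threads_per_core))
    return combos


def openmp_env(threads_per_rank):
    """Environment settings for one rank of a given combination."""
    return {
        "OMP_NUM_THREADS": str(threads_per_rank),
        "KMP_AFFINITY": "balanced",  # spread threads evenly across the rank's cores
    }
```

With `rank_thread_combos(60, 3, max_ranks=6)` the candidates are 1x180, 2x90, 3x60, 4x45, 5x36 and 6x30. Time each one on your own application rather than assuming where the peak falls.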

If you are talking about future MIC products which improve cluster performance and don't depend on matching coprocessor and host performance, that's a different story, but not one on which we will have any details in the near future.

Loc_N_Intel · ‎01-21-2014

Hi Harry,

You are right, OpenCL applications can use all the cores available in Intel(R) Xeon Phi(TM) coprocessors. On the Windows host you only see the host cores, but MPSS provides APIs and tools that allow you to monitor and use all cores in the coprocessors. Thank you.

rhn · ‎03-02-2014

I found out that build scripts aren't provided for MPSS 3.1 (I talked to the Yocto guys, who confirm it). Shouldn't build scripts be included in the source release for the GPL components?

BelindaLiviero · ‎03-03-2014

@rhn: are you trying to build host components or the entire microprocessor OS?

rhn · ‎03-03-2014

@Belinda: I'm trying to build a portion of the coprocessor OS (namely, the kernel).

BelindaLiviero · ‎03-04-2014

@rhn: I believe this forum thread contains what you are looking for (scroll down to the answer that was given today, March 4 2014)

http://software.intel.com/comment/1781756

rhn · ‎03-13-2014

Thanks for the kernel info.

I'm still interested in more documentation regarding DMA. My troubles with it are two-pronged: the DMA-related registers are undocumented, and the relation of MIC state changes (booting, resetting) to DMA state is unknown.

So far, I could utilize DMA by inferring register functions from open-source code, but it only works until the first device boot.

The registers I found that have some relation to DMA and aren't fully described in the System Programmer's Manual are: DCAR_*, DAUX_LO_*, DAUX_HI_*, DMA_DTSTAT_*, DMA_DSTATWB_HI_*, DMA_DSTATWB_LO_*, DCHERR_*, DCHERRMSK_*, DCR, MarkerMessage_Send. They are mostly the same as the registers listed in /proc/mic_dma_registers_* and /proc/mic_dma_ring_* (sadly, this debug interface is barely useful without knowing the function of each register).

The state changes that modify DMA state even without kmod involvement are: boot ELF (it stops the DMA engine; what initialization is required on the device side?) and reset (sometimes it resets DMA registers, sometimes not?).

This information would save me (and possibly others) a lot of effort analyzing DMA behaviour.
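
Until such documentation exists, one way to narrow down which registers a boot or reset touches is to snapshot the debug files named above before and after the state change and diff them. The following is a minimal sketch with hypothetical function names. It makes no assumption about the file format and treats each file as opaque text, so it shows that a line changed, not what the register means.

```python
import difflib
import glob


def snapshot(pattern):
    """Read every file matching the glob pattern into {path: [lines]}."""
    snap = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as f:
            snap[path] = f.read().splitlines()
    return snap


def diff_snapshots(before, after):
    """Return unified-diff lines for every file that differs between snapshots."""
    changes = []
    for path in sorted(set(before) | set(after)):
        old = before.get(path, [])
        new = after.get(path, [])
        changes.extend(
            difflib.unified_diff(
                old, new,
                fromfile=path + " (before)",
                tofile=path + " (after)",
                lineterm="",
            )
        )
    return changes
```

Usage would be along the lines of `before = snapshot("/proc/mic_dma_registers_*")`, then booting or resetting the card, then `after = snapshot("/proc/mic_dma_registers_*")` and printing `diff_snapshots(before, after)`. The same works for `/proc/mic_dma_ring_*`.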

CFR · ‎03-18-2014

I'd like to see a detailed technical discussion of the vector pipeline and instruction details to help me better understand where I am and am not using the resources efficiently. The 2 books only give a basic idea of what's going on, and the "Xeon Phi Coprocessor Instruction Set Architecture Reference Manual" doesn't seem to include information like the instruction latency, pairing restrictions, relation of the pipeline and the hardware "threads", etc. Ultimately my goal would be to be able to take a sequence of a dozen or so instructions (mostly straight-line intrinsics, maybe a little scalar control) and map out how the vector pipeline is being utilized.

Arno_W_ · ‎09-04-2014

Is there an example available that shows how to offload the MKL DSS solver to Intel(R) Xeon Phi(TM) coprocessors? I see in several presentations that it can be offloaded, but cannot find an example of how to do it.

TaylorIoTKidd · ‎10-22-2014

Arno,

Thanks for your suggestion. We monitor this thread and appreciate your suggestions. We use them in adding to and ranking our collateral priorities.

Regards
--
Taylor

Mathias_R_ · ‎11-11-2014

I would also like to see a document on how to make the Phi runnable with the Linux kernel sources on kernel 3.13 and above.

I am struggling with this.

TaylorIoTKidd · ‎11-12-2014

Mathias,

Here is a thread where someone ported to Linux Kernel 3.2.14. Does this help?

Regards
--
Taylor

Mathias_R_ · ‎11-15-2014

Hi Taylor,

Not really.

I need to use the kernel implementation which is included in kernel 3.13 and above.

So far I think that the mic_host and mic_card modules alone, along with the sample MPSSD and MICCTRL, are not enough; the card's uos.img has to be adjusted. I am currently trying to use the same kernel version as the host, but it does not come up.

It would be nice to have documentation on how to use the kernel 3.13 and above implementation with the MIC.

Frances_R_Intel · ‎11-17-2014

I am told (by people on the MPSS development team) that porting a later kernel to the coprocessor with only SCIF support from the MPSS is easy; porting a kernel with full MPSS support is not. I will look at documenting what is involved.

Mathias_R_ · ‎11-21-2014

Hi Frances

I do not need SCIF as I only have one MIC and do not need more communication.

Thanks in advance

Evan_P_Intel · ‎11-21-2014

SCIF is the primary mechanism for all communication with Xeon Phi--including communication between it and your host's CPU. It and the virtual Ethernet driver are the two things one cannot reasonably do without.

The "full MPSS support" to which Frances was referring means things like the power management implementation, which emulates P- and C-states in kernel code so that the Xeon Phi can enter a low-power state when idle and scale its CPU frequency and voltage in response to load.

kecoro · ‎04-27-2015

i don't know

knox_l_ · ‎05-29-2015

Hi, everybody,

I still have issues getting processor and coprocessors communicating after some major change...

Frances_R_Intel · ‎05-29-2015

knox l.

1) Have you looked at the MPSS User's Guide that came with the MPSS 3.5 release? It has been through some major changes. We could still use some additional information on configuring SCIF, COI, IP and InfiniBand and will work on that.

2) Could you start a new forum thread (go to https://software.intel.com/en-us/forums/intel-many-integrated-core and click on the New Topic button) describing, in more detail, what problems you are having and what version of the MPSS you are using? That way you will get more help for your specific problem and we can track it to see if your problem gets resolved.

Frances

Y_K_ · ‎09-22-2015

I want to know whether the upcoming Knights Landing (host processor type) will support the Windows 7, 8, and 8.1 OS (not Server), even though it has 72 or more cores.

I would also like detailed news about KNL processors.

Sorry for my poor English (I am from South Korea).