
What collateral/documentation do you want to see?

BelindaLiviero
Employee
5,929 Views

Do you have questions that our documentation does not answer? Do you need more training or source code examples, and if so, on what specifically? Help us understand what's missing so that we can develop the documentation you care about (what is important, and what is nice to have). Thank you.

0 Kudos
75 Replies
TimP
Honored Contributor III
1,824 Views

Joe wrote:

Hi, I would find it useful to see some examples that combine MPI processes with OpenMP (or similar) thread generation (and in Fortran). On HPC clusters I've used in the past, I just matched the number of cores to the number of processes, but it appears the Phi benefits from running fewer processes with many threads. Perhaps some of these examples are out there already, as I've only just started looking.

Thanks!

Depending on what you have in mind, this may be an interesting topic, but it is not one on which there is much interest in making committee decisions about documentation.

I'm not sure how examples would help you on this. Assuming that your application scales reasonably well both under OpenMP and under MPI, it's mainly a question of trying the combinations. As you're likely to be running in what is somewhat illogically termed "symmetric" mode (ranks on both host and coprocessor), you need to balance the work so that MPI barriers are reached at about the same time by all ranks.

As far as the MIC side of this is concerned, you are limited by the increased memory consumption of additional MPI ranks, and by the way in which VPU resources are shared by alternating among threads on each core, making it unlikely that you want more than 1 rank on a core. Useful applications, unlike simple examples, will run out of RAM, depending on your coprocessor model, with a lot fewer than 1 rank per core. On the other hand, real OpenMP applications, if there is any use of private arrays, will become starved for stack even with stack set to "unlimited," which some Intel experts advise against doing, so the number of useful threads per rank is limited. Of course, simple OpenMP examples will scale to at least 116 threads with the normal "unlimited" stack.
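The memory trade-off described above can be put into rough numbers. The following is only a back-of-the-envelope sketch: the function names, the 8 GiB card size, the 0.5 GiB per-rank footprint and the 16 MiB per-thread stack are all hypothetical values chosen for illustration, and the model ignores MPI buffers, shared libraries and the OS itself.

```python
def estimate_memory_gib(ranks, threads_per_rank, rank_footprint_gib, stack_mib_per_thread):
    """Rough memory estimate: each rank holds its own data plus one stack per thread.

    All inputs are assumptions supplied by the caller; nothing is measured here.
    """
    stack_gib_per_rank = threads_per_rank * stack_mib_per_thread / 1024.0
    return ranks * (rank_footprint_gib + stack_gib_per_rank)


def fits(ranks, threads_per_rank, rank_footprint_gib, stack_mib_per_thread, card_ram_gib):
    """True if the rough estimate stays within the card's RAM (hypothetical size)."""
    needed = estimate_memory_gib(ranks, threads_per_rank, rank_footprint_gib, stack_mib_per_thread)
    return needed <= card_ram_gib


if __name__ == "__main__":
    # Hypothetical card: 8 GiB RAM, 0.5 GiB of data per rank, 16 MiB stack per thread.
    for ranks, threads in [(6, 30), (60, 3)]:
        needed = estimate_memory_gib(ranks, threads, 0.5, 16)
        verdict = "fits" if fits(ranks, threads, 0.5, 16, 8) else "does not fit"
        print(f"{ranks} ranks x {threads} threads: ~{needed:.2f} GiB, {verdict}")
```

With these made-up numbers, a few ranks with many threads stay within the budget while one rank per core does not, which is the pattern described above. Real footprints have to be measured on your own application.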

I have worked with applications which showed a pronounced performance peak (running on MIC alone) at 6 ranks of 30 threads (setting KMP_AFFINITY=balanced so as to spread the threads evenly across the cores assigned to each rank).  If your application runs best on host with 1 rank per core, this may pose a problem in that you already have an excessive number of cores passing messages between host and coprocessor even before you engage multiple nodes, and, for MIC to be useful, the performance of a rank with 30 threads ought to exceed performance of individual host cores.  Further, now that host platforms are available with up to 24 cores, adding the coprocessor capability to support 6 more ranks is not so interesting, if you can't take advantage of those coprocessor ranks being more powerful than host cores.
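Since finding the peak is mostly a matter of trying combinations, a small helper can list the candidates. This is a sketch only: the function names are hypothetical, and the budget of 180 threads is simply taken from the 6 ranks of 30 threads mentioned above. OMP_NUM_THREADS is the standard OpenMP variable, and KMP_AFFINITY=balanced is the setting quoted in the post. How the variables are passed to each rank depends on your MPI launcher, so that part is left out.

```python
def rank_thread_combos(total_threads, min_threads_per_rank=1):
    """Return every (ranks, threads_per_rank) pair whose product is total_threads."""
    combos = []
    for ranks in range(1, total_threads + 1):
        if total_threads % ranks == 0:
            threads = total_threads // ranks
            if threads >= min_threads_per_rank:
                combos.append((ranks, threads))
    return combos


def launch_settings(ranks, threads_per_rank):
    """Environment settings to try for one combination (launcher-specific parts omitted)."""
    return {
        "ranks": ranks,
        "OMP_NUM_THREADS": str(threads_per_rank),
        "KMP_AFFINITY": "balanced",  # spreads threads evenly across the cores of each rank
    }


if __name__ == "__main__":
    # 180 = 6 ranks x 30 threads, the combination reported above; adjust for your card.
    for ranks, threads in rank_thread_combos(180, min_threads_per_rank=3):
        print(launch_settings(ranks, threads))
```

Each printed dictionary is one configuration to time. The combinations that exceed available memory or stack can be dropped before running anything.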

If you are talking about future MIC products which improve cluster performance and don't depend on matching coprocessor and host performance, that's a different story, but not one on which we will have any details in the near future.

 

0 Kudos
Loc_N_Intel
Employee
1,824 Views

Hi Harry,

You are right, OpenCL applications can use all the cores available in Intel(R) Xeon Phi(TM) coprocessors. On the Windows host you only see the host cores, but MPSS provides APIs and tools that allow you to monitor and use all cores in the coprocessors. Thank you.

0 Kudos
rhn
Beginner
1,824 Views

I found out that build scripts aren't provided for MPSS 3.1 (I talked to the Yocto guys, who confirm it). Shouldn't build scripts be included in the source release for the GPL components?

0 Kudos
BelindaLiviero
Employee
1,824 Views

@rhn:  are you trying to build host components or the entire microprocessor OS?

 

0 Kudos
rhn
Beginner
1,824 Views

@Belinda: I'm trying to build a portion of the coprocessor OS (namely, the kernel).

0 Kudos
BelindaLiviero
Employee
1,824 Views

@rhn:  I believe this forum thread contains what you are looking for (scroll down to the answer that was given today, March 4 2014)

http://software.intel.com/comment/1781756

 

0 Kudos
rhn
Beginner
1,824 Views

Thanks for the kernel info.

I'm still interested in more documentation regarding DMA. My troubles with it are two-pronged: the DMA-related registers are undocumented, and the relation of MIC state changes (booting, resetting) to DMA state is unknown.

So far, I could utilize DMA by inferring register functions from open-source code, but it only works until first device boot.

The registers I found that have some relation to DMA and aren't fully described in the System Programmer's Manual are: DCAR_*, DAUX_LO_*, DAUX_HI_*, DMA_DTSTAT_*, DMA_DSTATWB_HI_*, DMA_DSTATWB_LO_*, DCHERR_*, DCHERRMSK_*, DCR, MarkerMessage_Send. They are mostly the same as the registers listed in /proc/mic_dma_registers_* and /proc/mic_dma_ring_* (sadly, this debug interface is barely useful without knowing the functions of each register).

The state changes that modify DMA state even without kmod involvement are: boot ELF (it stops the DMA engine; what initialization is required on the device side?) and reset (sometimes it resets DMA registers, sometimes not?).
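One way to narrow down which registers a boot or reset touches is to snapshot the debug files before and after the state change and compare them. The sketch below assumes only the /proc paths named above. The function names are hypothetical, and because the file format is not documented, the contents are treated as opaque text lines rather than parsed.

```python
import difflib
import glob

# Paths taken from the post above; the files only exist with the MIC driver loaded.
DEFAULT_PATTERNS = ("/proc/mic_dma_registers_*", "/proc/mic_dma_ring_*")


def snapshot(patterns=DEFAULT_PATTERNS):
    """Read every matching file into {path: [lines]}; unreadable files are skipped."""
    snap = {}
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            try:
                with open(path, "r", errors="replace") as handle:
                    snap[path] = handle.read().splitlines()
            except OSError:
                continue
    return snap


def diff_snapshots(before, after):
    """Return unified-diff lines for every file whose contents changed."""
    changes = []
    for path in sorted(set(before) | set(after)):
        old = before.get(path, [])
        new = after.get(path, [])
        if old != new:
            changes.extend(
                difflib.unified_diff(
                    old, new,
                    fromfile=path + " (before)",
                    tofile=path + " (after)",
                    lineterm="",
                )
            )
    return changes
```

Typical use would be to call snapshot(), boot or reset the card by whatever means you normally use, call snapshot() again, and print the result of diff_snapshots(). This shows what changed, not why, so it does not replace proper register documentation.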

This information would save me (and possibly others) a lot of effort analyzing DMA behaviour.

0 Kudos
CFR
New Contributor II
1,824 Views

I'd like to see a detailed technical discussion of the vector pipeline and instruction details to help me better understand where I'm using the resources efficiently and where I'm not. The 2 books only give a basic idea of what's going on, and the "Xeon Phi Coprocessor Instruction Set Architecture Reference Manual" doesn't seem to include information like instruction latency, pairing restrictions, the relation between the pipeline and the hardware "threads," etc. Ultimately my goal is to be able to take a sequence of a dozen or so instructions (mostly straight-line intrinsics, maybe a little scalar control) and map out how the vector pipeline is being utilized.

0 Kudos
Arno_W_
Beginner
1,824 Views

Is there an example available that shows how to offload the MKL DSS solver to Intel(R) Xeon Phi(TM) coprocessors? Several presentations say that it can be offloaded, but I cannot find an example of how to do it.

0 Kudos
TaylorIoTKidd
New Contributor I
1,824 Views

Arno,

Thanks for your suggestion. We monitor this thread and appreciate your suggestions. We use them in adding to and ranking our collateral priorities.

Regards
--
Taylor
 

0 Kudos
Mathias_R_
Beginner
1,824 Views

I would also like to see a document on how to make the Phi runnable from the Linux kernel sources on kernel 3.13 and above.

I am struggling with this.

0 Kudos
TaylorIoTKidd
New Contributor I
1,824 Views

Mathias,

Here is a thread where someone ported to Linux Kernel 3.2.14. Does this help?

Regards
--
Taylor
 

0 Kudos
Mathias_R_
Beginner
1,824 Views

Hi Taylor,

not really.

I need to use the Kernel implementation which is included in Kernel 3.13 and above.

So far I think that the mic_host and mic_card modules alone, along with the sample MPSSD and MICCTRL, are not enough; the card's uos.img has to be adjusted. I am currently trying to use the same kernel version as on the host, but it does not come up.

It would be nice to have documentation on how to use the kernel 3.13 and above implementation with the MIC.

0 Kudos
Frances_R_Intel
Employee
1,824 Views

I am told (by people on the MPSS development team) that porting a later kernel to the coprocessor with only SCIF support from the MPSS is easy; porting a kernel with full MPSS support is not. I will look at documenting what is involved.

0 Kudos
Mathias_R_
Beginner
1,824 Views

Hi Frances

I do not need SCIF as I only have one MIC and do not need more communication.

Thanks in advance

0 Kudos
Evan_P_Intel
Employee
1,824 Views

SCIF is the primary mechanism for all communication with Xeon Phi--including communication between it and your host's CPU. It and the virtual Ethernet driver are the two things one cannot reasonably do without.

The "full MPSS support" to which Frances was referring means things like the power management implementation, which emulates P- and C-states in kernel code so that the Xeon Phi can enter a low-power state when idle and scale its CPU frequency and voltage in response to load.

0 Kudos
kecoro
Beginner
1,822 Views

i don't know

0 Kudos
knox_l_
Beginner
1,822 Views

Hi, everybody,

I still have issues getting processor and coprocessors communicating after some major change...

0 Kudos
Frances_R_Intel
Employee
1,829 Views

knox l.

1) Have you looked at the MPSS User's Guide that came with the MPSS 3.5 release? It has been through some major changes. We could still use some additional information on configuring SCIF, COI, IP and InfiniBand and will work on that.

2) Could you start a new forum thread (go to https://software.intel.com/en-us/forums/intel-many-integrated-core and click on the New Topic button) describing, in more detail, what problems you are having and what version of the MPSS you are using? That way you will get more help for your specific problem and we can track it to see if your problem gets resolved.

Frances

0 Kudos
Y_K_
Beginner
1,829 Views

I want to know whether the upcoming Knights Landing (host processor type) will support Windows 7, 8, or 8.1 (not Server), even though it has 72 or more cores.

I would also like detailed news about KNL processors.

Sorry for my poor English (I'm from South Korea).

0 Kudos