
Asynchronous Patterns on MIC

Salvadore__Francesco

Hello,

I have read about handling asynchronous communication and computation with MIC offload, but the examples I found are all very simple, and I would like some clarification about a few patterns I need to implement in my code.

1) In the following scenario, are sections a and b serialized on the MIC, or are they executed simultaneously? If they run simultaneously, how are they scheduled?

[cpp]#pragma offload target(mic:0) .... signal(one)

<MIC computing section a>

#pragma offload target(mic:0) .... signal(one)

<MIC computing section b>

#pragma offload_wait target(mic:0) wait(one)[/cpp]

2) The same questions as above, but with a different signal name for the second offload

[cpp]#pragma offload target(mic:0) .... signal(one)

<MIC computing section a>

#pragma offload target(mic:0) .... signal(two)

<MIC computing section b>

#pragma offload_wait target(mic:0) wait(one,two)[/cpp]

3) The same questions as above, adding a wait clause to the second offload

[cpp]#pragma offload target(mic:0) .... signal(one)

<MIC computing section a>

#pragma offload target(mic:0) .... signal(two) wait(one)

<MIC computing section b>

#pragma offload_wait target(mic:0) wait(two)[/cpp]

4) Now suppose I add a CPU computing section at the end. How can I have this section execute simultaneously with the MIC computing sections? If possible, I would like to be able to choose whether MIC section a and MIC section b are serialized on the MIC or not, and in either case have the CPU computing section execute while the MIC processes both sections a and b.

[cpp]#pragma offload target(mic:0) .... signal(one)

<MIC computing section a>

#pragma offload target(mic:0) .... signal(two)

<MIC computing section b>

<CPU computing section>

#pragma offload_wait target(mic:0) wait(one,two)[/cpp]

Many thanks for any help,

Francesco

Sumedh_N_Intel
Employee

Hi Francesco, 

I am investigating your issue. Let me get back to you with what I find. 

jimdempseyatthecove
Honored Contributor III

Francesco,

While you await an answer from Sumedh: in your sketch code, how many Xeon Phi threads do you intend to run in offload section a and in offload section b? The issue here is that you want to avoid oversubscribing the Xeon Phi. The preferred approach is a program structure with a single thread pool on the Xeon Phi. Avoid nested OpenMP levels (unless you take care to manage your thread teams properly). I do not have a Xeon Phi handy for testing, but you might want to see whether you can run a "Task Manager"-like app concurrently on the Xeon Phi while performing your offload tests; that may give you some visualization aid. Alternatively, VTune may give you this information.
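For illustration, something along these lines (a rough, untested sketch; the array size, the 120-thread caps, and the loop bodies are arbitrary placeholders) would keep two concurrent offload sections from oversubscribing the card, since each caps its own OpenMP team with a num_threads() clause:

[cpp]/* Rough sketch only: array size, thread caps, and loop bodies are
   placeholders, not tested code. */
#include <stdlib.h>

#define N 1000000

int main(void)
{
    float *a = (float *) calloc(N, sizeof(float));
    float *b = (float *) calloc(N, sizeof(float));
    int sig_a, sig_b;

    /* Section a: asynchronous, capped at (say) half the card's threads */
    #pragma offload target(mic:0) inout(a:length(N)) signal(&sig_a)
    {
        #pragma omp parallel for num_threads(120)
        for (int i = 0; i < N; i++)
            a[i] *= 2.0f;
    }

    /* Section b: asynchronous, capped at the other half */
    #pragma offload target(mic:0) inout(b:length(N)) signal(&sig_b)
    {
        #pragma omp parallel for num_threads(120)
        for (int i = 0; i < N; i++)
            b[i] += 1.0f;
    }

    /* Block on the host until both sections have signaled completion */
    #pragma offload_wait target(mic:0) wait(&sig_a, &sig_b)

    free(a);
    free(b);
    return 0;
}[/cpp]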

Jim Dempsey

Ravi_N_Intel
Employee

Case 1: Both sections a and b are executed simultaneously, but the signal from section a is clobbered by section b, so the offload_wait waits only for section b to complete.

Case 2: Both sections a and b are executed simultaneously, and the offload_wait waits for both to complete.

Case 3: Section b waits for section a to complete before proceeding.

Case 4: The CPU computation overlaps with the section a and b computation. Once the CPU work completes, it waits at the offload_wait for sections a and b to complete.
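In pragma terms, case 4 is essentially the following (a sketch only; the section bodies are placeholders, and taking the address of an int variable is just one common way to form the tags):

[cpp]/* Sketch of the case 4 structure with the full pragma spellings. */
static int sig_a, sig_b;

void case4(void)
{
    #pragma offload target(mic:0) signal(&sig_a)
    {
        /* MIC computing section a */
    }

    #pragma offload target(mic:0) signal(&sig_b)
    {
        /* MIC computing section b */
    }

    /* CPU computing section executes here on the host */

    #pragma offload_wait target(mic:0) wait(&sig_a, &sig_b)
}[/cpp]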

Ravi_N_Intel
Employee

In your final case the CPU overlaps only with section b. You could do some work on the CPU between section a and section b; if there is no work to be done there, then you can merge section a and section b.
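The merged form would look something like this (again just a sketch with placeholder work), so the CPU section overlaps with all of the MIC work:

[cpp]/* Sketch only: both MIC sections inside one asynchronous offload. */
static int sig;

void merged(void)
{
    #pragma offload target(mic:0) signal(&sig)
    {
        /* MIC computing section a */
        /* MIC computing section b */
    }

    /* CPU computing section executes here on the host */

    #pragma offload_wait target(mic:0) wait(&sig)
}[/cpp]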
