
Asynchronous Patterns on MIC

Salvadore__Francesco

Hello,

I have read about handling asynchronous communication and computation with MIC offload, but the examples I found are all very simple, and I would like some clarification about a few patterns I need to implement in my code.

1) In the following scenario, are sections a and b serialized on the MIC, or are they executed simultaneously? If they run simultaneously, how are they scheduled?

[cpp]#pragma offload target(mic:0) .... signal(one)

<MIC computing section a>

#pragma offload target(mic:0) .... signal(one)

<MIC computing section b>

#pragma offload_wait target(mic:0) wait(one)[/cpp]

2) The same questions as above, but with a different signal name for the second offload

[cpp]#pragma offload target(mic:0) .... signal(one)

<MIC computing section a>

#pragma offload target(mic:0) .... signal(two)

<MIC computing section b>

#pragma offload_wait target(mic:0) wait(one,two)[/cpp]

3) The same questions as above, adding a wait clause to the second offload

[cpp]#pragma offload target(mic:0) .... signal(one)

<MIC computing section a>

#pragma offload target(mic:0) .... signal(two) wait(one)

<MIC computing section b>

#pragma offload_wait target(mic:0) wait(two)[/cpp]

4) Now suppose I add a CPU computing section at the end. How can I have this section execute simultaneously with the MIC computing sections? If possible, I would like to be able to choose whether MIC section a and MIC section b are serialized on the MIC or not, and in either case have the CPU computing section execute while the MIC processes both sections a and b.

[cpp]#pragma offload target(mic:0) .... signal(one)

<MIC computing section a>

#pragma offload target(mic:0) .... signal(two)

<MIC computing section b>

<CPU computing section>

#pragma offload_wait target(mic:0) wait(one,two)[/cpp]

Many thanks for any help,

Francesco

Sumedh_N_Intel
Employee

Hi Francesco, 

I am investigating your issue. Let me get back to you with what I find. 

jimdempseyatthecove
Honored Contributor III

Francesco,

While you await an answer from Sumedh: in your sketch code, how many Xeon Phi threads do you intend to run in offload section a and in offload section b? The issue here is that you want to avoid oversubscribing the Xeon Phi. The preferred approach is a program structure with a single thread pool on the Xeon Phi. Avoid nested OpenMP levels (unless you take care to manage your thread teams properly). I do not have a Xeon Phi handy for testing, but you might want to see whether you can run a "Task Manager"-like app concurrently on the Xeon Phi while performing your offload tests; that may give you some visualization aid. Alternatively, VTune may give you this information.
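For illustration, something along these lines (a rough, untested sketch; the array size, the 120-thread caps, and the loop bodies are arbitrary placeholders) would keep two concurrent offload sections from oversubscribing the card, since each caps its own OpenMP team with a num_threads() clause:

[cpp]/* Rough sketch only: array size, thread caps, and loop bodies are
   placeholders, not tested code. */
#include <stdlib.h>

#define N 1000000

int main(void)
{
    float *a = (float *) calloc(N, sizeof(float));
    float *b = (float *) calloc(N, sizeof(float));
    int sig_a, sig_b;

    /* Section a: asynchronous, capped at (say) half the card's threads */
    #pragma offload target(mic:0) inout(a:length(N)) signal(&sig_a)
    {
        #pragma omp parallel for num_threads(120)
        for (int i = 0; i < N; i++)
            a[i] *= 2.0f;
    }

    /* Section b: asynchronous, capped at the other half */
    #pragma offload target(mic:0) inout(b:length(N)) signal(&sig_b)
    {
        #pragma omp parallel for num_threads(120)
        for (int i = 0; i < N; i++)
            b[i] += 1.0f;
    }

    /* Block on the host until both sections have signaled completion */
    #pragma offload_wait target(mic:0) wait(&sig_a, &sig_b)

    free(a);
    free(b);
    return 0;
}[/cpp]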

Jim Dempsey

Ravi_N_Intel
Employee

Case 1: Both sections a and b are executed simultaneously, but the signal from section a is clobbered by section b, so the offload_wait waits only for section b to complete.

Case 2: Both sections a and b are executed simultaneously, and the offload_wait waits for both to complete.

Case 3: Section b waits for section a to complete before proceeding.

Case 4: The CPU computation overlaps with the section a and b computation. Once the CPU work completes, it waits at the offload_wait for sections a and b to complete.
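In pragma terms, case 4 is essentially the following (a sketch only; the section bodies are placeholders, and taking the address of an int variable is just one common way to form the tags):

[cpp]/* Sketch of the case 4 structure with the full pragma spellings. */
static int sig_a, sig_b;

void case4(void)
{
    #pragma offload target(mic:0) signal(&sig_a)
    {
        /* MIC computing section a */
    }

    #pragma offload target(mic:0) signal(&sig_b)
    {
        /* MIC computing section b */
    }

    /* CPU computing section executes here on the host */

    #pragma offload_wait target(mic:0) wait(&sig_a, &sig_b)
}[/cpp]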

Ravi_N_Intel
Employee

In your final case the CPU overlaps only with section b. You could do some work on the CPU between section a and section b; if there is no work to be done there, then you can merge section a and section b.
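The merged form would look something like this (again just a sketch with placeholder work), so the CPU section overlaps with all of the MIC work:

[cpp]/* Sketch only: both MIC sections inside one asynchronous offload. */
static int sig;

void merged(void)
{
    #pragma offload target(mic:0) signal(&sig)
    {
        /* MIC computing section a */
        /* MIC computing section b */
    }

    /* CPU computing section executes here on the host */

    #pragma offload_wait target(mic:0) wait(&sig)
}[/cpp]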
