
Liu_N_ · ‎12-09-2014

Hi there~

I've run into some problems and have some questions, since I'm new to Intel Xeon Phi.

Our system has 3 MIC cards per node. Now I'm trying to make these 3 cards and the host CPU work in parallel, just like what MPI does.

I've completed the data distribution for the MIC cards and the CPU and tried to use "#pragma offload" to start the MIC processes, like this:

It's quite clear that the program blocks here, waiting for the MIC process to complete before starting the next offload.

Is there any non-blocking way to do the offload?

Any reply would help a lot! Thanks.

Kevin_D_Intel · ‎12-09-2014

Sure, you can enable non-blocking offload by adding a unique signal tag to each #pragma offload using the signal() clause. Then either use one #pragma offload_wait after the final #pragma offload to wait for the completion indication for all unique tags, or wait on the completion indications for the tags individually.

Make sure each signal variable is initialized to a unique value. The brief discussions About Asynchronous Computation and offload_wait and signal ( tag ) in the User Guide have more details.

Liu_N_ · ‎12-09-2014

@Davis

I've tried and it works!

Thank you very much~ I still have a lot to learn...

James_C_Intel2 · ‎12-09-2014

Now I'm trying to make these 3 cards and the host CPU work in parallel, just like what MPI does.

I'm sure you know this, but just in case, you do realize that you can use MPI and have MPI processes on each of the Phis and on the host? (Which would be exactly like MPI since it is MPI :-)).

Kevin_D_Intel · ‎12-09-2014

Here are a couple of resources relating to James’ feedback.

How to run Intel MPI on Xeon Phi™
Using MPI and Xeon Phi™ Offload Together

Glad to hear the signals worked. Also, in the sample code you posted, the “-1” target number defers the coprocessor selection to the runtime system; for greater control over coprocessor selection, you could use a program variable and assign a unique target number to each offload.

Liu_N_ · ‎12-09-2014

James Cownie (Intel) wrote:

Now I'm trying to make these 3 cards and the host CPU work in parallel, just like what MPI does.

I'm sure you know this, but just in case, you do realize that you can use MPI and have MPI processes on each of the Phis and on the host? (Which would be exactly like MPI since it is MPI :-)).

Sorry, I gave the wrong picture of the parallelism between the MIC cards and the CPU. I know there can be MPI processes on each Phi when the Phis run in symmetric mode. I just meant that I need to make the CPU and the 3 Phis work in parallel. : )

Liu_N_ · ‎12-09-2014

Kevin Davis (Intel) wrote:

Here are a couple of resources relating to James’ feedback.

How to run Intel MPI on Xeon Phi™
Using MPI and Xeon Phi™ Offload Together

Glad to hear the signals worked. Also, in the sample code you posted, the “-1” target number defers the coprocessor selection to the runtime system; for greater control over coprocessor selection, you could use a program variable and assign a unique target number to each offload.

Thanks~ Because my code is quite simple, assigning different constant target numbers to the devices is enough~

My code works when all MPI ranks are on the host and the computation is done asynchronously on the CPU and on the Phis running in offload mode. What's left to do is optimization.

Thank you again for your considerate help~

Kevin_D_Intel · ‎12-10-2014

You're welcome. Glad I could help.

Offload problems. Can I do offload in a non-blocking way?