Software Archive
Read-only legacy content

Offload problem: can I offload in a non-blocking way?

Liu_N_
Beginner

Hi there~

I've run into some problems and have a few questions, since I'm new to Intel Xeon Phi.

Our system has 3 MIC cards per node. Now I'm trying to make these 3 cards and the host CPU work in parallel, just like what MPI does.

I've completed the data distribution for mic and CPU and tried to use "#pragma offload" to start mic processes, like this:

[Image: 捕获.PNG, screenshot of the offload code]

It's clear that the program blocks here, waiting for each MIC offload to complete before starting the next one.

Is there any non-blocking way to do the offload? 

Any reply would help a lot! Thanks.

7 Replies
Kevin_D_Intel
Employee

Sure. You can make an offload non-blocking by adding a signal() clause with a unique tag to each #pragma offload. Then either place one #pragma offload_wait after the final #pragma offload to wait for completion of all the unique tags, or wait on the completion of each tag individually.

Make sure each signal variable is initialized to a unique value. The brief sections "About Asynchronous Computation", "offload_wait", and "signal(tag)" in the User Guide have more details.
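A minimal sketch of the signal/offload_wait pattern described above, assuming the Intel compiler with offload support and 3 coprocessors as in the question. The names `NUM_MIC`, `scale_chunk`, and `scale_on_all_devices` are hypothetical. Instead of initializing separate tag variables to unique values, it uses the addresses of distinct elements of a `char` array as tags, which are unique by construction. It also assigns an explicit target number to each offload rather than `-1`, and waits on each card's tag with its own `offload_wait`, which is a conservative choice. A compiler without offload support ignores the pragmas (typically with an unknown-pragma warning) and runs everything serially on the host, producing the same result.

```c
#include <assert.h>

/* Hypothetical: 3 coprocessors per node, as described in the question. */
#define NUM_MIC 3

/* Compile the kernel for both the host and the coprocessor. */
#pragma offload_attribute(push, target(mic))
static void scale_chunk(float *x, int len, float factor)
{
    for (int i = 0; i < len; i++)
        x[i] *= factor;
}
#pragma offload_attribute(pop)

/* Splits data into NUM_MIC + 1 chunks: one per card, and the last one
   (plus any remainder) for the host; all of them run concurrently. */
void scale_on_all_devices(float *data, int n, float factor)
{
    assert(n >= 0);
    int chunk = n / (NUM_MIC + 1);
    if (chunk == 0) {            /* too little data to split: host only */
        scale_chunk(data, n, factor);
        return;
    }

    char sig[NUM_MIC];           /* &sig[dev] is a unique tag per card */
    (void)sig;                   /* silences warnings if pragmas are ignored */

    /* Launch one offload per card with an explicit target number.
       signal() makes each offload return immediately instead of
       blocking until the card has finished. */
    for (int dev = 0; dev < NUM_MIC; dev++) {
        float *p = data + dev * chunk;
        #pragma offload target(mic:dev) inout(p : length(chunk)) signal(&sig[dev])
        scale_chunk(p, chunk, factor);
    }

    /* Meanwhile the host processes its own share. */
    int host_start = NUM_MIC * chunk;
    scale_chunk(data + host_start, n - host_start, factor);

    /* Block until each card has finished and copied its chunk back. */
    for (int dev = 0; dev < NUM_MIC; dev++) {
        #pragma offload_wait target(mic:dev) wait(&sig[dev])
    }
}
```

For example, calling `scale_on_all_devices(buf, 1000, 2.0f)` gives each card 250 elements and leaves the last 250 for the host.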

Liu_N_
Beginner

@Davis

I've tried and it works!

Thank you very much~ I still have a lot to learn...

James_C_Intel2
Employee

"Now I'm trying to make these 3 cards and the host CPU work in parallel, just like what MPI does."

I'm sure you know this, but just in case: you do realize that you can use MPI and have MPI processes on each of the Phis and on the host? (Which would be exactly like MPI, since it is MPI :-)).

Kevin_D_Intel
Employee

Here are a couple of resources relating to James’ feedback.

How to run Intel MPI on Xeon Phi™
Using MPI and Xeon Phi™ Offload Together

Glad to hear the signals worked. Also, in the sample code you posted, the “-1” target number defers coprocessor selection to the runtime system; for greater control over coprocessor selection, you could instead use a program variable and assign a unique target number to each offload.

Liu_N_
Beginner

James Cownie (Intel) wrote:

Now I'm trying to make these 3 cards and the host CPU work in parallel, just like what MPI does.

I'm sure you know this, but just in case, you do realize that you can use MPI and have MPI processes on each of the Phis and on the host? (Which would be exactly like MPI since it is MPI :-)).

Sorry, I gave the wrong impression of the parallelism between the MIC cards and the CPU. I know there can be MPI processes on each Phi when the Phis run in symmetric mode. I just meant that I need the CPU and the 3 Phis to work in parallel. :)

Liu_N_
Beginner

Kevin Davis (Intel) wrote:

Here are a couple of resources relating to James’ feedback.

How to run Intel MPI on Xeon Phi™
Using MPI and Xeon Phi™ Offload Together

Glad to hear the signals worked. Also, in the sample code you posted, the “-1” target number defers coprocessor selection to the runtime system; for greater control over coprocessor selection, you could instead use a program variable and assign a unique target number to each offload.

Thanks~ Since my code is quite simple, assigning a different constant target number to each device will do~

My code now works with all MPI ranks on the host and the computation done asynchronously on the Phis (in offload mode) and the CPU; what's left to do is optimization.

Thank you again for your considerate help~

Kevin_D_Intel
Employee

You're welcome. Glad I could help.