
fenglai · ‎05-16-2017

Hello!

I have a scientific program that uses TBB for multi-threaded parallelization.
Now I am going to add support for Intel Xeon Phi coprocessors. We have multiple
Phi coprocessors, and I am wondering how, in general, to set up a one-to-one correspondence
between the TBB threads and the Phi cards.

For example, the pseudocode looks like this:

set nThreads to the number of Phi cards (say N);
initialize TBB with nThreads;
loop over the batches of jobs with TBB threads (using parallel_for):
    for each thread (0 to N-1), find the corresponding Phi card (0 to N-1);
    initialize the input/output data for that Phi card (using offload);
    assign the work to the Phi card;
    collect the results from the Phi card (using offload);
    merge the results into the global one under a tbb::mutex;
end loop

However, it seems that a TBB worker thread does not have an ID that the user can obtain (is that correct?).
So, in general, how can I implement this so that a TBB-threaded program can use multiple
coprocessors?

By the way, the work on the Phi cards is parallelized with OpenMP.

Thank you so much!!

Alexei_K_Intel · ‎05-17-2017

Hello,

Is your idea to split the work between the cards and not use the host for any computations? Or are you trying to implement some dynamic balancing scheme between the host and the cards?

Regards,
Alex

fenglai · ‎05-17-2017

Hello Alex,

The code I mentioned in the original post performs a numerical calculation over a three-dimensional grid in space. The Phi cards do the major part of the work, probably around 90% of the cost; the host does calculations before and after the Phi cards (the results collected from the Phi cards are processed further to form the final result). Originally I used TBB to maintain dynamic load balance between the batches of jobs, and it did the job perfectly. So the proposed implementation for the Phi cards is based on our previous implementation.

To keep the problem clear I omitted implementation details that are irrelevant to it; in fact, the calculations on the host CPU threads are important, too.

Hope that answers your question, and thanks for your help!!

phoenix


Alexei_K_Intel · ‎05-17-2017

Thank you for the explanation. I suggest creating dedicated threads to work with the Intel Xeon Phi coprocessors, because the Intel TBB documentation recommends not blocking Intel TBB worker threads for non-computational work or for waiting. (The threads that call the offload API do not participate in calculations; they just synchronize with the coprocessors.) The dedicated threads can cause oversubscription on the host, but that should not lead to any issues because they do not perform extensive calculations.

I hope this helps. If you have any questions, feel free to ask.

Regards,
Alex
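
A minimal sketch of the dedicated-thread approach described above, assuming one std::thread per card. Each thread receives its card index as an argument, so the one-to-one mapping does not depend on any TBB worker ID. The names run_batch_on_card and process_batches are hypothetical, and the offload region is replaced by a host-side stub so that the sketch compiles without the Intel offload runtime. Batches are claimed from a shared atomic counter to keep the dynamic load balancing, and partial results are merged under a mutex, as in the original pseudocode.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the offloaded kernel. In the real program the
// offload region targeting card `card` would go here; this stub computes on
// the host so that the sketch runs anywhere.
double run_batch_on_card(int card, int batch) {
    (void)card;  // the card index would select the offload target
    return static_cast<double>(batch) * 2.0;
}

// One dedicated thread per card: thread i always talks to card i.
double process_batches(int n_cards, int n_batches) {
    assert(n_cards > 0);
    std::atomic<int> next_batch(0);  // shared counter for dynamic balancing
    std::mutex merge_mutex;          // protects global_result
    double global_result = 0.0;

    std::vector<std::thread> card_threads;
    for (int card = 0; card < n_cards; ++card) {
        card_threads.emplace_back([&, card] {
            for (;;) {
                int batch = next_batch.fetch_add(1);  // claim the next batch
                if (batch >= n_batches) break;        // nothing left to do
                double partial = run_batch_on_card(card, batch);
                std::lock_guard<std::mutex> lock(merge_mutex);
                global_result += partial;             // merge under the lock
            }
        });
    }
    for (std::thread& t : card_threads) t.join();
    return global_result;
}
```

Because these threads spend most of their time waiting on a card, they would run alongside the TBB worker threads that handle the host-side pre- and post-processing.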

fenglai · ‎05-24-2017

Thank you, Alex, for your suggestion. I will try OpenMP or the raw threads from the Boost library.

Thank you!

Alexei_K_Intel · ‎05-24-2017

As for raw threads, you may want to consider std::thread (or, if you cannot use C++11, the C++11 features implemented by the Intel TBB library, e.g. include tbb/compat/thread).

Regards,
Alex
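
As a rough illustration of the std::thread suggestion, the sketch below starts one thread per card, lets the host do its own work while the cards are busy, and then joins the threads and combines the per-card results. The functions card_work_stub, host_work_stub and run_with_overlap are hypothetical placeholders, not part of any Intel API; the card stub stands in for "offload to card i and wait for the result". Each thread writes only to its own slot, so this variant needs no lock.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for "offload work to card `card` and wait for it".
long card_work_stub(int card) { return 10L * (card + 1); }

// Hypothetical stand-in for the host-side computation.
long host_work_stub() { return 1; }

long run_with_overlap(int n_cards) {
    assert(n_cards >= 0);
    std::vector<long> card_results(n_cards);  // one slot per card, zero-initialized
    std::vector<std::thread> card_threads;
    for (int card = 0; card < n_cards; ++card) {
        // The card index is captured by value: thread i serves card i.
        card_threads.emplace_back([&card_results, card] {
            card_results[card] = card_work_stub(card);
        });
    }
    long total = host_work_stub();  // host computes while the card threads run
    for (std::thread& t : card_threads) t.join();  // wait for every card
    for (long r : card_results) total += r;        // combine after joining
    return total;
}
```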

Building a mapping between TBB threads and multiple Intel Xeon Phi coprocessors