Aaron_S_ · ‎03-17-2016

I recently acquired a system with dual Xeon Phi cards. How can I best put both cards to work? At the moment, I can only afford the C++ Composer software -- so MPI isn't an option.

Sunny_G_Intel · ‎03-17-2016

Hi Aaron,

Please refer to the following Intel Xeon Phi system administration guide to get started.

https://software.intel.com/sites/default/files/managed/bd/53/System_Administration_Guide_Intel%28R%29XeonPhi%28TM%29Coprocessor.pdf

The Intel MPSS user guide is a great reference too.

So that we can assist you better, could you please explain what you intend to use the coprocessors for?

Thanks,

Andrey_Vladimirov · ‎04-06-2016

As Sunny said, the best approach depends on the purpose for which you intend to use the coprocessors, specifically, the pattern of parallelism and communication.

If you can run two independent Linux processes to do your workload (batch processing scenario), you can use the native programming model and start jobs on coprocessors in parallel using "ssh" like you would on two independent general-purpose machines.
If you have one process that has multiple independent work-items to compute ("embarrassingly parallel" code, no communication), you can use the offload model to send independent work-items to different coprocessors. Start one host thread per coprocessor (two in your case) and map each thread to its respective coprocessor:

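#include <offload.h>  // _Offload_number_of_devices()
#include <omp.h>      // omp_get_thread_num()

// MyFunction must also be compiled for the coprocessor,
// e.g. declared with __attribute__((target(mic))).
const int nDevices = _Offload_number_of_devices();
#pragma omp parallel num_threads(nDevices)
{
  // One host thread per coprocessor; the thread number selects the device.
  const int i = omp_get_thread_num();
#pragma offload target(mic: i)
  {
    MyFunction(/*...*/);
  }
}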

In the same way you can distribute a set of work-items between coprocessors:

const int nDevices = _Offload_number_of_devices();
#pragma omp parallel num_threads(nDevices)
{
  // Each host thread drives one coprocessor.
  const int iDevice = omp_get_thread_num();
  // schedule(dynamic, 1): an idle thread takes the next work-item.
#pragma omp for schedule(dynamic, 1)
  for (int i = 0; i < nWorkItems; i++) {
#pragma offload target(mic: iDevice)
    {
      MyFunction(i);
    }
  }
}
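If each work-item produces a value, the same pattern can bring it back to the host. The following is only a sketch under stated assumptions: the body of MyFunction, the helper ProcessAll, and the MIC_TARGET macro are made up for illustration, and __INTEL_OFFLOAD is assumed to be the macro the Intel compiler defines when offload is enabled. A compiler without offload support ignores the offload pragma and simply runs the block on the host.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif
#ifdef __INTEL_OFFLOAD
#include <offload.h>
#define MIC_TARGET __attribute__((target(mic)))
#else
#define MIC_TARGET  // not an offload build: plain host function
#endif

// Hypothetical work function, standing in for the real computation.
MIC_TARGET double MyFunction(int i) { return 0.5 * i * i; }

// Hypothetical helper: computes every work-item and returns the results.
std::vector<double> ProcessAll(int nWorkItems) {
  assert(nWorkItems >= 0);
  std::vector<double> results(nWorkItems, 0.0);
  int nDevices = 1;  // assumption: use a single thread if no device count is available
#ifdef __INTEL_OFFLOAD
  nDevices = _Offload_number_of_devices();
  if (nDevices < 1) nDevices = 1;  // avoid num_threads(0)
#endif
#pragma omp parallel num_threads(nDevices)
  {
    int iDevice = 0;
#ifdef _OPENMP
    iDevice = omp_get_thread_num();  // one host thread per coprocessor
#endif
#pragma omp for schedule(dynamic, 1)
    for (int i = 0; i < nWorkItems; i++) {
      double r = 0.0;
      // out(r) copies the scalar result back to the host after the offload.
#pragma offload target(mic: iDevice) out(r)
      { r = MyFunction(i); }
      results[i] = r;  // each iteration writes its own slot, so no locking is needed
    }
    (void)iDevice;  // silences "unused variable" when the pragmas are ignored
  }
  return results;
}
```

Calling ProcessAll(nWorkItems) on the host returns one value per work-item, in index order, regardless of which coprocessor computed it.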

If you need communication between coprocessors, things get more complex. You can communicate between coprocessors indirectly by passing messages through the host, but this requires synchronization at every communication step. This is where MPI would be a good tool.
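To make the host-relay idea concrete, here is a rough sketch, not a tuned implementation. The kernels Produce and Consume, the helper RelayThroughHost, and the MIC_TARGET macro are hypothetical names, and __INTEL_OFFLOAD is assumed to be the macro the Intel compiler defines when offload is enabled. Coprocessor 0 fills a buffer, the host receives it, and the host then forwards it to coprocessor 1. Each offload is synchronous, which is exactly the synchronization cost mentioned above.

```cpp
#include <cassert>
#include <vector>
#ifdef __INTEL_OFFLOAD
#define MIC_TARGET __attribute__((target(mic)))
#else
#define MIC_TARGET  // not an offload build: plain host functions
#endif

// Hypothetical producer kernel: fills buf with 0, 2, 4, ...
MIC_TARGET void Produce(float* buf, int n) {
  for (int i = 0; i < n; i++) buf[i] = 2.0f * i;
}

// Hypothetical consumer kernel: sums the values it receives.
MIC_TARGET float Consume(const float* buf, int n) {
  float sum = 0.0f;
  for (int i = 0; i < n; i++) sum += buf[i];
  return sum;
}

// Hypothetical helper: moves data from coprocessor 0 to coprocessor 1 via the host.
float RelayThroughHost(int n) {
  assert(n > 0);
  std::vector<float> staging(n);  // host-side staging buffer
  float* buf = staging.data();
  float result = 0.0f;

  // Step 1: coprocessor 0 produces; out() copies buf back to the host.
#pragma offload target(mic: 0) out(buf : length(n))
  { Produce(buf, n); }

  // The offload above has completed here, so buf holds the full message.

  // Step 2: in() sends buf to coprocessor 1; inout() returns the scalar result.
#pragma offload target(mic: 1) in(buf : length(n)) inout(result)
  { result = Consume(buf, n); }

  return result;
}
```

Every exchange costs two PCIe transfers plus a wait on the host, so this only pays off when communication is infrequent relative to computation.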

We have a comprehensive free Web-based training coming soon where you can learn more: http://colfaxresearch.com/how-series/

Loc_N_Intel · ‎04-08-2016

Hi Aaron,

In addition to the above techniques, you can also use the Intel(R) hStreams library. It offers an abstraction for controlling the compute capabilities of a heterogeneous system. The Intel(R) hStreams library can be used on Intel(R) Xeon(R) processors and Intel(R) Xeon Phi(TM) coprocessors.

You can download hStreams binaries from:

• https://01.org/sites/default/files/downloads/hetero-streams-library/hstreams-1.0.0.tar (Linux)

• https://01.org/sites/default/files/downloads/hetero-streams-library/hstreams-1.0.0.zip (Windows)

How to put two Xeon Phis to work?