Solved: running offload code on node without a mic

joshbowden · ‎11-17-2014

Hi again, I'd like to run my #pragma offload code on a node that does not have a MIC present. When I try, I get the following error:

"offload error: cannot offload to MIC - device is not available"

Is there a flag to tell the software to run the CPU-only version? I thought the binary had both code paths, so it should just choose a sensible one for what is available?

Thanks for your help.

Regards,

Josh

Kevin_D_Intel · ‎11-17-2014

The final executable does contain both host and target code paths but the default offload mode is “mandatory” and the app terminates with an error when no coprocessor is available.

You can change the default behavior for an individual offload construct by adding the optional clause to the offload directive/pragma, or for the entire program by using the -qoffload=optional command-line option. More details are in the User Guide here.
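A minimal sketch of the per-construct form described above, assuming the Intel compiler's offload pragma syntax of that era. The function names sum_array and ran_on_coprocessor are made up for illustration. The __MIC__ macro is defined only when the compiler builds the coprocessor side, so the second function reports where the block actually ran. Compilers without offload support ignore the pragma and run everything on the host.

```c
/* Sum an array. The block is marked for offload, but "optional" lets it
   fall back to the host code path when no coprocessor is available. */
double sum_array(const double *data, int n)
{
    double total = 0.0;

    #pragma offload target(mic) optional in(data : length(n)) inout(total)
    {
        for (int i = 0; i < n; ++i)
            total += data[i];
    }
    return total;
}

/* Returns 1 if the offload block executed on the coprocessor, 0 if it
   fell back to the host (or the pragma was ignored by the compiler). */
int ran_on_coprocessor(void)
{
    int on_mic = 0;

    #pragma offload target(mic) optional
    {
#ifdef __MIC__
        on_mic = 1;   /* this line exists only in the coprocessor build */
#endif
    }
    return on_mic;
}
```

Building the same file with -qoffload=optional should make the clause unnecessary, since that option applies the optional behavior to every offload construct in the program.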

joshbowden · ‎11-17-2014

Thanks again Kevin.

I'll try to find some time to have another look at the documentation. I'm sure there will be more questions I want to ask about calling offload code from OpenMP (CPU) threads; however, I'll try to work that out for myself tomorrow.

Cheers,

Josh.

Kevin_D_Intel · ‎11-17-2014

Sounds good.

jimdempseyatthecove · ‎11-17-2014

Kevin,

This may be a little bit off topic, but I think it is related....

Consider that #pragma offload (presumably run on the host) injects code and/or data into a target (MIC), or potentially into a non-MIC target (OpenMP's #pragma omp target now permits this).

What are the prospects of having:

#pragma offload target(SomeOtherSystemOnNetwork)

At first glance the above may seem similar to OpenMPI, but it differs in that it is not a "rank"-oriented paradigm. Only specific portions of the code and/or data are transferred to/from the specified SomeOtherSystemOnNetwork, and each offload to a specific target can vary. For example, in a cluster of (non-SMP) nodes where some nodes have MIC, others have (ahem... excuse me) Tesla, others GPGPU, and others are large SMP systems, it would be attractive for an attached workstation to launch an application that could parcel out specific portions of its work to the most appropriate system... concurrently.

Jim Dempsey

Kevin_D_Intel · ‎11-17-2014

An interesting thought, Jim. The design lends itself to extension to other targets besides Xeon Phi™. We extended it for offload to the Intel® Graphics Technology target; however, to what extent other targets can be incorporated I just don’t know. The target compiler must be capable of producing compatible instructions.

I will inquire with Development and see if they might weigh in on the idea.

Rajiv_D_Intel · ‎11-17-2014

The compiler is required to generate code both for the host and for the target. The supported targets are Xeon Phi (MIC) and GT.

Future generations of MIC may be available as standalone workstations or add-in cards. For a cluster of nodes with Xeon and Xeon Phi in them it will be possible to "offload" from Xeon to MIC over the cluster fabric. In this scenario, a MIC node on the network will appear to the program as an offload-able target. However, the only offload-able targets will be MIC, not non-Intel processors.
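For context, a rough sketch of how a program can enumerate and address offload targets by number with the Intel offload runtime. It assumes _Offload_number_of_devices() from offload.h and the __INTEL_OFFLOAD macro are available when building with the Intel compiler. The function names visible_coprocessors and scale are hypothetical. Whether fabric-attached MIC nodes would be numbered the same way is not stated in this thread.

```c
#ifdef __INTEL_OFFLOAD
#include <offload.h>   /* Intel offload runtime API (assumed available) */
#endif

/* Number of coprocessors the offload runtime can see.
   Returns 0 when built without Intel offload support. */
int visible_coprocessors(void)
{
#ifdef __INTEL_OFFLOAD
    return _Offload_number_of_devices();
#else
    return 0;
#endif
}

/* Scale data in place on coprocessor number `device` when at least one
   coprocessor is visible; otherwise the if() clause keeps the work on
   the host. */
void scale(double *data, int n, double factor, int device)
{
    int have_mic = visible_coprocessors() > 0;

    #pragma offload target(mic : device) if(have_mic) inout(data : length(n))
    {
        for (int i = 0; i < n; ++i)
            data[i] *= factor;
    }
}
```

The if() clause is used here instead of optional so that the host/target decision is made explicitly by the program; either approach should avoid the "device is not available" error on a node without a coprocessor.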

jimdempseyatthecove · ‎11-17-2014

>>The compiler is required to generate code both for the host and for the target.

So.... when the target is another host with the same architecture (IA32, Intel64, AMD64, or a mixture via compiler options), there would be no reason (other than marketing) not to include it in the supported offload targets. I see a great benefit, even when you restrict this to Intel products.

Example:

I have a Windows 7 workstation without MIC. 10 feet away I have a Linux workstation with a Xeon E5-2620v3 processor and two Xeon Phi coprocessors.

It would be nice if I could

a) run an application on my workstation, that has offloads to the remote MIC (this can be done, though I do not do this - no Infiniband here)
b) run an application on my workstation, that has offloads to the remote Xeon E5-2620v3 processor (via Gigabit Ethernet)
c) run an application on my workstation, that has offloads to the remote Xeon E5-2620v3 processor and which itself offloads to its connected MICs
...
xyz) run an application on my workstation, that has offloads to someplace in the cloud (example being your Many Cores Testing Lab)

The point of the offload is to provide a homogeneous experience in a heterogeneous environment.

Jim Dempsey

joshbowden · ‎11-17-2014

While on this slightly off-topic point: there is a project named "VirtualCL" that virtualizes a network of OpenCL devices. As Phis and CPUs can run OpenCL code, this may work for you. It does not help your #pragma omp based codes much yet, though.

Cheers, Josh