running offload code on node without a mic

joshbowden
Beginner

Hi again, I'd like to run my #pragma offload code on a node that does not have a MIC present. When I try, I get the following error:

"offload error: cannot offload to MIC - device is not available"

Is there a flag to tell the software to run the CPU-only version? I thought the binary had both code paths, so it should just choose a sensible one based on what is available?

Thanks for your help.

Regards,

Josh

 

Kevin_D_Intel
Employee

The final executable does contain both host and target code paths but the default offload mode is “mandatory” and the app terminates with an error when no coprocessor is available.

You can change the default behavior for any individual offload construct by adding the optional clause on the offload directive/pragma or for the entire program using the -qoffload=optional command-line option. More details in the User Guide here.
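As a rough illustration of the per-construct form described above, here is a minimal sketch. The function name sum_squares and the loop are made up for the example, and the exact clause spelling should be checked against the User Guide for your compiler version. A compiler without offload support simply ignores the pragma and runs the block on the host.

```c
#include <assert.h>

/* Hypothetical example function: sum of the squares 0..n-1. */
long sum_squares(int n)
{
    long total = 0;

    assert(n >= 0);

    /* Assumed clause usage: "optional" asks the runtime to fall back to the
       host version of this block when no coprocessor is available, rather
       than terminating with an offload error. */
    #pragma offload target(mic) optional inout(total)
    {
        for (int i = 0; i < n; ++i)
            total += (long)i * i;
    }

    return total;
}
```

Calling sum_squares(10) should give 285 whether the block runs on the coprocessor or on the host. Building with -qoffload=optional instead would apply the same fallback behavior to every offload construct in the program without editing each pragma.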

joshbowden
Beginner

Thanks again Kevin.

I'll try to find some time to have another look at the documentation. I'm sure there will be more questions I want to ask about calling offload code from OpenMP (CPU) threads; however, I'll try to work that out for myself tomorrow.

Cheers,

Josh.

Kevin_D_Intel
Employee

Sounds good.

jimdempseyatthecove
Honored Contributor III

Kevin,

This may be a little bit off topic, but I think it is related....

Consider that "#pragma offload" (presumably run on the host) will inject code and/or data into a target (MIC), or potentially a non-MIC target ("#pragma omp target" now permits this).

What are the prospects of having:

#pragma offload target(SomeOtherSystemOnNetwork)

The above may at first glance seem similar to OpenMPI, but it differs in that it is not a "rank"-oriented paradigm. Only specific portions of the code and/or data are transferred to/from the specified SomeOtherSystemOnNetwork, and each offload to a specific target can vary. For example, in a cluster of (non-SMP) nodes where some nodes have MIC, others have (ahem... excuse me) Tesla, others GPGPU, and others are large SMP systems, it would be attractive for an attached workstation to launch an application that could parcel out specific portions of itself to the most appropriate system... concurrently.

Jim Dempsey

Kevin_D_Intel
Employee

An interesting thought, Jim.  The design lends itself to extension to other targets besides Xeon Phi™. We extended it for offload to the Intel® Graphics Technology target; however, to what extent other targets can be incorporated I just don’t know. The target compiler must be capable of producing compatible instructions.

I will inquire with Development and see if they might weigh in on the idea.

Rajiv_D_Intel
Employee

The compiler is required to generate code both for the host and for the target. The supported targets are Xeon Phi (MIC) and GT (Intel Graphics Technology).

Future generations of MIC may be available as standalone workstations or add-in cards. For a cluster of nodes with Xeon and Xeon Phi in them it will be possible to "offload" from Xeon to MIC over the cluster fabric. In this scenario, a MIC node on the network will appear to the program as an offload-able target. However, the only offload-able targets will be MIC, not non-Intel processors.

jimdempseyatthecove
Honored Contributor III

>>The compiler is required to generate code both for the host and for the target.

So.... when the target is another host with the same architecture (IA32, Intel64, AMD64, or a mixture via compiler options), there would be no reason (other than marketing) not to include it among the supported offload targets. I see a great benefit, even if you restrict this to Intel products.

Example:

I have a Windows 7 workstation without MIC. 10 feet away I have a Linux workstation with a Xeon E5-2620v3 processor and two Xeon Phi coprocessors.

It would be nice if I could

a) run an application on my workstation, that has offloads to the remote MIC (this can be done, though I do not do this - no Infiniband here)
b) run an application on my workstation, that has offloads to the remote Xeon E5-2620v3 processor (via Gigabit Ethernet)
c) run an application on my workstation, that has offloads to the remote Xeon E5-2620v3 processor and which itself offloads to its connected MICs
...
xyz) run an application on my workstation, that has offloads to someplace in the cloud (example being your Many Cores Testing Lab)

The point of the offload is to provide a homogeneous experience in a heterogeneous environment.

Jim Dempsey

 

joshbowden
Beginner

While on this slightly off-topic subject: there is a project named "VirtualCL" that virtualizes a network of OpenCL devices. As Phis and CPUs can run OpenCL code, this may work for you. It does not help your #pragma omp based codes much yet, though.

Cheers, Josh
