Software Archive

OpenMP 4.0 teams on CPU and in native mode

Andrey_Vladimirov
New Contributor III

Is there no way to use OpenMP teams on the CPU and on Xeon Phi in native mode? See the code and output below for my attempt to do that.

#include <omp.h>
#include <cstdio>

int main() {
#pragma omp teams
#pragma omp distribute
  for (int i = 0; i < 1000; i++) {
    printf("i=%d team:%d\n", i, omp_get_team_num());
  }
}

 

[user@c001-n002 ~]$ icpc -qopenmp test.cpp
test.cpp(5): error: teams must be the first and only construct within a target construct
  #pragma omp teams
  ^

compilation aborted for test.cpp (code 2)
[user@c001-n002 ~]$ 
[user@c001-n002 ~]$ icpc -qopenmp test.cpp -mmic
test.cpp(5): warning #3180: unrecognized OpenMP #pragma
  #pragma omp teams
          ^

test.cpp(6): warning #3180: unrecognized OpenMP #pragma
  #pragma omp distribute
          ^

[user@c001-n002 ~]$ icc -V
Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.0.090 Build 20140723
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.


 

Kevin_D_Intel
Employee

The example appears to be lacking a target construct. The 15.0 compiler also supports combined constructs, so either of these works:

#pragma omp target
#pragma omp teams
#pragma omp distribute

Or

#pragma omp target teams distribute
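
Applied to the original test case, a minimal sketch using the combined construct (not compiled here) would be:

#include <omp.h>
#include <cstdio>

int main() {
  // the combined construct supplies the required target region around teams/distribute
#pragma omp target teams distribute
  for (int i = 0; i < 1000; i++) {
    printf("i=%d team:%d\n", i, omp_get_team_num());
  }
}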

This OpenMP 4.0 support is not available for use with native Xeon Phi™ builds (-mmic). A few more feature details are listed in the OpenMP* 4.0 Features in Intel Compiler 15.0 article.

Andrey_Vladimirov
New Contributor III

Is there a plan to make OpenMP teams available for native Xeon Phi applications and for the CPU? I noticed, for example, that OMP_PLACES (which, I believe, is a feature specific to thread teams) supports places of type "sockets". This suggests to me that Intel was planning to make teams and places available for CPU applications.

pbkenned1
Employee

Not sure what the plan is for native Phi applications, but you can offload team constructs to Phi right now with 15.0.

Here is a standard OMP example (with a minor modification to call my get_offload_dev()):

#pragma omp declare target
extern void get_offload_dev();
#pragma omp end declare target
float dotprod(float B[], float C[], int N)
{
   float sum0 = 0.0;
   float sum1 = 0.0;
   #pragma omp target map(to: B[:N], C[:N])
   #pragma omp teams num_teams(2)
   {
      int i;

      if (omp_get_num_teams() != 2)
         abort();

      get_offload_dev();

      if (omp_get_team_num() == 0)
      {
         #pragma omp parallel for reduction(+:sum0)
            for (i=0; i<N/2; i++)
               sum0 += B[i] * C[i];
      }
      else if (omp_get_team_num() == 1)
      {
         #pragma omp parallel for reduction(+:sum1)
            for (i=N/2; i<N; i++)
               sum1 += B[i] * C[i];
      }
   }
   return sum0 + sum1;
}

[Examp-52.1c]$ icc -V
Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.1.133 Build 20141023
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

[Examp-52.1c]$ icc Examp-52.1c-main.cpp Examp-52.1c.cpp get_offload_dev.cpp -openmp -offload-attribute-target=mic -o Examp-52.1c.x
[Examp-52.1c]$ ./Examp-52.1c.x

 get_offload_dev(): running on MIC 0 with 112 threads

 get_offload_dev(): running on MIC 0 with 112 threads

 ans = 32896
***PASSED****
[Examp-52.1c]$

Patrick

Andrey_Vladimirov
New Contributor III

Thanks, Patrick and Kevin. I know that I can offload team constructs. However, this is not my use case. My interest is in teams on the CPU and in native MIC applications. I was surprised to see that the compiler does not allow me to use teams on the CPU and on MIC in native mode, even though this feature is completely separate from offload. It would be great to have teams in CPU and native MIC applications, because in applications with nested parallelism, teams allow setting thread affinity more elegantly than nested OpenMP parallel regions do.

 

Kevin_D_Intel
Employee

I will check with Development regarding any future plans for OpenMP teams support for native apps.

Updated 01/06/2015: Development echoed James' comment below about TEAMS construct use inside a TARGET construct: any deviation would be non-standard, and thus an Intel-only extension, which we do not plan to support.

James_C_Intel2
Employee

The OpenMP 4.0 standard ( http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf ) is clear that TEAMS constructs can only be used closely nested inside a TARGET construct. 

If specified, a teams construct must be contained within a target construct. That target construct must contain no statements or directives outside of the teams construct.

(Page 88 [pdf page 96] lines 10:12).

Therefore what you are asking for is non-standard and you should not expect it to be implemented.

You can, perhaps, achieve what you want by using a "target if (0)" clause.

OMP_PLACES has nothing to do with TEAMS. It is concerned with specifying affinity, and (in what may be an omission from the standard) there is no specification of how affinity affects TEAMS.

 

pbkenned1
Employee

>>>You can, perhaps, achieve what you want by using a "target if (0)" clause.

Just to confirm what James said, 'fall back' processing on the CPU does work for teams:

float dotprod(float B[], float C[], int N)
{
   float sum0 = 0.0;
   float sum1 = 0.0;
   #pragma omp target map(to: B[:N], C[:N]) if(0)
   #pragma omp teams num_teams(2)
   /* ... remainder identical to the example in the earlier post ... */

[Examp-52.1c]$ icc Examp-52.1c-main.cpp Examp-52.1c.cpp get_offload_dev.cpp -openmp -offload-attribute-target=mic -o Examp-52.1c.x
[Examp-52.1c]$ ./Examp-52.1c.x

get_offload_dev(): running on HOST with 32 threads

get_offload_dev(): running on HOST with 32 threads

 ans = 32896
***PASSED****
[Examp-52.1c]$
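
For completeness, the same fall-back approach applied to the test case from the original post would look roughly like this (a sketch, not compiled here):

#include <omp.h>
#include <cstdio>

int main() {
  // if(0) forces host fall-back, so the teams region executes on the CPU
#pragma omp target if(0)
#pragma omp teams
#pragma omp distribute
  for (int i = 0; i < 1000; i++) {
    printf("i=%d team:%d\n", i, omp_get_team_num());
  }
}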

 

Patrick

Jeongnim_K_Intel1

Instead of using "teams", the same thing can be done with nested OpenMP parallel regions and proc_bind placement clauses.

#pragma omp parallel num_threads(2) proc_bind(spread)
{
   #pragma omp parallel num_threads(16) proc_bind(close)
   {
      /* nested work goes here */
   }
}
Alternatively, the OMP environment variables will do the job:
OMP_NUM_THREADS=2,16
OMP_PROC_BIND=spread,close
OMP_PLACES=cores
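
For example, a small self-contained sketch (illustrative only) that prints the outer and inner thread numbers; OMP_PLACES from the settings above still controls which places the threads can be bound to:

#include <omp.h>
#include <cstdio>

int main() {
  omp_set_nested(1);  // nested parallelism must be enabled explicitly
#pragma omp parallel num_threads(2) proc_bind(spread)
  {
    int outer = omp_get_thread_num();
#pragma omp parallel num_threads(16) proc_bind(close)
    {
      printf("outer thread %d, inner thread %d\n", outer, omp_get_thread_num());
    }
  }
}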

Is there any reason to insist on "teams", which does not add any value beyond being somehow related to GPU programming?

pbkenned1
Employee

Thanks for the nested parallel example.  Nothing wrong with that, but the originator was asking about OMP teams.  Yes, #pragma omp teams is specifically for targeting offload devices --- accelerators, coprocessors, GPUs.  It's designed to function inside the target construct to create a simple communication path between the host and attached device(s).  With teams, you can create new, fresh instances of OMP thread teams on the device that are completely independent of the spawning threads.  This is very different from host nested parallelism, where each nested thread team takes on certain attributes of the spawning thread, for example its affinity mask.  In fact, it can be challenging to get affinity right with nested parallelism, although the new OMP 4.0 affinity mechanisms make this a lot easier.

Patrick

Andrey_Vladimirov
New Contributor III

Thank you all for your responses!

What I need to do (nested parallelism + convenient affinity settings) is perfectly addressed by JEONGNIM K.'s response. I am happy to use nested OpenMP regions rather than teams; the reason I originally asked about teams is that I erroneously assumed that OMP_PLACES and OMP_PROC_BIND are only for teams. Thanks to everybody for clearing up my confusion and for the good additional information and references!
