Hi. I have a problem. I am writing an application for an Intel Xeon Phi (61 cores) that performs a stencil calculation on a 2D matrix (five-point stencil). I would like to use OpenMP 4.0 teams, with each team consisting of the 4 threads running on one core: for example, Team 1 = threads 1, 2, 3, 4; Team 2 = threads 5, 6, 7, 8; etc. The goal is to reduce cache misses by having the 4 threads of a team work out of the same L2 cache. I tried to achieve this by setting KMP_AFFINITY="proclist=[1,5,9,...,237,2,3,4,6,7,8,10,11,12,...,238,239,240],explicit". This affinity works only for a small number of teams. Is there a way to set the affinity that solves my problem? Thanks.
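The grouping described above can be written down as simple index arithmetic. The following is a minimal sketch, assuming logical processors are numbered from 1 with four consecutive numbers per core, as in the numbering used in this post, and assuming 0-based team numbers (as returned by omp_get_team_num). The names `proc_for` and `team_of_proc` are hypothetical helpers, not part of OpenMP or the Intel runtime; the code only expresses the intended placement and does not set any affinity.

```c
#include <assert.h>

#define THREADS_PER_CORE 4   /* hardware threads per core, per the post */

/* Logical processor for thread `lane` (0..3) of team `team` (0-based),
   assuming processors are numbered from 1, four consecutive per core. */
int proc_for(int team, int lane) {
    assert(team >= 0 && lane >= 0 && lane < THREADS_PER_CORE);
    return team * THREADS_PER_CORE + lane + 1;
}

/* Inverse mapping: the 0-based team (core) that owns logical processor `proc`. */
int team_of_proc(int proc) {
    assert(proc >= 1);
    return (proc - 1) / THREADS_PER_CORE;
}
```

Under these assumptions, `proc_for(0, 0)` is 1 and `proc_for(59, 3)` is 240, so the first thread of each team lands on processors 1, 5, 9, ..., 237, which matches the start of the proclist above.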

Jan_K_

Beginner

09-07-2014
02:31 PM

Xeon Phi - OpenMP Teams

9 Replies

jimdempseyatthecove

Black Belt

09-07-2014
04:38 PM

Check out a series of blogs on IDZ that I wrote on this subject:

https://software.intel.com/en-us/search/site/language/en?query=Chronicles

This is a 5-part blog. It is beneficial to read, or at least scan over, all 5 parts.

The code in the blogs was run on a 5110P with 60 cores.

Jim Dempsey

Jan_K_

Beginner

09-11-2014
04:47 PM

OK, I will read it. I have another question about teams.

    #include <stdio.h>
    #include <omp.h>

    int main() {
        /* First offload region */
        #pragma omp target
        {
            #pragma omp teams num_teams(60) num_threads(4)
            {
                printf("Teams: %d, Target 1\n", omp_get_team_num());
            }
        }

        /* Second offload region, identical except for the message */
        #pragma omp target
        {
            #pragma omp teams num_teams(60) num_threads(4)
            {
                printf("Teams: %d, Target 2\n", omp_get_team_num());
            }
        }
        return 0;
    }

In the first target region every team prints its number, but in the second target region only the first team does. Why?

jimdempseyatthecove

Black Belt

09-12-2014
10:03 AM

Jan,

Could you please post your private message to me here, so that others can see it and comment as well?

The only thing that stands out is that your thread work partitioning scheme can produce a higher degree of load imbalance than the traditional omp for loop.

The omp loop divides the universe by the number of workers .AND. takes the remainder of the division.

Each thread then receives the quotient, plus 1 if its worker number is less than the remainder.

This way any disparity between threads is at worst 1 additional iteration.

In your technique, the worst case could potentially have the last thread receiving:

universe / number of workers + (number of workers - 1) iterations.

IOW you are lumping the computational burden of the remainder onto the last worker.
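The two partitioning schemes compared above can be sketched as follows. This is a minimal illustration, assuming half-open iteration ranges and 0-based worker numbers; `range_t`, `balanced_range`, and `lumped_range` are hypothetical names, and the balanced version follows the description in this post rather than any particular OpenMP runtime's implementation.

```c
#include <assert.h>

/* Half-open iteration range [begin, end) assigned to one worker. */
typedef struct { long begin; long end; } range_t;

/* Balanced split: every worker gets n / workers iterations, and the
   first (n % workers) workers get one extra iteration each. */
range_t balanced_range(long n, int workers, int id) {
    assert(n >= 0 && workers > 0 && id >= 0 && id < workers);
    long base = n / workers;
    long rem  = n % workers;
    range_t r;
    /* Workers before this one have already taken min(id, rem) extras. */
    r.begin = id * base + (id < rem ? id : rem);
    r.end   = r.begin + base + (id < rem ? 1 : 0);
    return r;
}

/* Lumped split: every worker gets n / workers iterations, and the
   last worker additionally absorbs the whole remainder. */
range_t lumped_range(long n, int workers, int id) {
    assert(n >= 0 && workers > 0 && id >= 0 && id < workers);
    long base = n / workers;
    range_t r;
    r.begin = id * base;
    r.end   = (id == workers - 1) ? n : r.begin + base;
    return r;
}
```

For example, with 1000 iterations and 240 workers, the balanced split gives every worker 4 or 5 iterations, while the lumped split leaves the last worker with 44.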

Jim Dempsey

Jan_K_

Beginner

09-13-2014
03:33 AM

OK, thanks. I understand what is wrong. What about my previous post?

jimdempseyatthecove

Black Belt

09-14-2014
07:56 AM

On your #3 I do not know the reason.

What happens when you explicitly state the target is mic0?

What happens when you explicitly state the target is mic0 .AND. state num_teams(30) for both offloads?

Jim Dempsey

I cannot reproduce your problem. Here is a section of the output from my run:
.......
Teams: 55, Target 0
Teams: 15, Target 0
Teams: 35, Target 0
Teams: 42, Target 0
Teams: 31, Target 0
Teams: 44, Target 0
Teams: 46, Target 0
Teams: 43, Target 0
Teams: 47, Target 0
Teams: 59, Target 0
Teams: 0, Target 1
Teams: 16, Target 1
Teams: 8, Target 1
Teams: 48, Target 1
Teams: 32, Target 1
Teams: 1, Target 1
Teams: 4, Target 1
Teams: 2, Target 1
Teams: 12, Target 1
Teams: 3, Target 1
Teams: 18, Target 1
Teams: 9, Target 1
.........

Ravi_N_Intel

Employee

09-15-2014
10:05 AM

Jan_K_

Beginner

09-15-2014
12:30 PM

It's strange. I ran this code 10 times and got the same result as before.

Ronald_G_Intel

Moderator

09-15-2014
12:41 PM

Jan,

What compiler version are you using? OMP 4 was rolled into the compilers over time. With the latest 15.0 it's complete, except for user-defined reductions.

Is it possible you have an older (> 6 months old) compiler? For OMP 4 you'll want 15.0.

ron

Jan_K_

Beginner

09-16-2014
01:26 AM

Yes, I have an older compiler: Version 14.0.1.106 Build 20131008.
