Solved: Issue with Intel compiler + OpenMP

GDN · ‎06-29-2022

Dear all,

I'm getting something strange with Intel Compiler 2021.6.0 20220226 and the following piece of code:

#include <omp.h>
#include <iostream>
using namespace std;

// int main()
int main(int argc, char *argv[])
{
  int iNumClients = 0;
  int totalNumClients = 2;

  // deprecated (can it be commented out?)
  omp_set_nested(1); /// Enable nested parallelism

  omp_set_max_active_levels(2); /// Max two levels of nested parallelism
  omp_set_dynamic(0); /// I take full control

  cout << "totalNumClients =" << totalNumClients << endl;

#pragma omp parallel num_threads(2) shared(iNumClients,totalNumClients)
  {
#pragma omp sections
    {
#pragma omp section
      {
        int tid = omp_get_thread_num();
        cout <<  "DEBUG: listening (thread " << tid << " )" << endl;
        // iNumClients=0;
        while (1) {
          cout << "Enter a number: "<< endl;
          cin >> iNumClients;
          cout << "Your number is: " << iNumClients << endl;
        }
      } // omp section
#pragma omp section
      {
        int tid = omp_get_thread_num();
        cout <<  "DEBUG: Coupling (thread " << tid << " )" << endl;
        while (iNumClients < totalNumClients) {
          // cout << "iNumClients = " << iNumClients << " in thread = " << tid << endl;
        }
        cout << "All clients successfully connected! (thread " << tid << " )" << endl;
      } // omp section
    } // omp sections
  } //  omp parallel

  cout << "Stop openmp threading" << endl;
  return 0;
} // main

- When I compile it with:

icpc -qopenmp test.cpp

The program starts and asks for a value. If I enter 2, the program should print "All clients successfully connected!", but nothing happens!

- When I compile it without optimization:

icpc -O0 -qopenmp test.cpp

The program works as I expect. When I enter 2, I get "All clients successfully connected!".

- When I compile with g++ 4.8, with or without optimization, I also get "All clients successfully connected!".

Is it an issue with my current intel compiler? Or is my nasty empty while loop against any standards?

Thx a lot

Regards

Guillaume

Klaus-Dieter_O_Intel · ‎07-01-2022

Please read section "1.4 Memory Model" at https://www.openmp.org/spec-html/5.1/openmpse4.html#x17-160001.4, in particular:

1.4.1 Structure of the OpenMP Memory Model

The OpenMP API provides a relaxed-consistency, shared-memory model. All OpenMP threads have access to a place to store and to retrieve variables, called the memory. A given storage location in the memory may be associated with one or more devices, such that only threads on associated devices have access to it. In addition, each thread is allowed to have its own temporary view of the memory. The temporary view of memory for each thread is not a required part of the OpenMP memory model, but can represent any kind of intervening structure, such as machine registers, cache, or other local storage, between the thread and the memory. The temporary view of memory allows the thread to cache variables and thereby to avoid going to memory for every reference to a variable. Each thread also has access to another type of memory that must not be accessed by other threads, called threadprivate memory.

1.4.4 The Flush Operation

The memory model has relaxed-consistency because a thread’s temporary view of memory is not required to be consistent with memory at all times. A value written to a variable can remain in the thread’s temporary view until it is forced to memory at a later time. Likewise, a read from a variable may retrieve the value from the thread’s temporary view, unless it is forced to read from memory. OpenMP flush operations are used to enforce consistency between a thread’s temporary view of memory and memory, or between multiple threads’ view of memory.

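Any optimization may rely on the relaxed-consistency, shared-memory model, and "-O2" is the implicit default of the Intel compiler. At that level the compiler is allowed to keep iNumClients in the thread's temporary view (for example a register), so the empty while loop may never see the value written by the other thread. Inserting flush operations solves the issue: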

#pragma omp section
      {
        int tid = omp_get_thread_num();
        cout << "DEBUG: listening (thread " << tid << " )" << endl;
        // iNumClients=0;
        while (1) {
          cout << "Enter a number: "<< endl;
          cin >> iNumClients;
          cout << "Your number is: " << iNumClients << endl;
#pragma omp flush(iNumClients)
        }
      } // omp section
#pragma omp section
      {
        int tid = omp_get_thread_num();
        cout << "DEBUG: Coupling (thread " << tid << " )" << endl;
        while (iNumClients < totalNumClients) {
          // cout << "iNumClients = " << iNumClients << " in thread = " << tid << endl;
#pragma omp flush(iNumClients)
        }
        cout << "All clients successfully connected! (thread " << tid << " )" << endl;
      } // omp section
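
As a possible alternative to the explicit flushes, the shared counter could be accessed through OpenMP atomic read/write constructs, which also prevent the compiler from caching the value across loop iterations. The following is only a minimal sketch under assumptions: the helper name wait_for_clients and the variable seen are hypothetical and not part of the original program, and the interactive cin loop is replaced by a finite loop so that the example terminates on its own.

```cpp
// Hypothetical helper (not from the original post): one section publishes
// client counts 1..totalNumClients, the other spins until the target is seen.
int wait_for_clients(int totalNumClients)
{
  int iNumClients = 0;  // shared counter, as in the original program
  int seen = 0;         // last value observed by the waiting section

#pragma omp parallel num_threads(2) shared(iNumClients, seen)
  {
#pragma omp sections
    {
#pragma omp section
      {
        // Stand-in for the cin loop: publish each new count atomically.
        for (int n = 1; n <= totalNumClients; ++n) {
#pragma omp atomic write
          iNumClients = n;
        }
      } // omp section
#pragma omp section
      {
        int current = 0;
        do {
          // An atomic read has to access the shared variable on every
          // iteration, so the load cannot be hoisted out of the loop.
#pragma omp atomic read
          current = iNumClients;
        } while (current < totalNumClients);
        seen = current;
      } // omp section
    } // omp sections
  } // omp parallel

  return seen;
}
```

Calling wait_for_clients(2) from main and compiling with -qopenmp (or -fopenmp for g++) should return once the counter reaches 2, regardless of the optimization level. The atomic read and write forms used here require OpenMP 3.1 or later.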

HemanthCH_Intel · ‎06-30-2022

Hi,

Thanks for posting in Intel Communities.

We are able to reproduce your issue at our end on an Ubuntu 18.04 machine. We are working on your issue internally and will get back to you soon.

Thanks & Regards,

Hemanth

GDN · ‎07-03-2022

Thx for this explanation!

Have a nice day