Intel® Moderncode for Parallel Architectures

Core assignment and parallel processing don't work when using the Intel OpenMP library (libiomp5)

Themis_Jeon

Issue: Core assignment and parallel processing don't work when using the Intel OpenMP library (libiomp5)

Compiler: GCC

OS: RHEL 6.4, 6.6, 7.7 and 8.4 with Intel oneAPI 2022 installed

==============================================================================

Hello.

I am having trouble: core assignment and parallel processing in my code do not work correctly when I use the Intel OpenMP library, libiomp5.

Because I need the Intel MKL libraries, specifically libmkl_intel_thread.a and libmkl_intel_core.a, I have to link against libiomp5.so.

In addition, I need to combine #pragma omp parallel regions with core assignment using CPU_SET and the pthread affinity calls (pthread_attr_setaffinity_np in the test code below).
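For reference, pinning the calling pthread to a set of cores with CPU_SET and pthread_setaffinity_np, and reading the mask back with pthread_getaffinity_np, can be sketched as below. This is a minimal sketch assuming glibc on Linux; the helper names pin_self_to_cpus and allowed_cpu_count are hypothetical, not part of any library.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* cpu_set_t, CPU_* macros, pthread_*affinity_np, sched_getcpu */
#endif
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to the n cores listed in
   cpus (each entry must be below CPU_SETSIZE). Returns 0 on success or the
   error number reported by pthread_setaffinity_np. */
static int pin_self_to_cpus(const int *cpus, int n)
{
  cpu_set_t mask;
  CPU_ZERO(&mask);
  for (int i = 0; i < n; i++)
    CPU_SET(cpus[i], &mask);
  return pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask);
}

/* Hypothetical helper: number of cores the calling thread is currently
   allowed to run on, or -1 if the mask cannot be read. */
static int allowed_cpu_count(void)
{
  cpu_set_t mask;
  if (pthread_getaffinity_np(pthread_self(), sizeof(mask), &mask) != 0)
    return -1;
  return CPU_COUNT(&mask);
}
```

For example, DoWork1 could call pin_self_to_cpus((const int[]){0, 2, 4}, 3) on entry instead of relying on a pthread_attr_t; checking allowed_cpu_count() afterwards confirms the mask took effect.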

When I run my test code linked against libiomp5, the per-core load shown by the top command looks very strange.

My test code creates two threads: the first is pinned to cores #0, #2 and #4, and the second to cores #1, #3 and #5. Each thread starts an OpenMP parallel region whose threads spin in an infinite loop without sleeping, so if everything worked normally, cores #0 to #5 should all be at 100% load. However, only cores #0, #2 and #4 are at 100% load.

Strangely, this does not mean that the thread pinned to cores #1, #3 and #5 was never created. That thread does exist, but no CPU load shows up for it.
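One way to see where the "invisible" OpenMP threads actually run is to have every thread of the parallel region report its OpenMP thread number, the core it is currently on (sched_getcpu) and how many cores its affinity mask allows. The function below is a hypothetical diagnostic sketch, assuming glibc on Linux; thanks to the _OPENMP guard it also compiles without -fopenmp, in which case the region simply runs on one thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* sched_getcpu, pthread_getaffinity_np, CPU_COUNT */
#endif
#include <stdio.h>
#include <pthread.h>
#include <sched.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical diagnostic: each thread of a parallel region prints its
   OpenMP thread number, its current core and the size of its affinity mask.
   Returns the number of threads that reported. */
static int report_omp_placement(const char *tag, int nthreads)
{
  int reported = 0;

  #pragma omp parallel num_threads(nthreads) reduction(+:reported)
  {
    int tid = 0;
#ifdef _OPENMP
    tid = omp_get_thread_num();
#endif
    cpu_set_t mask;
    int allowed = -1;
    if (pthread_getaffinity_np(pthread_self(), sizeof(mask), &mask) == 0)
      allowed = CPU_COUNT(&mask);

    printf("%s: omp thread %d on core %d, %d core(s) allowed\n",
           tag, tid, sched_getcpu(), allowed);
    reported += 1;
  }
  return reported;
}
```

Calling, say, report_omp_placement("thread 2", MAX_OMP_NUM) at the start of DoWork2 would show whether the second team's threads exist and which cores they are allowed on, so the iomp5 and gomp builds can be compared directly.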

Another interesting point is that all cores are loaded as expected when I link with gomp instead of iomp5. The two screenshots below show the per-core load with iomp5 and with gomp, respectively.

[Screenshot: top output, result with iomp5]

iomp5 result: only cores #0, #2 and #4 are at 100% load, even though both threads are running.

Compile command: gcc Test_iomp5.c -o Test_iomp5.out -D_GNU_SOURCE -fopenmp -ldl -liomp5 -lpthread -L/opt/intel/oneapi/compiler/2022.0.0/linux/compiler/lib/intel64

[Screenshot: top output, result with gomp]

gomp result: cores #0 to #5 are all at 100% load.

Compile command: gcc Test_iomp5.c -o Test_iomp5.out -D_GNU_SOURCE -fopenmp -ldl -lgomp -lpthread -L/opt/intel/oneapi/compiler/2022.0.0/linux/compiler/lib/intel64

Given this, what should I do to fix the problem?

I did find a workaround, which is to link against both gomp and iomp5 as in the compile command below, but I am worried about compatibility between gomp and iomp5 when both are used.

Compile command: gcc Test_iomp5.c -o Test_iomp5.out -D_GNU_SOURCE -fopenmp -ldl -lgomp -liomp5 -lpthread -L/opt/intel/oneapi/compiler/2022.0.0/linux/compiler/lib/intel64
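When both libraries are on the link line, it can help to check which shared object an OpenMP entry point actually resolved to at run time. A minimal sketch, assuming glibc: dladdr is a GNU extension declared in <dlfcn.h> (on glibc older than 2.34 it needs -ldl, which the commands above already pass), and the helper name object_containing is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* dladdr and Dl_info are GNU extensions */
#endif
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: path of the loaded object (executable or shared
   library) whose mapping contains addr, or NULL if the loader cannot tell. */
static const char *object_containing(const void *addr)
{
  Dl_info info;
  if (dladdr(addr, &info) == 0 || info.dli_fname == NULL)
    return NULL;
  return info.dli_fname;
}
```

In the test program, printing object_containing((const void *)&omp_get_num_threads) would show whether that particular symbol came from libgomp or libiomp5; other GOMP_* entry points used by GCC-generated code could in principle resolve differently, so this only answers the question symbol by symbol.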

My questions are:

1) How can I use #pragma omp parallel regions from multiple threads when I link with iomp5?

2) If I have to link both gomp and iomp5, are the two libraries compatible with each other?

I would appreciate any help solving these issues.

If you have any questions or would like me to run further tests, please do not hesitate to reply. Sincerely.

============= My Test Code : Test_iomp5.c ============

#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* cpu_set_t, CPU_SET, pthread_attr_setaffinity_np */
#endif
#include <stdio.h>
#include <stdint.h>
#include <pthread.h>
#include <sched.h>
#include <omp.h>

#define MAX_THREAD_NUM 2
#define MAX_OMP_NUM 6

/* Runs on the pthread pinned to cores 0, 2 and 4. Each of the MAX_OMP_NUM
   OpenMP threads spins forever without sleeping, so every core the team runs
   on should show 100% load in top. A plain infinite loop is used because
   OpenMP does not allow the iteration variable of an omp for loop to be
   modified inside the loop body. */
static void* DoWork1(void *args)
{
  (void) args;

  #pragma omp parallel num_threads(MAX_OMP_NUM)
  {
    for(;;)
    {
      /* busy loop */
    }
  }

  printf("Thread 1 Done @ core 0 & 2 & 4\n"); /* never reached */
  return NULL;
}

/* Same busy work, on the pthread pinned to cores 1, 3 and 5. */
static void* DoWork2(void *args)
{
  (void) args;

  #pragma omp parallel num_threads(MAX_OMP_NUM)
  {
    for(;;)
    {
      /* busy loop */
    }
  }

  printf("Thread 2 Done @ core 1 & 3 & 5\n"); /* never reached */
  return NULL;
}

int main(void)
{
  int iLoop;

  pthread_t threads[MAX_THREAD_NUM];
  pthread_attr_t attr;

  (void) pthread_attr_init(&attr);

  for(iLoop = 0 ; iLoop < MAX_THREAD_NUM ; iLoop++)
  {
    cpu_set_t mask;
    CPU_ZERO(&mask);

    if(iLoop == 1)
    {
      /* second thread: cores 1, 3 and 5 */
      CPU_SET(1, &mask);
      CPU_SET(3, &mask);
      CPU_SET(5, &mask);

      /* pthread_create copies the attributes, so attr can be reused */
      (void) pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &mask);
      (void) pthread_create(&threads[iLoop], &attr, DoWork2, (void *)(intptr_t)iLoop);
      (void) printf("Thread %d CoreSetMask 0x%lX\n", iLoop+1, (unsigned long)mask.__bits[0]);
    }
    else
    {
      /* first thread: cores 0, 2 and 4 */
      CPU_SET(0, &mask);
      CPU_SET(2, &mask);
      CPU_SET(4, &mask);

      (void) pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &mask);
      (void) pthread_create(&threads[iLoop], &attr, DoWork1, (void *)(intptr_t)iLoop);
      (void) printf("Thread %d CoreSetMask 0x%lX\n", iLoop+1, (unsigned long)mask.__bits[0]);
    }
  }

  for(iLoop = 0 ; iLoop < MAX_THREAD_NUM ; iLoop++)
  {
    (void) pthread_join(threads[iLoop], NULL);
  }

  (void) pthread_attr_destroy(&attr);
  return 0;
}
