omp_get_thread_num() returns random values in the parallel region

MooN_K_ — Thu, 10 Oct 2013 04:59:17 GMT

Hello OpenMP professionals,

I'm working in a parallel region with OpenMP, and the thread IDs come out in a random order. For example, with 4 threads I get Thread's ID 1, Thread's ID 3, Thread's ID 2, Thread's ID 0, and another run gives a different order. How can I get the IDs in the order 0, 1, 2, 3? Any satisfactory answer would be welcome.

Here is my code:

int nThread = omp_get_max_threads ();

#pragma omp parallel num_threads(nThread)
{
int myID=omp_get_thread_num ();

printf("Thread's ID %d \n", myID);

}

Thanks for your reply

TimP — Thu, 10 Oct 2013 12:35:19 GMT

In case you're going about this by trial and error (understandable, given the scattered documentation on OpenMP), you might compare what you got with saner variants such as

#pragma omp parallel num_threads(nThread)
{
  #pragma omp single
  {
    // only one thread of the team executes this block
    int myID=omp_get_thread_num ();

    printf("Thread's ID %d \n", myID);
  } // omp single
} // omp parallel

I would agree that the IBM doc (but not the Microsoft one) would appear to justify your way of doing it.

Vladimir_P_1234567890 — Thu, 10 Oct 2013 12:53:00 GMT

Moreover, without this pragma (#pragma omp single) you can get mixed output like the one below, since you are running a parallel program:

Thread's IDThread's ID 1 0
ThreThread's ID 2
ad's ID 3

--Vladimir

jimdempseyatthecove — Thu, 10 Oct 2013 16:02:00 GMT

When you start a parallel region, it is like a horse race. In other words, the horses (threads) can run in any order. Thus you may see

0,1,2,3
3,2,1,0
... (any permutation of the order)

This is the whole idea of running in parallel (which does not mean lock-step).
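
If all you need is for the output lines to appear in a fixed order, one option is OpenMP's ordered construct. The sketch below is only an illustration of that construct and is not taken from the original question; ordered_ids, order and capacity are hypothetical names, and the serial fallbacks exist only so that it also builds with OpenMP disabled.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds with OpenMP disabled. */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical helper: prints one line per slot in increasing slot order
   and records that order in order[]. Returns the number of lines printed. */
int ordered_ids(int order[], int capacity)
{
    assert(capacity >= 1);
    int n = omp_get_max_threads();
    if (n > capacity)
        n = capacity;
    int count = 0;

    #pragma omp parallel for ordered schedule(static, 1) num_threads(n)
    for (int i = 0; i < n; ++i) {
        /* Per-slot work would go here; it still runs in any order. */

        #pragma omp ordered
        {
            /* Ordered regions execute one at a time, in loop order. */
            printf("Slot %d handled by thread %d\n", i, omp_get_thread_num());
            order[count++] = i;
        }
    }
    return count;
}
```

Calling ordered_ids(order, 64) prints slot 0 first, then slot 1, and so on. With schedule(static, 1) and a full team of n threads, slot i is handled by thread i, but the runtime is allowed to supply fewer threads than requested, so the sketch relies on the loop index for ordering rather than on the thread ID.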

If you require sections of code to run in sequential order (e.g. writing results to an output file), then consider having the parallel section produce the results into internal storage, then following the parallel section, have a sequential section that writes the results to output. You can also do other, more complicated things:

[cpp]

int nThread = omp_get_max_threads ();
ASSERT(nThread <= MAX_THREADS_YOU_SPECIFY);
volatile int doneFlags[MAX_THREADS_YOU_SPECIFY];

for(int i=0; i < nThread; ++i)
  doneFlags[i] = 0; // no thread has finished yet

#pragma omp parallel num_threads(nThread)
{
  int myID=omp_get_thread_num ();
  // partition work nThread-ways
  // assign me to myID partition
  doComputeWorkHere(nThread, myID); // in any order
  doneFlags[myID] = 1; // indicate myID is done
  if(myID == 0)
  {
    for(int i=0; i < nThread; ++i)
    {
      while(doneFlags[i] == 0) // spin until thread i has finished
        _mm_pause();

      printf("Outputting Thread's ID %d data\n", i);
      outputDataHere(i);
    } // for
  } // if(myID == 0)
} // omp parallel
[/cpp]

Note, the above code example is not normal programming practice. Normal programming practice would divide the work evenly, then place the output section after the parallel region. This would also eliminate the need for the doneFlags and code initializing, setting, and testing.
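
The normal practice described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: compute_partition is a hypothetical stand-in for the real work, compute_all and print_all are made-up names, and the serial fallbacks are there only so the sketch builds with OpenMP disabled.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds with OpenMP disabled. */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical stand-in for the real per-thread computation. */
static int compute_partition(int id) { return id * id; }

/* Parallel phase: each thread stores its result in its own slot.
   Returns the actual team size. */
int compute_all(int results[], int capacity)
{
    assert(capacity >= 1);
    int requested = omp_get_max_threads();
    if (requested > capacity)
        requested = capacity;
    int n = 1;

    #pragma omp parallel num_threads(requested)
    {
        int myID = omp_get_thread_num();
        #pragma omp single
        {
            n = omp_get_num_threads(); /* the team may be smaller than requested */
        }
        results[myID] = compute_partition(myID); /* in any order */
    }
    return n;
}

/* Sequential phase: output in ID order, after the parallel region. */
void print_all(const int results[], int n)
{
    for (int i = 0; i < n; ++i)
        printf("Thread's ID %d result %d\n", i, results[i]);
}
```

A caller would declare an array such as int results[256], call compute_all(results, 256), and pass the returned team size to print_all. No flags or spin loops are needed because the implicit barrier at the end of the parallel region guarantees that every slot is filled before the output loop starts.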

Now, you might ask why you would want to program in the manner as described above, and this would be a good question.

a) You design your partitioning so that each thread (myID) has a different amount of work. The work is proportional to the myID number:

work = X + Y * myID

Thus, when ID 0 finishes, it writes its data to the file. While it is writing, IDs 1, 2, and 3 continue working. If you set Y correctly, ID 1 will finish its work at the moment the write for ID 0 completes, ID 2 at the moment the write for ID 1 completes, and ID 3 at the moment the write for ID 2 completes. In this way you can recover the latency of writing sections 0, 1, and 2 (for a 4-thread asymmetric workload).
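
To see what "setting Y correctly" might mean, here is a small timing model. It is a sketch under simplifying assumptions that are not in the post: every write takes the same time W, thread 0 performs the writes one after another in ID order, and there is no contention between computing and writing. The function names are made up for the illustration.

```c
#include <assert.h>

/* Finish time when thread k computes for X + Y*k and thread 0 writes the
   results in ID order, each write taking W (all in the same time unit). */
double staggered_finish(double X, double Y, double W, int n)
{
    assert(n >= 1);
    double write_end = 0.0;
    for (int k = 0; k < n; ++k) {
        double ready = X + Y * k; /* thread k is done computing */
        double start = ready > write_end ? ready : write_end;
        write_end = start + W;    /* the write for ID k completes */
    }
    return write_end;
}

/* Baseline: the same total work split evenly across the n threads,
   with all n writes done after the parallel region. */
double even_finish(double X, double Y, double W, int n)
{
    assert(n >= 1);
    double total = n * X + Y * n * (n - 1) / 2.0;
    return total / n + n * W;
}
```

In this model the writer never waits and the workers never wait when Y equals W. For example, with X = 10, Y = W = 2 and 4 threads, staggered_finish gives 18 time units against 21 for even_finish.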

b) You can place a loop in the parallel region. Then do something like this:

[cpp]

int nThread = omp_get_max_threads ();
volatile int writerFlag = 0; // ID of the thread whose turn it is to write

#pragma omp parallel num_threads(nThread)
{
  int myID=omp_get_thread_num ();
  for(int chunk = 0; chunk < nChunks; ++chunk)
  {
    // partition work nThread-ways
    // assign me to myID partition
    doComputeWorkHere(nThread, myID, chunk); // in any order
    while(writerFlag != myID) // wait for my turn to write
      _mm_pause(); // or Sleep(0)
    printf("Outputting Thread's ID %d data\n", myID);
    outputDataHere(myID);
    writerFlag = (writerFlag + 1) % nThread; // pass the turn to the next ID
  } // for
} // omp parallel
[/cpp]

Jim Dempsey

topic In case you're going about in Intel® Moderncode for Parallel Architectures
