Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

omp_get_thread_num() returns random values in the parallel region

MooN_K_
Beginner

Hello OpenMP professionals,

I'm working in a parallel region with OpenMP, and I get the threads' IDs in a random order. For example, with 4 threads I get Thread's ID = 1, Thread's ID = 3, Thread's ID = 2, Thread's ID = 0, and another execution gives another order. How can I get the IDs in order, i.e. 0, 1, 2, 3? Any satisfactory answer would be welcome.

Here is my code:

int nThread = omp_get_max_threads ();

#pragma omp parallel num_threads(nThread)
{
    int myID = omp_get_thread_num ();

    printf("Thread's ID %d \n", myID);
}

Thanks for your reply

3 Replies
TimP
Honored Contributor III

In case you're going about this by trial and error (understandable, given the scattered documentation on OpenMP), you might compare what you got with saner variants such as

#pragma omp parallel num_threads(nThread)
{
    #pragma omp single
    {
        int myID = omp_get_thread_num ();

        printf("Thread's ID %d \n", myID);
    }
}

I would agree that the IBM doc (but not the Microsoft one) would appear to justify your way of doing it.
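
If the goal is actually to see the IDs printed as 0, 1, 2, 3, a minimal sketch (assuming one loop iteration per thread) is to serialize the prints with an ordered loop; with schedule(static, 1) iteration i is normally executed by thread i:

#pragma omp parallel for ordered schedule(static, 1) num_threads(nThread)
for (int i = 0; i < nThread; ++i)
{
    int myID = omp_get_thread_num ();

    // the ordered construct forces the printf calls to run in iteration order
    #pragma omp ordered
    printf("Thread's ID %d \n", myID);
}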

Vladimir_P_1234567890

Moreover, without this pragma (#pragma omp single) you can get mixed output like the example below, since you are running a parallel program:

Thread's IDThread's ID 1 0
ThreThread's ID 2
ad's ID 3 
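
If each line just needs to come out intact (the order in which the lines appear is still arbitrary), a minimal sketch is to serialize the printf calls with a critical section:

#pragma omp parallel num_threads(nThread)
{
    int myID = omp_get_thread_num ();

    // only one thread at a time executes the printf, so the lines do not interleave
    #pragma omp critical
    printf("Thread's ID %d \n", myID);
}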

--Vladimir

jimdempseyatthecove
Honored Contributor III

When you start a parallel region, it is like a horse race. IOW, the horses (threads) can run in any order. Thus you may see

0,1,2,3
3,2,1,0
... (any permutation of order)

This is the whole idea of running in parallel (which does not mean lock-step).

If you require sections of code to run in sequential order (e.g., writing results to an output file), then consider having the parallel region produce the results into internal storage and, following the parallel region, have a sequential section that writes the results to output. You can do other, more complicated things:

[cpp]

int nThread = omp_get_max_threads ();
ASSERT(nThread < MAX_THREADS_YOU_SPECIFY);
volatile int doneFlags[MAX_THREADS_YOU_SPECIFY];

for(int i=0; i < nThread; ++i)
  doneFlags[i] = 0;

#pragma omp parallel num_threads(nThread)
{
  int myID = omp_get_thread_num ();
  // partition work nThread-ways
  // assign me to myID partition
  doComputeWorkHere(nThread, myID); // in any order
  doneFlags[myID] = 1; // indicate myID is done
  if(myID == 0)
  {
    // thread 0 writes everyone's results in ID order,
    // waiting for each thread to finish before writing its data
    for(int i=0; i < nThread; ++i)
    {
      while(doneFlags[i] == 0)
        _mm_pause(); // spin-wait; _mm_pause() comes from <immintrin.h>

      printf("Outputting Thread's ID %d data\n", i);
      outputDataHere(i);
    } // for
  } // if(myID == 0)
} // omp parallel
[/cpp]

Note, the above code example is not normal programming practice. Normal programming practice would divide the work evenly, then place the output section after the parallel region. This would also eliminate the need for the doneFlags and for the code that initializes, sets, and tests them.
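
For comparison, a minimal sketch of that normal practice, reusing the hypothetical doComputeWorkHere / outputDataHere placeholders from the example above, could look like this:

[cpp]

int nThread = omp_get_max_threads ();

#pragma omp parallel num_threads(nThread)
{
  int myID = omp_get_thread_num ();
  // partition work nThread-ways, keep the results in internal storage
  doComputeWorkHere(nThread, myID); // in any order
} // omp parallel

// sequential output section: runs only after every thread has finished
for(int i=0; i < nThread; ++i)
{
  printf("Outputting Thread's ID %d data\n", i);
  outputDataHere(i);
}
[/cpp]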

Now, you might ask why you would want to program in the manner described above, and that would be a good question.

a) You design your partitioning so that each thread (myID) has a different amount of work. The work is proportional to the myID number:

work = X + Y * myID

Thus when ID 0 finishes, it writes its data to the file. While it is writing, IDs 1, 2, 3 continue working. If you set Y correctly (roughly, Y should match the time it takes to write one thread's section), ID 1 will finish its work at the moment the write for ID 0 completes, ID 2 finishes at the moment the write for ID 1 finishes, and ID 3 finishes at the moment the write for ID 2 finishes. Thus you can recover the latency of writing sections 0, 1, 2 (of a 4-thread asymmetric work load).

b) You can place a loop in the parallel region. Then do something like this:

[cpp]

int nThread = omp_get_max_threads ();
volatile int writerID = 0; // whose turn it is to write output

#pragma omp parallel num_threads(nThread)
{
  int myID = omp_get_thread_num ();
  for(int chunk = 0; chunk < nChunks; ++chunk)
  {
    // partition work nThread-ways
    // assign me to myID partition
    doComputeWorkHere(nThread, myID, chunk); // in any order
    while(writerID != myID)
      _mm_pause(); // or Sleep(0)
    printf("Outputting Thread's ID %d data\n", myID);
    outputDataHere(myID);
    writerID = (writerID + 1) % nThread; // pass the turn to the next thread
  } // for
} // omp parallel
[/cpp]

Jim Dempsey
