Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Help with OpenMP

xraygenfit
Beginner

Hi everyone, I'm relatively new to OpenMP, and while I have successfully parallelized some loops, I'm stuck on another one.


I have the following code (sorry, the C++ formatter isn't working in Firefox):

double bestparams[20];
double bestchisquare = FastReflfit(boxes, SLD, parameters, paramsize,
                                   QRange, QSize, Reflectivity, reflectivitysize, Errors, covar,
                                   covarsize, info, infosize, onesigma, FALSE);

omp_set_dynamic(TRUE);
#pragma omp parallel for schedule(guided,50000)
for(int i = 0; i < 100000; i++)
{
    ParamPermute(params, paramsize);
    double chisquare = FastReflfit(boxes, SLD, parameters, paramsize,
                                   QRange, QSize, Reflectivity, reflectivitysize, Errors, covar,
                                   covarsize, info, infosize, onesigma, FALSE);

    if(chisquare < bestchisquare)
    {
        bestchisquare = chisquare;
        for(int j = 0; j < paramsize; j++)
        {
            bestparams[j] = params[j];
        }
    }
}

So, what I'm basically doing is calculating a chisquare, comparing it to the best one found so far, and, if the new one is lower, replacing the previous best fit with the current one. Now, if I use a plain omp pragma, several threads can update the best fit simultaneously. If I make the variables private and add a critical section, I'm going to slow everything down, because each iteration doesn't take that long (although the aggregate is very slow). So, what is the best way to do this? Thanks for any help.

4 Replies
michaelsuess
Beginner
The best way is to implement a reduction manually. This means that you make private copies of your bestchisquare and bestparams for each thread. Each thread then calculates its own best values using these private copies (no need for synchronization, since every thread only operates on private values).

At the end of the parallel region, each thread compares its private best values with a shared, global best value. You need to protect this with a critical region, but since it is only executed once per thread, that's probably OK performance-wise.
xraygenfit
Beginner
Hi Michael,

Thanks for the help. I think I'm a little confused now. If I put in a critical section, it would be executed on every iteration of my parallel calculation, right?
So,

#pragma omp parallel for reduction...
for(int i = 0; i < loopcount; i++)
{
    //Do a bunch of stuff
    //increment reduction variable

    #pragma omp critical
    {
        //Compare my private variables with the global
    }
}

Now, won't that be updated with each thread? Sorry, this is really throwing me for a loop. Thanks for any help.
jimdempseyatthecove
Honored Contributor III

Xray,

Outside the parallel loop, define a data structure containing the best chi and an array of parameters. Then, prior to the loop, declare and initialize an array of these data structures, the size of which is the number of threads that will process the loop. Also outside the parallel loop, create an "int MyThreadNum = -1;". Then on the #pragma, declare MyThreadNum as firstprivate, so that each thread's private copy is initialized with -1 (copyin applies only to threadprivate variables). Then inside the loop, if MyThreadNum is < 0, call the library function to obtain the thread number (team member number). Use this number later in the loop to index the array of structures containing the best chi and parameters. This method of coding gives each thread a private area for chi and parameters (and does not require a critical section).

After the parallel loop has finished, scan the array of structures for the lowest chi and copy its parameters to the global parameters list (and the global chi value).

--

An alternate way of doing this: when defining the structure containing the best chi and array of parameters, declare a constructor that initializes the default local best chi (either to the initial chi or to HUGEchi). Then declare a destructor that has the critical section to copy out (when valid) the local best chi and parameters. In this manner you do not need to create an array of these structures (each is stack local to its thread, by declaring an instance of the structure inside the parallel region but outside the loop body, so that it is constructed and destroyed once per thread). Use firstprivate to pass in a pointer to the one and only copy of the best chi and parameters structure that resides outside the parallel loop.

Jim Dempsey

xraygenfit
Beginner
Very clever! Thanks