Re: Problem with parallel_for

uurmp · 10-12-2008

I downloaded TBB (tbb21_20080605oss) from this site and use it with VC++ 2005.
My main code is as follows:
int main()
{
    ...
    parallel_for(blocked_range<int>(0, size0, gainsize0), gene_phantom);
    ...
    for (int iter=0; iter<...; iter++)
    {
        ...
        parallel_for(blocked_range<int>(0, size1, gainsize1), frontback_phantom);
        ...
        parallel_for(blocked_range<int>(0, size2, gainsize2), curve_fit);
        ...
    } // end of iter
}

My problems are:

1: The first parallel_for works well for any grainsize.

2: The second parallel_for works only when the grainsize is larger than some value.

When the grainsize is small, this parallel_for stops after iter=5 and gives this error:

First-chance exception at 0x7c812a5b in Main_iterate_fitting_parallel2D.exe:

Microsoft C++ exception: std::bad_alloc at memory location 0x0012f8a8..

Unhandled exception at 0x7c812a5b in Main_iterate_fitting_parallel2D.exe:

Microsoft C++ exception: tbb::captured_exception at memory location 0x0012f9b8..

3: The third parallel_for only works when grainsize2 = size2, no matter what the value of iter is.

(Here "works" means it generates the right result.)

I'm a beginner at parallel coding. Any suggestions about the problem will be greatly appreciated!

Here is the parallel class I defined and how I call it:

class curve_fitting {
public:
    int *imageExtent; int *volumeExtent; int nl;
    int mm; int n; int k;
    double *ZTaa0; double *ZTbb0; double *ZTcc0;
    double *Zaa0; double *Zbb0; double *Zcc0;
    double *origin; double *x, *sig;
    double (*ZTDvolume)[320];

    void operator()(const blocked_range<int>& range) const
    {
        fstream outdata;
        Vec_DP a(n), x0(nl), y(nl), sig0(nl);
        Mat_DP covar(n,n), alpha(n,n);
        DP chisq, alamda;
        Vec_BOOL ia(n);
        ia[0] = true; ia[1] = true; ia[2] = true;
        for (int kk=0; kk<...; kk++)
        {
            x0[kk]=x[kk]; sig0[kk]=sig[kk];
        }
        for (int j = range.begin(); j != range.end(); j++)
        {
            int ny = j*volumeExtent[0];
            for (int i=0; i<...; i++)
            {
                for (int kv=0; kv<...; kv++)
                {
                    y[kv] = ZTDvolume[ny+i][kv];
                }
                double temp=(double)((i-origin[0]+0.5)*(i-origin[0]+0.5)+2.0*(j-origin[1]+0.5)*
                    (j-origin[1]+0.5)+(k-origin[2]+0.5)*(k-origin[2]+0.5));
                if (temp<=800.0)
                {
                    a[0] = ZTaa0[ny+i]; a[1] = ZTbb0[ny+i]; a[2] = ZTcc0[ny+i];
                    alamda = -1;   // alamda < 0 tells mrqmin to start a new fit
                    for (int iter=0; iter<100; iter++)
                    {
                        NR::mrqmin(x0,y,sig0,a,ia,covar,alpha,chisq,NR::mfgauss,alamda);
                    }
                    Zaa0[ny+i]=a[0]; Zbb0[ny+i]=a[1]; Zcc0[ny+i]=a[2];
                }
                else
                {
                    Zaa0[ny+i]=0.0; Zbb0[ny+i]=0.5; Zcc0[ny+i]=0.1;
                }
            } // end of i
        } // end of j
    } // end of operator()
};

/////

int main()
{
    task_scheduler_init init;

    ...

    curve_fitting curve_fit;
    curve_fit.imageExtent = imageExtent;
    curve_fit.volumeExtent = volumeExtent;
    curve_fit.nl = nl; curve_fit.mm = mm;
    curve_fit.n = n; curve_fit.x = x;
    curve_fit.sig = sig; curve_fit.k = k;
    curve_fit.origin = origin; curve_fit.ZTaa0 = ZTaa0;
    curve_fit.ZTbb0 = ZTbb0; curve_fit.ZTcc0 = ZTcc0;
    curve_fit.Zaa0 = Zaa0; curve_fit.Zbb0 = Zbb0;
    curve_fit.Zcc0 = Zcc0; curve_fit.ZTDvolume = ZTDvolume;

    parallel_for(blocked_range<int>(0,64,64),curve_fit);

    ...
}

Andrey_Marochko · 10-13-2008

The fact that a bad_alloc exception is generated by the runtime while your own code apparently does not allocate memory inside the calculation loops suggests that you are somehow overwriting memory that does not belong to you. Make sure that your code does not access your arrays beyond their bounds, and that all the arrays are properly allocated.
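A cheap way to follow this advice while debugging is to route every index computation through a checked helper, so that an out-of-bounds access throws at the faulty line instead of silently corrupting the heap and surfacing later as an unrelated-looking bad_alloc. This is a hypothetical sketch: checked_index, width and size are illustrative names, and it assumes the row-major j*width + i layout that the curve-fitting loops use for ZTaa0, Zaa0 and the other per-pixel arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical debugging helper: returns the row-major index j*width + i,
// but throws std::out_of_range if (j, i) does not address an element of an
// array holding `size` values. Use it in place of an inline j*width+i while
// hunting for out-of-bounds accesses.
std::size_t checked_index(int j, int i, int width, std::size_t size) {
    assert(width > 0);
    if (j < 0 || i < 0 || i >= width)
        throw std::out_of_range("checked_index: row or column out of range");
    std::size_t idx = static_cast<std::size_t>(j) * static_cast<std::size_t>(width)
                    + static_cast<std::size_t>(i);
    if (idx >= size)
        throw std::out_of_range("checked_index: index past end of array");
    return idx;
}
```

For example, with 64 columns and a 64*64 array, Zaa0[checked_index(j, i, 64, 64 * 64)] behaves like Zaa0[j*64+i] for valid j and i, but throws as soon as j reaches 64. Once the bug is found, the checks can be taken out again.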

Alexey-Kukanov · 10-13-2008

Also, try debugging in serial mode: initialize TBB with a single thread via task_scheduler_init init(1); and see if this works.

uurmp · 10-13-2008

Quoting - Andrey Marochko (Intel)

The fact that a bad_alloc exception is generated by the runtime while your own code apparently does not allocate memory inside the calculation loops suggests that you are somehow overwriting memory that does not belong to you. Make sure that your code does not access your arrays beyond their bounds, and that all the arrays are properly allocated.

Thank you for your advice!

Actually, I only started writing C++ a month ago, when I learned about your TBB library. I don't have a deep understanding of how to allocate memory correctly. I know the arrays are quite large, so it's easy to run into memory allocation problems.

Is your suggestion to avoid using public members to share variables? Would it be better to pass them as function arguments instead?

Thanks!
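On the question of public members: for thread safety it makes no difference whether the body's pointers are assigned as public members or passed to a constructor. parallel_for copies the body object, and every copy holds the same pointers, so all copies read and write the same arrays. That sharing is safe as long as each output element is written by only one chunk. A constructor does make it harder to forget to set a member. The class below is a hypothetical sketch: ScaleRows and its members are illustrative, and its operator() takes a plain [row_begin, row_end) pair instead of a blocked_range so that it compiles without TBB.

```cpp
#include <cassert>

// Hypothetical body class that receives its shared data through the
// constructor. Copies of a ScaleRows object share the same in_/out_
// pointers; each call writes only the rows in [row_begin, row_end), so
// calls on disjoint row ranges never write the same element.
class ScaleRows {
public:
    ScaleRows(const double* in, double* out, int width, double factor)
        : in_(in), out_(out), width_(width), factor_(factor) {}

    void operator()(int row_begin, int row_end) const {
        assert(row_begin <= row_end);
        for (int j = row_begin; j != row_end; ++j)
            for (int i = 0; i < width_; ++i)
                out_[j * width_ + i] = factor_ * in_[j * width_ + i];
    }

private:
    const double* in_;
    double* out_;
    int width_;
    double factor_;
};
```

Two copies of the same body, each given a different row range, fill disjoint parts of the shared output array, which is the situation parallel_for creates when it hands chunks to different threads.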

uurmp · 10-15-2008

Quoting - Alexey Kukanov (Intel)

Also, try debugging in serial mode: initialize TBB with a single thread via task_scheduler_init init(1); and see if this works.

1. For the third parallel_for, the one I posted:
Yes, it works. Is that the same as setting grainsize = size, which means I have only one chunk?

I also tried setting grainsize = size with multiple threads, and that works too.

However,

2. For the second parallel_for:

when I use serial mode as you suggested and set grainsize = 5 (with grainsize = 10 the code works),

the code stops with the same problem as before when iter reaches 8.

3. It's hard for me to understand why the parallel_for works for the first few iterations and then stops. If it is because of memory allocation,

why does the allocation succeed in the first iteration and keep working for 6 iterations, and then fail on the 8th?

Thank you very much!
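On the question above of whether task_scheduler_init init(1) is the same as grainsize = size: not quite, although here both hide the same bug. With the grainsize equal to the range size, the range is never split, so operator() is called once on the whole range. With one thread, the range may still be cut into many chunks, but they all run one after another on the same thread. Either way no two calls of operator() overlap, so a function that is not thread safe inside the body appears to work. The stdlib-only sketch below models how a range is halved under a grainsize, following the blocked_range rule that a range is split while its size exceeds the grainsize, as a simple split-to-grainsize partitioner does. It is an illustration, not TBB's code, and split_range is a hypothetical name.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Model of recursive range splitting: [begin, end) is halved while its size
// is larger than grainsize; each piece that can no longer be split becomes
// one chunk, i.e. one call of the body's operator().
void split_range(int begin, int end, int grainsize,
                 std::vector<std::pair<int, int> >& chunks) {
    assert(grainsize >= 1 && begin <= end);
    if (end - begin > grainsize) {
        int middle = begin + (end - begin) / 2;
        split_range(begin, middle, grainsize, chunks);
        split_range(middle, end, grainsize, chunks);
    } else {
        chunks.push_back(std::make_pair(begin, end));
    }
}
```

For the 64-row range used with curve_fit, a grainsize of 64 gives a single chunk [0, 64), a grainsize of 10 gives 8 chunks of 8 rows, and a grainsize of 5 gives 16 chunks of 4 rows. With more than one thread, those chunks can run concurrently.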

uurmp · 10-15-2008

Reposting, hopefully more readable this time:

I downloaded TBB (tbb21_20080605oss) from this site and use it with VC++ 2005.
My main code is as follows:
int main()
{
    ...
    parallel_for(blocked_range<int>(0, size0, gainsize0), gene_phantom);
    ...
    for (int iter=0; iter<...; iter++)
    {
        ...
        parallel_for(blocked_range<int>(0, size1, gainsize1), frontback_phantom);
        ...
        parallel_for(blocked_range<int>(0, size2, gainsize2), curve_fit);
        ...
    } // end of iter
}

My problems are:

1: The first parallel_for works well for any grainsize.

2: The second parallel_for works only when the grainsize is larger than some value.

When the grainsize is small, this parallel_for stops after iter=5 and gives this error:

First-chance exception at 0x7c812a5b in Main_iterate_fitting_parallel2D.exe:

Microsoft C++ exception: std::bad_alloc at memory location 0x0012f8a8..

Unhandled exception at 0x7c812a5b in Main_iterate_fitting_parallel2D.exe:

Microsoft C++ exception: tbb::captured_exception at memory location 0x0012f9b8..

3: The third parallel_for only works when grainsize2 = size2, no matter what the value of iter is.

(Here "works" means it generates the right result.)

I'm a beginner at parallel coding. Any suggestions about the problem will be highly appreciated!

Here is the parallel class I defined and how I call it:

#include <iostream>
#include "tbb/task_scheduler_init.h"
#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"
#include "nr.h"   // Numerical Recipes header (assumed name): NR::mrqmin, Vec_DP, Mat_DP, Vec_BOOL, DP
using namespace std;
using namespace tbb;

class curve_fitting
{
public:
    int *volumeExtent;
    int nl, n, kz;
    double *ZTaa0, *ZTbb0, *ZTcc0;
    double *Zaa0, *Zbb0, *Zcc0;
    double *origin, *x, *sig;
    double (*ZTDvolume)[320];

    void operator()(const blocked_range<int>& range) const
    {
        Vec_DP a(3), y(nl), x0(nl), sig0(nl);
        DP chisq, alamda, temp;
        Mat_DP covar(n,n), alpha(n,n);
        Vec_BOOL ia(n);
        ia[0] = true;
        ia[1] = true;
        ia[2] = true;
        for (int kk=0; kk<...; kk++)
        {
            x0[kk]=x[kk];
            sig0[kk]=sig[kk];
        }
        for (int kk=0; kk<3; kk++)
        {
            for (int kt=0; kt<3; kt++)
            {
                covar[kk][kt]=0.0;
                alpha[kk][kt]=0.0;
            }
        }
        for (int j = range.begin(); j != range.end(); j++)
        {
            for (int i=0; i<64; i++)
            {
                temp=((i-origin[0]+0.5)*(i-origin[0]+0.5)+2.0*(j-origin[1]+0.5)*
                      (j-origin[1]+0.5)+(kz-origin[2]+0.5)*(kz-origin[2]+0.5));
                if (temp<=800.0)
                {
                    cout<<"kz: "<<kz<<" j: "<<j<<" i: "<<i<<endl;
                    a[0] = ZTaa0[j*64+i];
                    a[1] = ZTbb0[j*64+i];
                    a[2] = ZTcc0[j*64+i];
                    for (int kv=0; kv<320; kv++)
                    {
                        y[kv] = ZTDvolume[j*64+i][kv];
                    }
                    chisq=0;
                    alamda = -1;   // alamda < 0 tells mrqmin to start a new fit
                    for (int iter=0; iter<100; iter++)
                    {
                        //cout<<"iter: "<<iter<<endl;
                        NR::mrqmin(x0,y,sig0,a,ia,covar,alpha,chisq,NR::mfgauss,alamda);
                    }
                    Zaa0[j*64+i]=a[0];
                    Zbb0[j*64+i]=a[1];
                    Zcc0[j*64+i]=a[2];
                }
                else
                {
                    Zaa0[j*64+i]=0.0;
                    Zbb0[j*64+i]=0.5;
                    Zcc0[j*64+i]=0.1;
                }
            } // End of i (x-axis)
        } // End of j (y-axis)
    } // End of operator()
}; // End class curve_fitting /////////////////////

int main()
{
    task_scheduler_init init;

    ...

    curve_fitting curve_fit;
    // This curve_fitting has no imageExtent, mm or k members; the z index goes into kz.
    curve_fit.volumeExtent = volumeExtent;
    curve_fit.nl = nl; curve_fit.n = n;
    curve_fit.x = x; curve_fit.sig = sig;
    curve_fit.kz = k;
    curve_fit.origin = origin; curve_fit.ZTaa0 = ZTaa0;
    curve_fit.ZTbb0 = ZTbb0; curve_fit.ZTcc0 = ZTcc0;
    curve_fit.Zaa0 = Zaa0; curve_fit.Zbb0 = Zbb0;
    curve_fit.Zcc0 = Zcc0; curve_fit.ZTDvolume = ZTDvolume;

    parallel_for(blocked_range<int>(0,64,64),curve_fit);

    ...
}

robert-reed · 10-15-2008

Here's a question: do you know whether your implementation of NR::mrqmin is thread safe? I did a quick search for that function name and came up with this example of a Numerical Recipes nonlinear function minimization code. This example shows several local variables that are declared static, which means they persist between calls and are shared by every caller, much like globals. That could have all kinds of weird side effects if you're trying to call a similar version multiple times simultaneously.
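To see why static locals are a problem, consider a hypothetical stand-in for a solver step. It is not the real mrqmin; it only mimics the pattern of keeping working state in a static variable that is reinitialized when the caller passes lambda < 0. Two fits whose calls are interleaved, as calls from two threads would be, overwrite each other's state. The interleaving here happens on one thread, so the result is deterministic; with real threads there is also a data race, and the outcome can change from run to run.

```cpp
#include <cassert>

// Hypothetical stand-in for a non-reentrant solver step. Its "state" is the
// smallest value seen since the fit was started (lambda < 0 starts a fit).
// Because `best` is static, there is exactly one copy for all callers.
double best_so_far_static(double value, double& lambda) {
    static double best = 0.0;  // shared by every fit and every thread
    if (lambda < 0.0) {        // start of a new fit: reset the state
        best = value;
        lambda = 1.0;
    } else if (value < best) {
        best = value;
    }
    return best;
}
```

Starting fit A with 5.0 and fit B with 9.0, then feeding 7.0 to fit A, returns 7.0: starting fit B reset the shared state, so fit A has forgotten its own 5.0.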

uurmp · 10-16-2008

Thank you so much!
Yes, I use the Numerical Recipes function. If I comment out NR::mrqmin, the code runs smoothly. Sometimes the error is "first chance exception...", and sometimes it is "gaussj, singular matrix".

I'm not sure whether NR::mrqmin is thread safe. Could you tell me how to check whether an implementation is thread safe?

Can I just change the static declarations into ordinary local variables?

Thank you for checking the NR code for me.

robert-reed · 10-16-2008

Well, I think you've already performed the first step in determining mrqmin's thread safety: if the code runs more smoothly when you cut it out, the piece you cut out probably isn't thread safe :-)

Writing thread-safe code is a skill and an art form; volumes have been written about it, and it is something we are trying to evangelize. Telling you all you need to know about thread safety is therefore probably too much for this one post. I'll try to give you some hints, and I suggest that you become familiar with this forum, which contains many stories of people dealing with thread safety issues.

I assume these variables in mrqmin were declared static for a reason, which you will need to understand in order to make a thread-safe version. Removing the static specifier guarantees that each invocation of the function has its own private copy, but it also means that each invocation creates a new copy, which is discarded when the function returns. The other thing the static specifier provides is persistence. Is there context that needs to be maintained from call to call? If so, removing the static may break the solver.

What you're shooting for is to preserve the algorithm performed by the function while making it reentrant. (The Wikipedia article just linked contains some requirements for reentrancy.)

I'm not sure what you are referring to in the statement "sometimes, the error is 'first chance exception...',..." Are there other Numerical Recipes functions your code calls that might also have thread safety issues?
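One way to preserve the algorithm while making it reentrant is to move the static variables into a structure owned by the caller and pass it in on every call. Each fit then has its own copy of the state, which still persists from one call to the next. The sketch uses a hypothetical best-value tracker rather than the real mrqmin (FitState and best_so_far are illustrative names). Applied to the code in this thread, the state object would be declared inside operator() next to covar and alpha, so that each chunk, and therefore each thread, gets its own.

```cpp
#include <cassert>

// Caller-owned replacement for the solver's static variables.
struct FitState {
    double best;
};

// Reentrant version of a hypothetical stateful solver step: all state lives
// in `state`, so independent fits (or threads) only share data if the caller
// hands them the same FitState object.
double best_so_far(double value, double& lambda, FitState& state) {
    if (lambda < 0.0) {        // start of a new fit: reset this fit's state
        state.best = value;
        lambda = 1.0;
    } else if (value < state.best) {
        state.best = value;
    }
    return state.best;
}
```

Interleaving two fits now gives each one the answer it would get on its own: after starting fit A with 5.0 and fit B with 9.0, feeding 7.0 to fit A still returns 5.0.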

uurmp · 10-17-2008

Thank you Robert!

The link you gave me is really helpful; I now have some sense of what thread safety means. I think the easiest way for me is to make the code reentrant, which should also make it thread safe.

I think I also need to check the functions that mrqmin calls.

I only use mrqmin from Numerical Recipes, and it calls three other functions. This is the first time I have used the NR functions.

Sorry about the unclear description of the error message in my last post.

I meant that the pop-up error message changes when I debug the code several times with grainsize < size, but both errors are caused by NR::mrqmin().

uurmp · 10-19-2008

The problem is solved!

Your suggestion was really helpful. As you said, removing the static specifier makes the parallel part run, but the result is not correct. I found that NR::mrqmin really needs to keep two variables alive from call to call, so I defined these two variables outside NR::mrqmin, as local variables in the calling code, and then it worked: I got the right results.

Thank you very much!