Re: Dynamic memory deallocation in OpenMP

mfalda · 04-08-2009

I am learning OpenMP and I have a question about dynamic memory. Suppose I have the following fragment in C:

[cpp]int i, j;
double *vect = NULL;

//... determine N
vect = (double *) malloc(N * sizeof(double));
for (i = 1; i < N; i++) {
for (j = 1; j < N; j++)
vect[j] = rand();
// use vect...
}
free(vect);
vect = NULL;
[/cpp]

I think that I should have a private copy of the vector, since it is written by each thread, so I would write:

[cpp]int i, j;
double *vect = NULL;

//... determine N
#pragma omp default(none) private(i, j, vect)
for (i = 1; i < N; i++) {
vect = (double *) malloc(N * sizeof(double));
for (j = 1; j < N; j++)
vect[j] = rand();
// use vect...
free(vect);
vect = NULL;
}
[/cpp]

But this is inefficient, since it would be enough to allocate/deallocate just at the beginning/end of the loop performed by each thread. Is there a way to obtain this behaviour relying on pure C, without using constructors and destructors? Is the possible solution standard, or is it based on an Intel proprietary feature?
Thank you for any answer.

jimdempseyatthecove · 04-08-2009

Typically you have a shared copy of the vector, and each thread works on a portion of the shared copy.

You may also want a private copy of the vector.

Your "before" code sample seems to indicate a shared copy, with #pragma omp for used to partition the iteration space. It is not quite clear what you want to do with the two loops (i, j).

Your parallelization may need to be applied to the outer loop, the inner loop, both, or outer+inner loops in the same partition.

When the code is constructed to work on parts of the vector, only one allocation needs to be performed.
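As a rough illustration of the shared-vector case, here is a minimal sketch in C. The function name `make_squares` and the values written are made up for the example; the only point is that a single allocation suffices when each iteration writes its own element.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates ONE shared vector of n doubles and
   fills it in parallel. Iteration i writes only vect[i], so the threads
   never touch the same element and no private copy is needed. */
double *make_squares(int n)
{
    double *vect = malloc((size_t)n * sizeof *vect);  /* single allocation */
    if (vect == NULL)
        return NULL;

    /* The loop variable i is declared in the loop, so it is private. */
    #pragma omp parallel for default(none) shared(vect, n)
    for (int i = 0; i < n; i++)
        vect[i] = (double)i * (double)i;

    return vect;  /* the caller is responsible for free() */
}
```

Compiled without OpenMP support the pragma is ignored and the loop simply runs serially, which gives the same result.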

Also, rand() has severe interaction problems (it uses a critical section), so do not use it inside your timed loops.

Jim Dempsey

mfalda · 04-09-2009

Thank you for the answer. Let me try to explain myself better: the aim is to run a number of experiments in parallel (so, for example, vect could be a vector of parameters read from files "par_i.txt"; I will check whether the GNU R random routines are thread-safe). Therefore the vector has to be private, and the parallelization can be applied just to the outer loop (in the first post I forgot "for" in the directive!):

[cpp]int i, j;  
double *vect = NULL;  
  
//... determine the number N of independent experiments
#pragma omp for default(none) private(i, j, vect)  
for (i = 1; i <= N; i++) {  
vect = (double *) malloc(N * sizeof(double));  
// initialize vect reading it from file "par_i.txt"
// use vect...  
free(vect);  
vect = NULL;  
}  

[/cpp]

Suppose that I have N = 4 and 2 threads. I think that the loop would be "unfolded" into:

[cpp]//thread 1
vect = (double *) malloc(4 * sizeof(double));  
// initialize vect reading it from file "par_1.txt"
// use vect...  
free(vect);  
vect = NULL; 

vect = (double *) malloc(4 * sizeof(double));  
// initialize vect reading it from file "par_2.txt"
// use vect...  
free(vect);  
vect = NULL; 

//thread 2
vect = (double *) malloc(4 * sizeof(double));  
// initialize vect reading it from file "par_3.txt"
// use vect...  
free(vect);  
vect = NULL; 

vect = (double *) malloc(4 * sizeof(double));  
// initialize vect reading it from file "par_4.txt"
// use vect...  
free(vect);  
vect = NULL; 
[/cpp]

Instead I would like to obtain:

[cpp]//thread 1
vect = (double *) malloc(4 * sizeof(double));  
// initialize vect reading it from file "par_1.txt"
// use vect...  

// initialize vect reading it from file "par_2.txt"
// use vect...  
free(vect);  
vect = NULL; 

//thread 2
vect = (double *) malloc(4 * sizeof(double));  
// initialize vect reading it from file "par_3.txt"
// use vect...  

// initialize vect reading it from file "par_4.txt"
// use vect...  
free(vect);  
vect = NULL; [/cpp]

Is it possible to obtain such an execution flow? Perhaps by separating "parallel" and "for", like this:

[cpp]int i, j;  
double *vect = NULL;  
  
//... determine the number N of independent experiments
{
#pragma omp parallel 
vect = (double *) malloc(N * sizeof(double));  
#pragma omp for default(none) private(i, j, vect)  
for (i = 1; i <= N; i++) {  
// initialize vect reading it from file "par_i.txt"
// use vect...  
}  
free(vect);  
vect = NULL; 
}
[/cpp]

Thank you.

jimdempseyatthecove · 04-09-2009

The random number generator is thread safe. However, it uses a critical section. Therefore, it will interfere with performance measurements (unless your actual code will use the random number generator). My guess is that even though you are not using your application for a performance test, you will set up a mock test environment to gauge the speed-up from parallelizing your code. Using the random number generator _inside_ your mock-up code will interfere with your performance measurements.

To work around this, determine how many random numbers you will need. Generate a table of these random numbers, then run your performance-instrumented code, and wherever you have a call to the random number generator, insert code to extract the next number for your thread. Either each thread uses an exclusive subsection of the set of preallocated random numbers, or each iteration uses an exclusive element from that set.
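A minimal sketch of the "each iteration uses an exclusive element" variant might look like the following. The function name `sum_pregenerated`, the `% 100` range, and the summation standing in for the real work are all assumptions made for the example.

```c
#include <stdlib.h>

/* Hypothetical helper: pre-generates n random values serially, then
   consumes them in a parallel loop that never calls rand(). */
double sum_pregenerated(int n, unsigned seed)
{
    double *table = malloc((size_t)n * sizeof *table);
    if (table == NULL)
        return -1.0;  /* allocation failure */

    /* Serial phase: every rand() call happens here, outside the
       region you would time. */
    srand(seed);
    for (int i = 0; i < n; i++)
        table[i] = (double)(rand() % 100);

    /* Parallel phase: iteration i reads only table[i]. */
    double total = 0.0;
    #pragma omp parallel for default(none) shared(table, n) reduction(+:total)
    for (int i = 0; i < n; i++)
        total += table[i];

    free(table);
    return total;
}
```

Because the table is filled before the parallel loop starts, the same seed gives the same inputs regardless of the number of threads, which also makes timing runs easier to compare.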

Potential code sketch:

[cpp]	int i, j;
	double **vect = NULL;
	//... determine the number N of independent experiments
	// vect[experiment] will contain pointer to working data
	// for experiment. There is no experiment number 0
	vect = (double **) malloc((N+1) * sizeof(double*));
	_ASSERT(vect);
	for (i = 0; i <= N; i++)
		vect[i] = NULL;
	//
	{
	#pragma omp parallel for default(none) private(i, j) shared(vect) schedule(static,1)
	for (i = 1; i <= N; i++)
	{
		// initialize vect[i] reading it from file "par_i.txt"
		// ... get the extent from "par_i.txt"
		vect[i] = (double *) malloc(extent * sizeof(double));
		// ... read/populate vect[i]
		// ... process vect[i]
		// (end loop keeping results)
	}
	// results of experiments available here
	// vect[0] not used
	// vect[1] has * to results of experiment 1
	// ...
	// vect[N] has * to results of experiment N
	// now do something else with results
	// (e.g. summarize, or collate, ...)
	// ...
	//
	for (i = 0; i <= N; i++)
	{
		if (vect[i]) free(vect[i]);
	}
	free(vect);
	vect = NULL;
	}
	return 0;
[/cpp]

Jim Dempsey

mfalda · 04-09-2009

Ok, now I understand how to proceed. Thank you very much!

jimdempseyatthecove · 04-09-2009

That is just one suggestion.

A different approach would be a tasking approach. Assume your N files are fairly large and would benefit from sequential reading. Better performance might be attained by having one task run through the N files, populating working sets. As each population completes, additional threads perform the process function; the results are then similarly written out sequentially. Together, this is often called a pipeline.
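One possible way to sketch the read-then-process part of such a pipeline is with OpenMP 3.0 tasks. Everything named here is hypothetical: `read_params` stands in for reading "par_i.txt", `process` stands in for the experiment, and the sequential write-out stage is left out.

```c
#include <stdlib.h>

/* Hypothetical stand-in for reading the parameters of one experiment. */
static void read_params(int experiment, double *buf, int len)
{
    for (int k = 0; k < len; k++)
        buf[k] = (double)(experiment + k);
}

/* Hypothetical stand-in for the experiment itself. */
static double process(const double *buf, int len)
{
    double s = 0.0;
    for (int k = 0; k < len; k++)
        s += buf[k];
    return s;
}

/* One thread reads the inputs in order; each filled buffer is handed
   to a task that any thread in the team may execute. */
void run_pipeline(int n, int len, double *results)
{
    #pragma omp parallel default(none) shared(n, len, results)
    #pragma omp single
    for (int i = 0; i < n; i++) {
        double *buf = malloc((size_t)len * sizeof *buf);
        if (buf == NULL)
            continue;  /* results[i] keeps its previous value */
        read_params(i + 1, buf, len);  /* sequential read stage */

        #pragma omp task default(none) firstprivate(buf, i, len) shared(results)
        {
            results[i] = process(buf, len);  /* parallel process stage */
            free(buf);                       /* the task owns its buffer */
        }
    }
    /* All tasks have completed by the end of the parallel region. */
}
```

Here the allocation is per experiment again, which is the price of letting the reader run ahead of the workers; a pool of reusable buffers would avoid that but needs more bookkeeping.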

Alexey-Kukanov · 04-13-2009

Quoting - mfalda

Is it possible to obtain such an execution flow? Perhaps by separating "parallel" and "for", like this:

[cpp]int i, j;  
double *vect = NULL;  
  
//... determine the number N of independent experiments
{
#pragma omp parallel 
vect = (double *) malloc(N * sizeof(double));  
#pragma omp for default(none) private(i, j, vect)  
for (i = 1; i <= N; i++) {  
// initialize vect reading it from file "par_i.txt"
// use vect...  
}  
free(vect);  
vect = NULL; 
}
[/cpp]

Thank you.

This is almost exactly the way to achieve what you want, except that the opening curly brace must be moved right after the OpenMP parallel directive, and vect must either be declared private or be declared inside the parallel region:

[cpp]int i, j;  
double *vect = NULL;  
  
//... determine the number N of independent experiments
#pragma omp parallel private(vect)
{
// each thread in the parallel region will allocate its own copy of vect.
vect = (double *) malloc(N * sizeof(double));  
#pragma omp for private(i, j)
for (i = 1; i <= N; i++) {  
// initialize vect reading it from file "par_i.txt"
// use vect...  
}
// parallel loop ended, but the parallel region continues.
// each thread will free its copy of vect.
free(vect);  
vect = NULL; 
} // parallel region ends here
[/cpp]
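For the other option, declaring vect inside the parallel region, a self-contained sketch could look like this. The names `run_experiment` and `run_all`, and the allocation counter, are assumptions added only to make the one-allocation-per-thread behaviour visible; they are not part of the code above.

```c
#include <stdlib.h>

/* Hypothetical experiment: fills the scratch buffer, returns its sum. */
static double run_experiment(int experiment, double *scratch, int len)
{
    double s = 0.0;
    for (int k = 0; k < len; k++) {
        scratch[k] = (double)(experiment * k);
        s += scratch[k];
    }
    return s;
}

/* Runs experiments 1..n. Returns the number of allocations performed
   (one per thread, not one per experiment), or -1 if any allocation failed. */
int run_all(int n, int len, double *results)
{
    int allocations = 0, failed = 0;

    #pragma omp parallel default(none) shared(n, len, results, allocations, failed)
    {
        /* Declared inside the region, so each thread has its own vect. */
        double *vect = malloc((size_t)len * sizeof *vect);

        if (vect != NULL) {
            #pragma omp atomic
            allocations++;
        } else {
            #pragma omp atomic
            failed++;
        }

        /* Every thread must reach the loop, even if its malloc failed. */
        #pragma omp for
        for (int i = 1; i <= n; i++)
            if (vect != NULL)
                results[i - 1] = run_experiment(i, vect, len);

        free(vect);  /* free(NULL) is a no-op */
    }
    return failed ? -1 : allocations;
}
```

With 2 threads and n = 4 this should report 2 allocations rather than 4; built without OpenMP it runs serially and reports 1.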