Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

How to initialize local variable for each thread (parallel_for)? - Seems like a bug of parallel_for (TBB 4.0)

zhaonaiy
Beginner
2,970 Views
Greetings,
I am using TBB 4.0 on Windows (VS 2010) and trying to parallelize a loop using parallel_for. However, I ran into a problem with initializing a local variable.
class getIteratorArrayBody
{
    // some other private variable declarations
    int *rOffsetArray;
public:
    getIteratorArrayBody(..., int *r) : rOffsetArray(r) {}
    void operator()( const tbb::blocked_range<size_t>& range ) const
    {
        size_t iterator; // thread local variable
        for (size_t j = range.begin(); j != range.end(); j++) {
            // code to do some calculation
            iterator = ....
            rOffsetArray[j] = iterator;
        }
        ...
    }
};
............
int * IndexingEngine::getIteratorArrayInParallel (..., tbb::affinity_partitioner &affinity)
{
    int *rOffsetArray = new int[100000];
    tbb::parallel_for( tbb::blocked_range<size_t>( 0, vLBASpaceLength ), // Index space for loop
        getIteratorArrayBody (..., rOffsetArray), // Body of loop
        affinity ); // Affinity hint
    return rOffsetArray;
}
With the above code, if I set the number of threads to 1, it executes without problems.
However, if I set the number of threads to 64, it reports "The variable 'iterator' is being used without being initialized".
From this, it seems the problem is caused by a conflict between threads. I tried some other ways to initialize it, but all failed.
Any suggestion to resolve this initialization problem is much appreciated!!
Nai Yan.
21 Replies
RafSchietekat
Valued Contributor III
2,692 Views
From what you have chosen to show us, that would seem to be a bug, because the use follows an assignment, and it would be bad if the compiler were to require bogus/dead initialisations, since sometimes what you actually want is to have several possible code paths that might assign a variable and be able to rely on the compiler to warn you if not all bases were covered.

However, you should try to avoid separating declaration and initialisation where practical, and to also use "const" if it's not really a "variable" (for historical reasons the language has that backwards). At the start of the block, it would even work in C, so you basically have no excuse. If you do that, the compiler might not get confused.
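As a minimal sketch of the "one const definition" idea: the branching logic moves into a helper that returns a value on every path, so the loop body can define the result once as const. The names computeOffset and fillOffsets, the bounds 10 and 20, and the placeholder arithmetic are all hypothetical and only stand in for the real calculation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the per-element calculation.
// Every path returns a value, so the caller never sees an unset result.
inline std::size_t computeOffset(unsigned int value, unsigned int lo, unsigned int hi,
                                 std::size_t pageIndex, std::size_t pageSize)
{
    if (value == lo) return pageIndex * pageSize;
    if (value == hi) return (pageIndex + 1) * pageSize;
    return pageIndex * pageSize + 1; // placeholder for the real search
}

// Loop body in the style of the original operator(): declaration and
// initialisation are a single const definition.
inline void fillOffsets(const unsigned int* values, std::size_t count, std::size_t* out)
{
    for (std::size_t j = 0; j != count; ++j) {
        const std::size_t offset = computeOffset(values[j], 10u, 20u, j, 4u);
        out[j] = offset;
    }
}
```

With this shape the compiler no longer has to prove that some branch assigned the variable, because a function that can fall off the end without returning is diagnosed separately.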
zhaonaiy
Beginner
2,692 Views
Thank you for your reply. However, why does it work well when I use a single thread? Will the threads issued by "parallel_for" affect each other if they update the same shared data area (i.e. the array "rOffsetArray") and use the same local variable in the const operator() function?
Any comments from you are much appreciated!!
Nai Yan.
RafSchietekat
Valued Contributor III
2,692 Views
Can you make it one const definition just to see what happens?

But it's a strange situation anyhow, so maybe you have left out some essential detail that would allow a more useful diagnosis. What do you mean with "use single thread" (could be a compiler setting or a task_scheduler_init value)? Is it more than a compiler error (execution might fail or still be as expected)? Can you show more of the code?
zhaonaiy
Beginner
2,692 Views
Hello Raf,
Below is the code. For a single thread, just modify the task_scheduler_init value (i.e. set variable p to 1 below) in the main function. There is no error during compilation or execution. Execution is successful and the result looks as expected.
In the main function:
IndexingEngine * idxEng = new IndexingEngine();
...
int p = 32;
tbb::task_scheduler_init init(p);
rOffset* r =
idxEng->scanIndex(vLBAArray,idxHry,pLBAPool_index_area_loc,vLBASpaceLength,DEFAULT_PAGE_SIZE);
....
In IndexingEngine:
rOffset* IndexingEngine::scanIndex(unsigned int* vLBAArray,
indexHierarchy * ih,
char * pLBAPool_index_data_loc,
int vLBASpaceLength,
int page_size)
{
int numOfVerticalLayers = ih->page_hierachy->vertical_layer;
int * pointerOfHorizontalSpan = ih->page_hierachy->horizontal_span;
static tbb::affinity_partitioner affinity;
rOffset* r = getIteratorArrayInParallel (vLBAArray, pNodeIndexArray, vLBASpaceLength, ih, pLBAPool_index_data_loc, page_size,affinity);
return r;
}
class getIteratorArrayBody
{
unsigned int* vLBAArray;
int* pageNodeIndexArray;
indexHierarchy * ih;
int page_size;
rOffset* rOffsetArray;
IndexingEngine * idx_e;
FILE * fp;
public:
getIteratorArrayBody(unsigned int* vLBAAr, int * p_NodeIndexArray, indexHierarchy * ihy, int m_pageSize, rOffset* m_rOffsetArray, IndexingEngine *_idx, FILE * fp1):
vLBAArray(vLBAAr),pageNodeIndexArray(p_NodeIndexArray),ih(ihy),page_size(m_pageSize),rOffsetArray(m_rOffsetArray),idx_e(_idx), fp(fp1){}
void operator()( const tbb::blocked_range<size_t>& range ) const
{
size_t iterator;
for (size_t j=range.begin(); j!=range.end(); j++)
{
unsigned int pageNodeIndex = pageNodeIndexArray[j];
unsigned int starting_vLBA = ih ->node_hierarchy[0][pageNodeIndex].min_vLBA;
unsigned int ending_vLBA = ih->node_hierarchy[0][pageNodeIndex].max_vLBA;
unsigned int middle_ref = (int)((starting_vLBA + ending_vLBA)/2);
unsigned int middle_idx = (int) (page_size/2);
unsigned int vLBA = vLBAArray[j];
unsigned int left_boundary;
unsigned int right_boundary;
if (vLBA == starting_vLBA)
iterator = pageNodeIndex * page_size;
else if (vLBA == ending_vLBA)
iterator = (pageNodeIndex + 1) * page_size;
else
{
unsigned char *c = new unsigned char [page_size];
fseek(fp, pageNodeIndex * page_size, SEEK_SET);
fread(c,page_size,1,fp);
if (vLBA < middle_ref && vLBA > starting_vLBA) // lower value half area
{
int temp_itern = pageNodeIndex * page_size;
left_boundary = starting_vLBA;
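for (int i= 0; i<middle_idx; i++) // loop bound was lost in the post; assumed by symmetry with the upper-half loop
{
right_boundary = left_boundary + idx_e->getLengthOfvLBAExtent(c[i]); // index [i] assumed, mirroring c[page_size-i-1] below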
if (vLBA >= left_boundary && vLBA <= right_boundary)
iterator = temp_itern+i;
else
left_boundary = right_boundary;
}
}
else if ((vLBA >= middle_ref) && vLBA < ending_vLBA) // higher value half area
{
int temp_itern = (pageNodeIndex +1) * page_size;
right_boundary = ending_vLBA;
unsigned char d;
for (int i= 0; i<(page_size-middle_idx); i++)
{
left_boundary = right_boundary - idx_e->getLengthOfvLBAExtent(c[page_size-i-1]);
if (vLBA >= left_boundary && vLBA <= right_boundary)
iterator = temp_itern-i-1;
else
right_boundary = left_boundary -1;
}
}
}
rOffsetArray[j] = iterator;
}
}
};
rOffset * IndexingEngine::getIteratorArrayInParallel (unsigned int* vLBAArray,
int* pageNodeIndexArray,
int vLBASpaceLength,
indexHierarchy * ih,
char * pLBAPool_index_data_loc,
int page_size, tbb::affinity_partitioner &affinity)
{
cout <<"Starting getting iterator array ... \n";
rOffset* rOffsetArray = new unsigned int[vLBASpaceLength];
FILE * fp = fopen (pLBAPool_index_data_loc, "rb");
tbb::parallel_for( tbb::blocked_range<size_t>( 0, vLBASpaceLength ), // Index space for loop
getIteratorArrayBody (vLBAArray, pageNodeIndexArray, ih,page_size,rOffsetArray,this,fp), // Body of loop
affinity ); // Affinity hint
cout <<"rOffset array is obtained ... \n";
return rOffsetArray;
}
RafSchietekat
Valued Contributor III
2,692 Views
That's too much for a simple data flow analyser to handle, if it is at all computable. You will need to initialise the variable to get rid of the warning. If you're like me, you would still want to be assured that it was assigned to before it gets assigned from, by initialising with a special value that can only mean "nil", or by pairing it with a bool that carries the same information, and asserting that it is no longer "nil" just before using it. (Note that this is purely serial, it has nothing to do with threading or TBB.)
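A minimal sketch of the "nil" approach, assuming the largest size_t value can never be a legitimate result. The names kNil and findIndex are hypothetical, and the linear search only stands in for the real branching logic.

```cpp
#include <cassert>
#include <cstddef>

// Sentinel meaning "not assigned yet"; assumed never to be a real result.
const std::size_t kNil = static_cast<std::size_t>(-1);

// Hypothetical search: returns the position of key in data.
// The variable starts out as "nil", which silences the warning, and the
// assert documents that some branch must have assigned it before use.
inline std::size_t findIndex(const int* data, std::size_t count, int key)
{
    std::size_t result = kNil;
    for (std::size_t i = 0; i != count; ++i) {
        if (data[i] == key) {
            result = i;
            break;
        }
    }
    assert(result != kNil); // fails in a debug build if no branch assigned it
    return result;
}
```

Unlike initialising to 0, the sentinel cannot be mistaken for a valid value, so a missed code path shows up as a failed assertion instead of a silently wrong entry in the output array.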

Just to be sure of what you meant (because some statements seemed contradictory): I presume that you saw a compiler warning (not an error) regardless of the number of threads specified in task_scheduler_init, correct?
SergeyKostrov
Valued Contributor II
2,692 Views
Quoting zhaonaiy
...
However if I specify number of threads to 64, then it reports "The variable 'iterator' is being used without being initialized"
...
Nai Yan.


TBB DLLs can be built as 'Non-MT' or 'MT' (MT: multithreaded), and the two have different runtime checks
specified in the VS project settings (I assume that you're using VS).

Also, did you try a Release build? Did it work?

SergeyKostrov
Valued Contributor II
2,692 Views
...
void operator()( const tbb::blocked_range<size_t>& range ) const
{
size_t iterator;
...

Could you initialize the 'iterator' variable to 0? Like:

...
size_t iterator=0;
...
zhaonaiy
Beginner
2,692 Views
Greetings,
Thank you all for your reply. Really appreciate it!!
This is the result,
1) In both single-thread and multi-thread mode (set by modifying the "p" value), there is no compiler error. But there is an execution error when switching to multi-thread mode (p = 32); with a single thread it works well.
2) Setting size_t iterator=0; in the "operator()" body works. However, my question is whether programming with TBB has any data safety problem, i.e. whether each thread behaves the same as a single thread.
Since single-thread mode works, why does multi-thread mode not work? This is really my question. Is there anything I missed, or a bug in the code?
Looking forward to your further comments!!
Thanks again in advance.
Nai Yan.
RafSchietekat
Valued Contributor III
2,692 Views
Without the initialisation, iterator will contain whatever was there in memory, and perhaps you are just lucky (it can happen!) when executing on a single thread? Please use my suggestion to find out whether you are depending on this initial value, and if that's how it should be, be sure to initialise it to 0 (supposedly). You can also assert that iterator is zero right after the declaration (without initialisation), to verify what you've observed before: if the assert succeeds in serial execution (by sheer luck) and occasionally fails in concurrent execution, that explains what you saw.
SergeyKostrov
Valued Contributor II
2,692 Views
Quoting zhaonaiy
...
Since single-thread mode works, why does multi-thread mode not work?

[SergeyK] Sorry, I didn't have a chance to execute your code. I simply would like to stress that
every thread uses its own set of local (automatic) variables, allocated on its
stack. If some set of variables is shared between threads, a great deal of attention is
needed and some synchronization objects should be used.

This is really my question. Is there anything I missed, or a bug in the code?

[SergeyK] It is a very good practice to initialize all local variables to some right values. A C/C++
compiler could warn you, but a warning message could be disabled. If you don't initialize
your local variables, the results of your processing could be unpredictable.
...
Nai Yan.

Best regards,
Sergey
RafSchietekat
Valued Contributor III
2,692 Views
"It is a very good practice to initialize all local variables to some right values."
That is a matter of opinion.
zhaonaiy
Beginner
2,692 Views
Now after verification, it seems like a bug in parallel_for, or I have misused parallel_for in some way.
I scanned the returned int array.
In single-thread mode, the scan returns the result below:
Validate if rOffset array is valid or not
Total valid 16 bit numbers are 368
Total valid 32 bit numbers are 999470
Invalid number is 162
In multi-thread mode (p = 32), the scan returns the result below:
Validate if rOffset array is valid or not
Total valid 16 bit numbers are 368
Total valid 32 bit numbers are 999632
Invalid number is 0
Obviously, the program in multi-thread mode returns a different result from single-thread mode.
Here is the code for scanning the int array:
int invalid = 0;
int valid_32bit = 0;
int valid_16bit = 0;
for (int i = 0; i < vLBASpaceLength; i++)
{
    if (r[i] <= 0)
        invalid++;
    if (r[i] > 0 && r[i] <= 65535)
        valid_16bit++;
    if (r[i] > 65535 && r[i] <= 4294967295)
        valid_32bit++;
}
cout << "Total valid 16 bit numbers are " << valid_16bit << "\n";
cout << "Total valid 32 bit numbers are " << valid_32bit << "\n";
cout << "Invalid number is " << invalid << "\n";
cout << "data validation check is complete..." << "\n";
If my logic is not wrong, then it seems like a bug of parallel_for.
Any response is very appreciated!!
Nai Yan.
RafSchietekat
Valued Contributor III
2,692 Views
"If my logic is not wrong, then it seems like a bug of parallel_for."
That is such a bold statement (parallel_for is a basic algorithm, there's a test suite to verify its correct operation, and many people are using it) that I dare recommend to first look at your own program some more as the more likely cause.
zhaonaiy
Beginner
2,692 Views
Hello Raf,
Thank you for your reply. Maybe I am wrong, which is why I wrote "seems like". My assumption is that no matter how the number of threads is changed (i.e. by modifying the value of "p"), the execution result should be the same. However, that is not what my program shows.
Again, if it were a problem in my program, the result should be the same for 1 thread and 32 threads: both successful, or both failed. Instead, the single-thread run succeeded and the multi-thread run failed.
I believe TBB is mature and many people are using it, but I wonder why I get different results from my program.
If you find anything in my program, or anything I need to change to make it suitable for multiple threads, please let me know.
Thanks in advance!!
Nai Yan.
In Main func:
IndexingEngine * idxEng = new IndexingEngine();
...
int p = 32;
tbb::task_scheduler_init init(p);
rOffset* r =
idxEng->scanIndex(vLBAArray,idxHry,pLBAPool_index_area_loc,vLBASpaceLength,DEFAULT_PAGE_SIZE);
....
RafSchietekat
Valued Contributor III
2,692 Views
The first thing I saw was unsynchronised access to a FILE (between fseek and fread, another thread's fseek might intervene), which is pretty serious all by itself (I didn't look further at this time).
zhaonaiy
Beginner
2,692 Views
Thank you for your investigation. I was trying to use multiple threads to read a file from different positions and fetch content from it. Any suggestion on how to solve this?

Thanks!!
Nai Yan.
RafSchietekat
Valued Contributor III
2,692 Views
At the very minimum use a mutex lock around the fseek/fread.
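A minimal sketch of that lock, assuming a C++11 compiler so that std::mutex is available; TBB's own mutex classes would play the same role. The helper name readPageLocked is hypothetical. The point is that the seek and the read happen under one lock, so no other thread can move the file position in between.

```cpp
#include <cassert>
#include <cstdio>
#include <mutex>

// Reads size bytes starting at byte position offset from a FILE that is
// shared by several threads. Holding the lock across both calls makes the
// fseek + fread pair atomic with respect to other callers using fileMutex.
inline bool readPageLocked(std::FILE* fp, std::mutex& fileMutex,
                           long offset, std::size_t size, unsigned char* out)
{
    std::lock_guard<std::mutex> lock(fileMutex);
    if (std::fseek(fp, offset, SEEK_SET) != 0)
        return false;
    return std::fread(out, 1, size, fp) == size;
}
```

In the loop body this would replace the bare fseek/fread pair, with one mutex object shared by all copies of the body (for example through a pointer member). The reads are then serialised, so this fixes correctness but does not by itself give any I/O parallelism.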

But you also have to consider why you are using parallelism, i.e., to increase performance, and it might not be such a good thing to read little pieces of a FILE here and there (maybe a mapped file might work fine, I don't know). Perhaps you should instead go with parallel_do/parallel_while and several kilobytes of data per chunk. Just don't do several I/O operations per chunk.
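One way to take I/O out of the loop body entirely is to read the file once, before the parallel loop, and let the body only look at memory. This is a different technique from the chunked reading described above, and it assumes the index file fits in RAM. The names loadWholeFile and pageAt are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Reads the whole file into memory in one sequential pass.
// Assumes the file fits in RAM; returns an empty vector on failure.
inline std::vector<unsigned char> loadWholeFile(std::FILE* fp)
{
    std::vector<unsigned char> data;
    if (std::fseek(fp, 0, SEEK_END) != 0)
        return data;
    const long size = std::ftell(fp);
    if (size <= 0 || std::fseek(fp, 0, SEEK_SET) != 0)
        return data;
    data.resize(static_cast<std::size_t>(size));
    if (std::fread(&data[0], 1, data.size(), fp) != data.size())
        data.clear();
    return data;
}

// Read-only view of one page. Nothing is modified, so many threads can
// call this at once. Returns a null pointer if the page is out of range.
inline const unsigned char* pageAt(const std::vector<unsigned char>& data,
                                   std::size_t pageIndex, std::size_t pageSize)
{
    if (pageSize == 0 || pageIndex >= data.size() / pageSize)
        return 0;
    return &data[pageIndex * pageSize];
}
```

The body would then call pageAt where it currently allocates a buffer and calls fseek/fread, which also removes the per-iteration new[]. Whether this beats many small reads on a given disk is something to measure, not assume.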
zhaonaiy
Beginner
2,692 Views
Actually, I was trying to increase the I/O queue depth (my target disk is an SSD) with multiple concurrent threads, something like the queue depth in I/O meter. However, the difference in my case is that multiple threads operate on one file concurrently.
Do you know if it's possible for multiple threads to operate on one file concurrently?
Any response is much appreciated!!
Thanks.
Nai Yan.
RafSchietekat
Valued Contributor III
2,564 Views
I'm interested whether others' experience would confirm or disprove what I wrote above, but have nothing new to add at this time.