
Will omp parallel for behave differently with a single thread?

Hi there~

I tried to adopt OpenMP to parallelize the matrix loading process. Many elements contribute to the matrix, so I used omp parallel for to load the elements' contributions to the Jacobian in parallel. However, I ran into a very strange situation. If the thread count is > 1, the matrix seems fine. But if I use a single thread, either by setting the environment variable to 1 or by making the if condition false, in some cases the matrix gets wrong values somewhere.

The pseudocode is below:
[cpp]Element** elmAry = (Element**)calloc(size, sizeof(Element*));
// elmAry is set up somewhere.

int idx;

#pragma omp parallel for private(...) firstprivate(...) if (...)
for (idx = 0; idx < size; ++idx) {
    Element* elm = elmAry[idx];
    // load elm's contribution to matrix
}[/cpp]

I just added the parallel for syntax to parallelize the original program. However, when the thread count is 1, the matrix gets wrong values somewhere. I tried many cases, and only a few with a large matrix size have this problem, so it's hard to figure out what's going wrong with gdb. And of course, if the parallel for is removed, the matrix is correct. This problem occurs with icc10.1.011. When I tried icc11.1, the problem disappeared.

Can anyone help explain what happens, or what affects the program's results in single-thread mode with the parallel mechanism? Thanks a lot...

Best Regards


The application should produce the same result, within a tolerance limit, with the single-thread and parallel versions.
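As a rough sketch of what "within a tolerance limit" could mean when comparing the two runs, something like the following might do. The function name matricesAgree, the flat std::vector storage, and the default tolerances are all assumptions for illustration, not anything from the original program.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true when every entry of `a` matches the corresponding entry of `b`
// within a mixed absolute/relative tolerance. The default tolerances are
// placeholders; choose values that fit the scaling of your own problem.
bool matricesAgree(const std::vector<double>& a, const std::vector<double>& b,
                   double relTol = 1e-12, double absTol = 1e-14) {
    assert(relTol >= 0.0 && absTol >= 0.0);
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double scale = std::max(std::fabs(a[i]), std::fabs(b[i]));
        // Written as !(diff <= limit) so that a NaN entry counts as a mismatch.
        if (!(std::fabs(a[i] - b[i]) <= absTol + relTol * scale)) return false;
    }
    return true;
}
```

Dumping the matrix from the serial build and from the one-thread OpenMP build and comparing them this way would at least show which entries differ and by how much.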

If you can provide a test case, we can review the issue.

Hi Sachan~

Thanks for the reply. I have discussed this with my colleagues, and they have had this problem before. The conclusion we came to is that it should be an optimization issue. Since our product is sensitive to the matrix values, slightly modified code may result in a differently optimized binary, and then some slightly different values can sum up to totally different results. (In my case, the modified code is not even executed, so it is the compiled binary that makes the difference.)

I tried moving the code up and down and rewriting it in other ways (trying to come up with a different binary with the same meaning). Anyway, the modified binary works fine for my case now. I know it's a stupid way to work around the problem, but I have no other ideas currently. So any comments are welcome. Thanks...
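To illustrate how the order of additions alone can change a sum, here is a minimal sketch. sumInOrder is a made-up helper, not anything from the product code, and the example assumes IEEE 754 doubles with no fast-math style reordering by the compiler.

```cpp
#include <vector>

// Adds the values strictly left to right, so the caller controls the order.
double sumInOrder(const std::vector<double>& values) {
    double total = 0.0;
    for (double v : values) {
        total += v;
    }
    return total;
}
```

Under those assumptions, sumInOrder({1e17, 1.0, -1e17}) gives 0.0 while sumInOrder({1e17, -1e17, 1.0}) gives 1.0, because the 1.0 is rounded away when it is added to 1e17 first. A compiler that schedules the same additions differently can shift results in the same way.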

Best Regards


PS: By the way, I can't provide the code and test case where I hit this problem, since they are owned by the company. The problem occurs only for large cases and I haven't seen it in small ones, so I don't have a small test case right now.

See if your code has an issue relating to uninitialized variable usage
(or a missing copyin, etc.).
You may have a bug in your serial (1 thread parallel) code that was hidden until now.
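As a hedged illustration of the kind of thing to look for (loadScaled, scale, and term are hypothetical names, not taken from the poster's code), a variable that is read inside the loop before being written needs firstprivate rather than private:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the element loading loop: each entry adds
// scale * contributions[i] to its own slot of the result.
std::vector<double> loadScaled(const std::vector<double>& contributions,
                               double scale) {
    std::vector<double> matrix(contributions.size(), 0.0);
    const long n = static_cast<long>(contributions.size());
    long i;
    double term;  // scratch value, always written before it is read

    // `scale` is read inside the loop, so it must be firstprivate (or shared).
    // Listing it under private(...) would give each thread an uninitialized
    // copy, and that holds even when the team has only one thread.
    #pragma omp parallel for private(term) firstprivate(scale)
    for (i = 0; i < n; ++i) {
        term = scale * contributions[i];
        matrix[i] += term;  // each iteration touches a distinct slot
    }
    return matrix;
}
```

A private copy of a variable is a new, uninitialized object even in a team of one thread, so a wrong clause can go unnoticed until the garbage value happens to matter. The sketch also compiles without OpenMP enabled, in which case the pragma is simply ignored.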

Jim Dempsey