Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Can code behave differently under omp parallel for with a single thread?

rudaho
Beginner
Hi there~

I am trying to use OpenMP to parallelize the matrix loading process. Many elements contribute to the matrix, so I used omp parallel for to load the elements' contributions to the Jacobian in parallel. However, I ran into a very strange situation. With more than one thread, the matrix seems fine. But if I use a single thread, either by setting the thread-count environment variable (OMP_NUM_THREADS) to 1 or by making the if clause false, the matrix ends up with wrong values in some places in some cases.

The pseudocode is as follows:
[cpp]Element** elmAry = (Element**)calloc(size, sizeof(Element*));
// elmAry is set up somewhere.

int idx;

#pragma omp parallel for private(...) firstprivate(...) if (...)
for (idx = 0; idx < size; ++idx) {
    Element* elm = elmAry[idx];
    // ...
    // load elm's contribution to the matrix
    // ...
}

// ...[/cpp]
I only added the parallel for syntax to parallelize the original program. However, when the thread count is 1, the matrix gets wrong values in some places. I tried many cases; only a few with large matrix sizes show this problem, which makes it hard to figure out what is going wrong with gdb. And of course, if the parallel for is removed, the matrix is correct. The problem occurs with icc 10.1.011; with icc 11.1 it goes away.

Can anyone explain what is happening, or what could affect the program's results when it runs on a single thread under the parallel construct? Thanks a lot...

Best Regards

YJ
Om_S_Intel
Employee
The application should produce the same results, within a tolerance, with a single thread and with the parallel version.

If you can provide a test case, we can review the issue.
rudaho
Beginner
Hi Sachan~

Thanks for the reply. I have discussed this with my colleagues, who have run into this problem before, and... the conclusion we came to is that it is an optimization issue. Since our product is sensitive to the matrix values, slightly modified code may produce a differently optimized binary, and the resulting slightly different values can add up to totally different results. (In my case the modified code is not even executed, so it is the compiled binary itself that makes the difference.)

I tried moving the code up and down and rewriting it in other ways (trying to get a different binary with the same meaning). Anyway, the modified binary works fine for my case now. I know this is a crude way to get around the problem, but I have no other ideas at the moment, so any comments are welcome. Thanks...

Best Regards

YJ

P.S. I can't provide the code or the test case in which I hit this problem, since they are owned by the company. Also, the problem only occurs in large cases; I haven't seen it in small ones, so I haven't been able to come up with a small test case yet.
jimdempseyatthecove
Honored Contributor III
Check whether your code uses an uninitialized variable (or is missing a copyin, etc.).
You may have a bug in your serial (one-thread parallel) code that was hidden until now.

Jim Dempsey