The attached program (which is actually C) produces incorrect output if and only if -parallel is specified. The failure appears to be consistent and does not depend on the number of threads. The output differs between 126.96.36.199 and the beta, but both are incorrect. The Fortran version (not included) works, and so does the C version when built with gcc. To test it, run the command 'show N', where N is the number of iterations (I recommend 1). The output shows that the data indexed by the lower bound(s) is being corrupted.
I've reproduced the problem you reported and entered it into our problem-tracking database. I'll let you know when I have an update. Thank you for your test case.
Intel Developer Support
Tools Knowledge Base: http://software.intel.com/en-us/articles/tools