[uname -a : Linux dhcp- 2.6.18-194.11.1.el5 #1 SMP Tue Jul 27 05:45:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux]
I am trying to use OpenMP for a particular application and I am somewhat confused by its behavior. I am using the Intel compiler 11.1 with the -O3 -openmp options on 2 x Intel Xeon E5620 CPUs @ 2.40GHz running RHEL 5.5. I have included the parallel section below. If I use it as shown, I don't get any speedup at all, no matter how many cores I use. I was naively hoping to get a decent speedup with this scheme. I have also tried another scheme, which roughly corresponds to parallelizing at the molecule loop (line number 9), and with that I get a decent speedup (100% on 4 cores). Am I missing something? [I hope it is not related to cache misses.] At this point I would like some feedback from OpenMP experts on whether I should just give up on the first scheme, adopt the second, and move on. Thanks.