I'm new to Intel Cluster OpenMP and am working on my Fortran code, as seen below, on my small Core2Duo cluster with a GbE network (3 nodes, each with a single Core2Duo).
In my tests, the "-openmp" run shows a reasonable speedup in CPU time over the "serial" run, but the "-cluster-openmp" run does not.
CPU times were as follows (other options are "-O3 -ip -ipo -ftz"):
"serial" run: 7.6 (s)
"-openmp" run: 4.4 (s)
"-cluster-openmp" run: 113.4 (s) !!!!!
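To put these timings in perspective, the speedup relative to the serial run can be worked out directly. This is only a minimal sketch that takes the three reported times at face value; the function name `speedup` is just for illustration.

```python
# CPU times in seconds, as reported above.
times = {"serial": 7.6, "-openmp": 4.4, "-cluster-openmp": 113.4}


def speedup(baseline, measured):
    """Return baseline / measured; a value below 1 means a slowdown."""
    return baseline / measured


for name, t in times.items():
    ratio = speedup(times["serial"], t)
    print(f"{name:16s} {t:7.1f} s   speedup vs serial: {ratio:.2f}x")
```

A ratio below 1 for the "-cluster-openmp" run means it is slower than the serial run, not just slower than the "-openmp" run.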
Hi,

It would probably be much better to ask this question in the Intel Parallel Architectures forum, but I'll try to give some hints. OpenMP is designed to run several threads on one machine. Cluster OpenMP is meant to work on clusters, but that doesn't mean you'll get a performance improvement. The main problem for Cluster OpenMP is memory latency. Below are latency figures for different memory types:

Latency to L1: 1 - 2 cycles
Latency to L2: 5 - 7 cycles
Latency to L3: 12 - 21 cycles
Latency to memory: 180 - 225 cycles
Gigabit Ethernet latency to remote node: ~28000 cycles
I took these figures from an Itanium processor, but the exact values are not so important. As you can see, if an application runs on one node, the processor's cache can be used and latency is very low. But if you run the application on a distributed system, data can be located on different nodes and latency will be very high. Unfortunately, I don't know how to tune your application.
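To make the gap concrete, here is a rough sketch that computes how many times slower a remote access over Gigabit Ethernet is than each local level, using only the cycle figures quoted above. The names `latency_cycles` and `slowdown_vs_gbe` are made up for this example.

```python
# Latencies in CPU cycles, from the figures quoted above (Itanium).
# Ranges are stored as (low, high); the GbE figure is approximate.
latency_cycles = {
    "L1": (1, 2),
    "L2": (5, 7),
    "L3": (12, 21),
    "memory": (180, 225),
}
GBE_REMOTE_CYCLES = 28000


def slowdown_vs_gbe(level):
    """Return the (min, max) ratio of GbE remote latency to a local level."""
    low, high = latency_cycles[level]
    return GBE_REMOTE_CYCLES / high, GBE_REMOTE_CYCLES / low


for level in latency_cycles:
    ratio_min, ratio_max = slowdown_vs_gbe(level)
    print(f"GbE remote access vs {level}: "
          f"{ratio_min:.0f}x to {ratio_max:.0f}x slower")
```

Even compared with local main memory, a remote access comes out more than a hundred times slower, so code that frequently touches data held on another node will spend most of its time waiting.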
Thanks a lot for your suggestion. I tested my cluster's latency using clomp_getlatency.pl provided by Intel, and the latency of my network is about 45 microseconds. I know that a GbE network has higher latency than the CPU cache and also than other interconnects like Myrinet and InfiniBand. I hope to use them in the near future. I will ask for help in the Intel Parallel Architecture forum and close this thread.
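For comparison with the cycle counts quoted earlier, the measured 45 microseconds can be converted to CPU cycles. This is only a sketch: the clock frequency of the Core2Duo nodes is not stated in this thread, so the 2.4 GHz value below is an assumption and should be replaced with the real one.

```python
def latency_in_cycles(latency_us, clock_ghz):
    """Convert a latency in microseconds to CPU cycles at a given clock."""
    # 1 microsecond at 1 GHz is 1000 cycles.
    return latency_us * clock_ghz * 1000.0


MEASURED_LATENCY_US = 45.0  # reported by clomp_getlatency.pl, see above
ASSUMED_CLOCK_GHZ = 2.4     # assumption: not given in the thread

cycles = latency_in_cycles(MEASURED_LATENCY_US, ASSUMED_CLOCK_GHZ)
print(f"{MEASURED_LATENCY_US} us at {ASSUMED_CLOCK_GHZ} GHz "
      f"is about {cycles:,.0f} cycles")
```

Under that assumption the measured latency is roughly 108,000 cycles, which is the same order of magnitude as, and somewhat higher than, the ~28000 cycles quoted for Gigabit Ethernet.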