I need to specify the kind of cluster that suits my requirements. I run molecular dynamics simulations. The software generates threads that require a *large amount* of communication among themselves. Each thread has to send information to, and receive information from, every other thread.
In such a scenario, if I stick to an 8-node processor cluster (as the s/w runs only on 2^n processors), which of the following would be most suitable:
1. Two dual-core Xeon processors with HT capability.
2. Four HT-enabled P4 processors.
3. Two P4 Extreme processor machines.
4. Four P4 Dual core (HT-less/disabled) processors.
5. Eight P4 processors.
We can choose an appropriate switch/hub for any of them.
I understand that only actual benchmarks will make things clear, but I would like to know in general which option is expected to be the best.
You pose an interesting question, with far too little data to answer it fully. I assume you don't foresee an advantage to a hybrid model, such as OpenMP within a node. (4) is one of the more attractive possibilities. The probable performance ranking of those possibilities is 1, 3, 2, 4, 5. (1) might rise in the ranking if you do have an effective hybrid MPI/OpenMP implementation. Otherwise, 4 and 5 should be competitive with the others even if 25% of the cluster is shut down.
In most circumstances, HT offers a performance boost in the 5-10% range on single P4 processors. For any benefit on your types 2 and 3, you must use a current Linux distro with specific support for HT. With Linux 2.4 kernels, or Windows, HT generally cuts cluster performance by 10% or so. Doubling the number of communicating processes, and maybe nearly doubling the required RAM, without doubling the total computational power is not usually a good idea.
Evidently, dual-package dual-core models are becoming the hot spot in the cluster market. Unless shared-memory communication is particularly effective, they should not have any advantage over single-package dual-core nodes. Note that Intel refers to a single dual-core package, with or without HT, as a CPU, so you have to translate between Intel-speak and industry practice.
With a total of only 8 cores, you should get satisfactory performance from a gigabit switch, with attention to LAN driver parameters and economy of communication.
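To put rough numbers on the point about doubling the number of communicating processes, here is a minimal sketch that counts the messages in one naive all-to-all exchange. It assumes every process sends exactly one message to every other process and that messages between processes on the same node stay off the network; the function name and the configurations in the loop are made up for illustration and are not taken from any MD package.

```python
def all_to_all_messages(processes, processes_per_node):
    """Count point-to-point messages in one naive all-to-all exchange.

    Assumption: each process sends one message to each of the others.
    Returns (total, inter_node), where inter_node counts the messages
    whose sender and receiver sit on different nodes.
    """
    if processes % processes_per_node != 0:
        raise ValueError("processes must divide evenly across nodes")
    total = processes * (processes - 1)
    # Each process has (processes_per_node - 1) peers on its own node.
    intra_node = processes * (processes_per_node - 1)
    return total, total - intra_node


# Hypothetical layouts: 8 single-process nodes, 4 nodes with 2 processes,
# 2 nodes with 4 processes, and 8 nodes running 2 processes each (HT on).
for procs, per_node in [(8, 1), (8, 2), (8, 4), (16, 2)]:
    total, inter = all_to_all_messages(procs, per_node)
    print(f"{procs} processes, {per_node} per node: "
          f"{total} messages, {inter} over the network")
```

Under these assumptions, 8 processes exchange 56 messages per step, while 16 processes exchange 240, so doubling the process count more than quadruples the message count. Packing more processes into each node reduces the share that has to cross the switch.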
Without more detailed info we can only speculate on the best cluster configuration for you. You listed several distributed-memory configurations, so I'll assume you mean message-passing processes rather than threads (which usually implies shared memory).
If the amount of computation per dynamics time step is significantly greater than the cost of the all-to-all communication, the interconnect is less important. Get the fastest processors possible and whatever switch (not hub) you can afford with the rest of your budget. If parallel performance is communications-bound, the interconnect could well be more important than processor performance.
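A back-of-the-envelope way to check which regime you are in is sketched below. It is a deliberately crude cost model, not a measurement: it assumes the computation divides perfectly across processes, that each process sends one message to every peer in sequence, and that communication does not overlap with computation. All of the input values in the example are placeholders to be replaced with numbers from your own runs and network.

```python
def step_time(compute_seconds, processes, message_bytes,
              latency_seconds, bandwidth_bytes_per_second):
    """Estimate (compute, communicate) seconds for one time step.

    Assumptions (illustrative only): perfect division of the serial
    compute time, one message per peer sent sequentially, no overlap
    between computation and communication.
    """
    compute = compute_seconds / processes
    per_message = latency_seconds + message_bytes / bandwidth_bytes_per_second
    communicate = (processes - 1) * per_message
    return compute, communicate


# Placeholder inputs: 1 s of serial work per step, 8 processes,
# 100 kB per message, 50 microseconds latency, and 125e6 bytes/s
# (the nominal rate of gigabit Ethernet).
compute, communicate = step_time(1.0, 8, 100_000, 50e-6, 125e6)
regime = "compute-bound" if compute > communicate else "communications-bound"
print(f"compute {compute:.4f} s, communicate {communicate:.4f} s: {regime}")
```

If the communication term comes out comparable to or larger than the compute term with your real numbers, that is the case where money is better spent on the interconnect than on faster processors.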
I won't speculate on the optimum number of processors per compute node, but multi-processor compute nodes are advisable over single-processor systems. You get more computing power for a given footprint, and fewer (potentially expensive) network cards are needed. In general, HT is disabled in HPC clusters because it usually doesn't help message-passing applications.