This question was asked during the "An Introduction to High Performance Computing: Parallel Computing Issues" webcast. Here is Tom Lehman's answer.
There are several that are available for free. The two that we play with the most within my group: one of them is called OSCAR, and it's available from SourceForge. Another common cluster package, from the San Diego Supercomputer Center, is called Rocks. Both will allow you to build a cluster relatively easily. They take care of sorting out all of the communications paths between the members of your cluster, and basically I can build a Rocks cluster of, say, 256 nodes in about four hours. Of course, if you don't happen to have 256 nodes, maybe you're only doing four nodes, it'll take you about 45 minutes, max.
Along with those packages you usually get management packages such as Ganglia, or another package from NCSA called CluMon. These give you an overall picture of the health of the software on your cluster. They show you what the load on any given processor is. You can see historical data as to where your load was and where it's probably going to be going, et cetera. Also being built into these cluster monitors are some monitors for the hardware as well, so that you can determine that you've got nodes that perhaps have fan failures and maybe should be taken out of operation as soon as possible, or nodes that have flat-out failed because maybe they lost the power supply. So that's one form of cluster management.
Another form of cluster management is workload management. Most clusters are operated as a batch processing system, where you submit your job, as they did in days gone by, to the master node. A queuing system then puts you into the proper queue, executes your job once the necessary processors are available, and sends the results back to an appropriate place. Those packages are also part of OSCAR and Rocks. Again, they're automatically installed and you just start using them after you've put your cluster together.
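To make the batch queue idea above a bit more concrete, here is a minimal Python sketch of strict first-come-first-served scheduling: a job waits in the queue until enough processors are free, then runs. This is only a toy illustration, not how the workload managers shipped with OSCAR or Rocks are implemented. The function name `run_batch` and the job tuples are made up for the example, and real schedulers add priorities, multiple queues, backfill and so on.

```python
import heapq
from collections import deque


def run_batch(total_procs, jobs):
    """Simulate a strict first-come-first-served batch queue.

    jobs is a list of (name, procs_needed, runtime) in submission order.
    Returns a list of (name, start_time, end_time) in start order.
    All names here are hypothetical and exist only for this sketch.
    """
    queue = deque(jobs)
    running = []          # min-heap of (end_time, procs_held)
    free = total_procs    # processors currently idle
    clock = 0
    schedule = []

    while queue:
        name, procs, runtime = queue[0]
        if procs > total_procs:
            raise ValueError(f"job {name!r} can never fit on this cluster")

        if procs <= free:
            # Enough processors are available: start the job at the head.
            queue.popleft()
            free -= procs
            heapq.heappush(running, (clock + runtime, procs))
            schedule.append((name, clock, clock + runtime))
        else:
            # Not enough processors: advance time to the next job that
            # finishes and reclaim its processors.
            end_time, released = heapq.heappop(running)
            clock = max(clock, end_time)
            free += released

    return schedule


if __name__ == "__main__":
    # Four processors, three jobs submitted in this order.
    for entry in run_batch(4, [("a", 2, 10), ("b", 2, 5), ("c", 4, 3)]):
        print(entry)
```

In this example job "c" needs all four processors, so it has to wait until both "a" and "b" have finished before it starts.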
Message Edited by hagabb on 11-01-2004 11:08 AM
And if you are using Rocks, most of the Intel development tools (C/C++ and Fortran compilers, Intel MPI Library, Intel Cluster Math Kernel Library) are available as a Rocks "Roll". Rolls are Rocks' pre-packaged software distribution mechanism.
The Rocks Roll for Intel lets you build your Rocks cluster with all the necessary Intel tools installed. (Basically, just insert the Rocks Roll for Intel when prompted.)
You will, however, need to get product licenses from Intel.
Message Edited by hagabb on 01-04-2005 06:42 AM
Message Edited by hagabb on 01-14-2005 07:23 AM
Yes, that is possible with SGE, as long as you select to run on the same binary platform and link in all the resources.
SGE allows you to build out campus- or enterprise-wide grids. Not a problem.
This is a file staging issue.
If your /home is globally accessible, then it is simple: on whatever machine SGE allocates to you, your application can get hold of the input files and write to the output directory.
However, if /home is not globally accessible, it becomes more cumbersome. You will need to write your SGE script file so that it copies the files itself (starting from a well-known location, i.e. a server), while taking into account that SGE will allocate you a node that you do not know beforehand.
SGE does set environment variables from which you can get the hostname, but you will have to assume you know the layout of the directory structure...
You are basically grappling with the Globus IO issue here if your clusters are not connected to a single /home over LAN/WAN.
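For the case where /home is not shared, the stage-in / run / stage-out pattern described above might look roughly like the Python sketch below, which a job script could call. It is only a sketch under several assumptions. It reads the HOSTNAME, JOB_ID and TMPDIR variables that SGE sets for a job. The `input` and `output` subdirectories under the staging root are a made-up layout, and `stage_and_run` is a hypothetical helper name. The copies here are plain local file copies, so the staging root has to be reachable as a path from the node; if it lives on a remote server you would have to swap in whatever remote copy tool your site uses.

```python
import os
import shutil
import socket
import tempfile
from pathlib import Path


def stage_and_run(staging_root, inputs, work, env=None):
    """Copy inputs to node-local scratch, run work(), copy results back.

    staging_root: well-known location with a (hypothetical) layout of
                  <staging_root>/input and <staging_root>/output.
    inputs:       file names to fetch from <staging_root>/input.
    work:         callable taking (input_dir, output_dir) as Path objects.
    env:          mapping of environment variables (defaults to os.environ).
    """
    env = os.environ if env is None else env
    staging_root = Path(staging_root)

    # We only learn which node we were given once the job is running.
    host = env.get("HOSTNAME") or socket.gethostname()
    job_id = env.get("JOB_ID", "interactive")
    scratch_base = Path(env.get("TMPDIR") or tempfile.gettempdir())

    scratch = scratch_base / f"stage-{job_id}"
    in_dir = scratch / "in"
    out_dir = scratch / "out"
    in_dir.mkdir(parents=True, exist_ok=True)
    out_dir.mkdir(parents=True, exist_ok=True)

    try:
        # Stage in: fetch the input files from the well-known location.
        for name in inputs:
            shutil.copy2(staging_root / "input" / name, in_dir / name)

        # Run the actual application against the local copies.
        work(in_dir, out_dir)

        # Stage out: results go to a per-host, per-job directory so that
        # jobs landing on different nodes do not overwrite each other.
        dest = staging_root / "output" / f"{host}-{job_id}"
        dest.mkdir(parents=True, exist_ok=True)
        for produced in out_dir.iterdir():
            if produced.is_file():
                shutil.copy2(produced, dest / produced.name)
    finally:
        # Always clean up the node-local scratch space.
        shutil.rmtree(scratch, ignore_errors=True)

    return dest
```

Naming the output directory after the host and job id is just one way to cope with not knowing the node in advance; any scheme works as long as the submitting side knows where to look afterwards.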