Vishnu
Novice
170 Views

Add machine to Cluster

I run a student license of the Intel® Parallel Studio XE Cluster Edition for Linux on my i7-4790K. Can I add another i7-4790K machine to make this a cluster? They are connected over gigabit ethernet, and my usage scenario never involves message passing between machines. I will only use OpenMP or MPI within a machine.

If possible, can someone give me pointers to how to do it? I currently have the compiler installed on my machine only. Thanks!

14 Replies
Michael_S
Employee
170 Views

Hello,

Since the license you are using is node-locked, you won’t be able to run the tools on both machines – even if the filesystem is shared. Usually, cluster installations have a head node or compilation nodes where the tools are available to the users. In addition, such installations usually use floating licenses for multiple users, which are node-locked only on the licensing server side.

In your case, you may want to decide which node you want to use for the tools installation (node locked) while the other node will have limited access to the tools in order to use the runtimes.

Therefore you will be limited to, for example, compiling on only one node, while you are still able to run your MPI applications across both nodes – since the runtimes don't require a license.

Best regards,

Michael

Vishnu
Novice
170 Views

Michael Steyer (Intel) wrote:

you will be limited to, for example, compiling on only one node, while you are still able to run your MPI applications across both nodes – since the runtimes don't require a license.
 

I suppose that is all I need. So to do that, do I have to do something while installing the compilers? Or can I just add machines to my 'cluster' by adding their IP/hostnames to a cluster file?

Michael_S
Employee
170 Views

Hello,

Regarding the license, it will be locked to the node on which you install it.

If you have a shared filesystem between the two nodes, there is nothing specific you need to do in addition. However, if you don't have a shared filesystem, you may either allow the installer to access both nodes (ssh keys) or download and install the runtimes separately on the second node.

Best regards,

Michael

Vishnu
Novice
170 Views

Michael Steyer (Intel) wrote:

If you have a shared filesystem between the two nodes, there is nothing specific you need to do in addition. However, if you don't have a shared filesystem, you may either allow the installer to access both nodes (ssh keys) or download and install the runtimes separately on the second node.

It is not a shared filesystem. The main 'master' node has an ssh-key in the second 'slave' node.

While looking through the installer instructions, it mentions a script: `sshconnectivity.exp` . Where do I find that? I downloaded the `parallel_studio_xe_2017_cluster_edition_online.tgz` file. I cannot find any such script in there.

Also, my username on the master & slave are different. Won't that matter? What changes should I make to the `machines.LINUX` file?

Thanks!

Michael_S
Employee
170 Views

Hello,

You don't need that script, just make sure you can ssh password-less in between both machines and make sure that the correct username for the "other" machine is defined in the .ssh/config file of the "current" machine.

However, I would recommend that you stick with a single user name on both machines. It would also be very beneficial to have a simple shared filesystem, such as NFS, between both machines. That way you won't have to worry about additional issues like mirroring the path to your target MPI binary.

You may find plenty of information online on how to set up a simple two-node cluster.
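
For the case where the user names differ, the `.ssh/config` entry mentioned above could look roughly like the output of the sketch below. `Host` and `User` are standard OpenSSH client keywords; the function name `ssh_host_stanza` and the user name `user2` are made up for illustration, and `hostname-slave` stands in for whatever the second machine is really called.

```shell
# Print an ssh client config stanza that maps a host to a remote user name.
# Hypothetical helper: the name and its arguments are illustrative only.
ssh_host_stanza() {
    alias_name=$1    # host name as it will be typed after "ssh"
    remote_user=$2   # account name on that host
    printf 'Host %s\n    User %s\n' "$alias_name" "$remote_user"
}
```

A possible use is `ssh_host_stanza hostname-slave user2 >> ~/.ssh/config` on the master, after which a plain `ssh hostname-slave` logs in as `user2`.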

Best regards,

Michael

Vishnu
Novice
170 Views

Hi Michael,

While installing, I gave it the 'machines.LINUX' file that had the two hostnames (both usernames are the same).

One of the options it then showed was: 4. Number of cluster nodes processed in parallel [ 1 ] . And upon choosing it, it only allows me values from 1 - 1. So is this because I have a node-locked license? What exactly does this mean?

Also, there is an option: 5. Check for shared installation directory? [ yes ] . I set it to no. I assumed that this is what you were talking about: the shared filesystem.

But now, in Step 5: Prerequisites, I get an error: File /home/user1/Applications/intel/machines.LINUX does not contain accessible nodes. Please check your cluster description file.
My machines.LINUX file is just this:

hostname-master
hostname-slave

So does all this mean that it can only be installed on one machine?

Michael_S
Employee
170 Views

Hello,

The installer only allows you to install in parallel on 1 cluster node since you only have 1 cluster node - the first node is considered the head node.

Regarding your machines.LINUX file, please make sure that you have password-less access between the two systems. You can test this with a simple ssh hostname-slave from the master, and vice versa.

As I mentioned before, you will find plenty of information online on how to set up a simple two-node cluster.
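
The login test described above can be scripted for every entry in the machines file. This is only a sketch: `check_nodes` is a made-up name, and it is not what the installer itself runs. `BatchMode=yes` makes ssh fail instead of prompting for a password, which is closer to what a non-interactive tool experiences than a manual login.

```shell
# Try a non-interactive ssh login to every host listed in a machines file.
# Returns 1 if at least one host could not be reached without a prompt.
check_nodes() {
    machines_file=$1
    status=0
    while IFS= read -r host; do
        # Skip blank lines.
        case $host in '') continue ;; esac
        # -n keeps ssh from swallowing the rest of the file on stdin.
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: login OK"
        else
            echo "$host: login FAILED"
            status=1
        fi
    done < "$machines_file"
    return "$status"
}
```

It would be called as `check_nodes /home/user1/Applications/intel/machines.LINUX`. ssh picks up keys and `.ssh/config` from the account that invokes it, so it may be worth running the check as the same user that starts the installer.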

Best regards,

Michael

Vishnu
Novice
170 Views

The two machines are accessible from each other via ssh, passwordless-ly. I am attaching a small section (40 lines) of the intel.pset.root log file that I think is relevant. I don't really understand why it fails.

22508 BASH PING_CHECK
22509 BASH: CHECK_NODES_ACCESSIBLE()
22510 BASH READ_MACHINES_LINUX_FILE(): reading file /tmp/install.b5O1dR/machines.list
22511 BASH cluster_prereq_do_ping_test(): silent=silent
22512 BASH cluster_prereq_run(): ping hostname-of-slave -c 1
22513 BASH cluster_prereq_run(): return code: 0
22514 BASH cluster_prereq_do_login_test(): silent=silent
22515 BASH: CHECK_MOUNTED_FOLDERS()
22516 BASH: CHECK_MOUNTED_FOLDERS(): mount checking disabled
22517 BASH: CHECK_SHARED_NONRPM_DB: DB directory is /opt/intel
22518 BASH READ_MACHINES_LINUX_FILE(): reading file /tmp/install.b5O1dR/machines.list
22519 1475148202:968 : _shellexec_execute_command: regular waiting, querying process 10113, kill returned -1. errno 3
22520 1475148202:968 : _shellexec_execute_command: finished
22521 1475148202:968 : _shellexec_parse_envlist: started
22522 1475148202:968 : env_set_string_internal: key "CLUSTER_INSTALL_NODES_LOGIN_FAILED" with value " hostname-of-slave" (length 11)
22523 1475148202:968 : env_set_string_internal: key "SHORT_DESC" with value ";cluster_install_prerequisite_error_cannot_login_nodes_short_desc;cluster_install_prerequisite_error_empty_file_short_desc;cluster_install_prerequisite_error_no_tmp_space_short_desc" (length 181)
22524 1475148202:968 : env_set_string_internal: key "CLUSTER_INSTALL_NODES_PING_FAILED" with value "" (length 0)
22525 1475148202:968 : env_set_string_internal: key "LOGIN_TEST_FAILED" with value "yes" (length 3)
22526 1475148202:968 : env_set_string_internal: key "CLUSTER_INSTALL_prerequisite_error_no_tmp_space_full_desc" with value "Cluster installation cannot continue. Required free space in /tmp is 11519MB. Available space is 7928MB." (length 104)
22527 1475148202:968 : env_set_string_internal: key "IS_PING_AVAILABLE" with value "yes" (length 3)
22528 1475148202:968 : env_set_string_internal: key "SIGNIFICANCE" with value ";0;1;1" (length 6)
22529 1475148202:968 : env_set_string_internal: key "FULL_DESC" with value ";cluster_install_prerequisite_error_cannot_login_nodes_full_desc;cluster_install_prerequisite_error_empty_file_full_desc;CLUSTER_INSTALL_prerequisite_error_no_tmp_space_full_desc" (length 178)
22530 1475148202:968 : _shellexec_parse_envlist: finished
22531 1475148202:969 : sequence_execute_document: detecting transition. Handle: , condition: (null), next_cur: 0, next_shift: -1
22532 1475148202:969 : sequence_execute_document: next_name = _out_
22533 1475148202:969 : seq_stack_push cluster_prerequisites
22534 1475148202:969 : env_set_string_internal: key "SIGNIFICANCE" with value ";0;1;1" (length 6)
22535 1475148202:969 : env_set_string_internal: key "FULL_DESC" with value ";cluster_install_prerequisite_error_cannot_login_nodes_full_desc;cluster_install_prerequisite_error_empty_file_full_desc;CLUSTER_INSTALL_prerequisite_error_no_tmp_space_full_desc" (length 178)
22536 1475148202:969 : env_set_string_internal: key "SHORT_DESC" with value ";cluster_install_prerequisite_error_cannot_login_nodes_short_desc;cluster_install_prerequisite_error_empty_file_short_desc;cluster_install_prerequisite_error_no_tmp_space_short_desc" (length 181)
22537 1475148202:969 : env_set_string_internal: key "CLUSTER_INSTALL_NODES_PING_FAILED" with value "" (length 0)
22538 1475148202:969 : env_set_string_internal: key "CLUSTER_INSTALL_NODES_LOGIN_FAILED" with value " hostname-of-slave" (length 11)
22539 1475148202:969 : env_set_string_internal: key "CLUSTER_INSTALL_prerequisite_error_no_tmp_space_full_desc" with value "Cluster installation cannot continue. Required free space in /tmp is 11519MB. Available space is 7928MB." (length 104)
22540 1475148202:969 : sequence_execute_layer: finished
22541 1475148202:969 : sequence_execute_layer: started with id compiler_multiroot_layer
22542 1475148202:969 : getPsetCoreSubdir: started
22543 1475148202:969 : getPsetCoreSubdir: suite core subdir is /parallel_studio_xe_2017.0.035
22544 1475148202:969 : sequence_execute_document: started
22545 1475148202:969 : sequence_execute_document: step 9, layer 3, anchor: product_prerequisites, entry: cluster_prerequisites, priority: -1
22546 1475148202:969 : sequence_execute_document: sect id = product_prerequisites, seqnames = gui_inst;gui_uninst;cli_inst;cli_uninst;silent_inst;silent_uninst;
22547 1475148202:969 : dump_debug_info: cli_inst_sequence.xml, compiler_multiroot_layer, product_prerequisites, entry
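
Since the interesting entries in an excerpt like this are the `env_set_string_internal` key/value pairs, a small filter can make them easier to read. The sketch below assumes the log line format shown above; `extract_keys` is a made-up helper name and the log path in the usage line is a placeholder.

```shell
# Read installer log lines on stdin and print KEY=VALUE for every
# env_set_string_internal entry whose key or value matches the given pattern.
extract_keys() {
    pattern=$1
    sed -n 's/.*env_set_string_internal: key "\([^"]*\)" with value "\([^"]*\)".*/\1=\2/p' |
        grep -E "$pattern" |
        sort -u   # the log repeats the same keys, so drop duplicates
}
```

For example, `extract_keys 'FAILED|tmp_space' < /path/to/intel.pset.root.log` would list the login and ping results next to the /tmp free-space message that also appears in the excerpt.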

Michael_S
Employee
170 Views

Hello,

It cannot log in to the remote node - "CLUSTER_INSTALL_NODES_LOGIN_FAILED" with value " hostname-of-slave".

What happens if you run

$ ssh hostname-of-slave

on the node where you try to run the installer?

Best regards,

Michael

Vishnu
Novice
170 Views

Upon running that command, it promptly logs in. I am attaching an ssh log. There doesn't seem to be anything wrong. The hostname of the slave is 'braindubba'.

Michael_S
Employee
170 Views

Hello,

Just to make sure, that hostname (braindubba) is also what you used in the machines.LINUX file?

Can you try entering the IP address of that node into the machines.LINUX instead?

Alternatively you could also just install it locally on the master and then scp it over to the slave.

Best regards,

Michael

Vishnu
Novice
170 Views

Michael Steyer (Intel) wrote:

Just to make sure, that hostname (braindubba) is also what you used in the machines.LINUX file?

Yep.

Michael Steyer (Intel) wrote:

Can you try entering the IP address of that node into the machines.LINUX instead?

Just tried it with a machines.LINUX file looking like this:

127.0.0.1
172.*.*.*

The installer gives the same error.

Michael Steyer (Intel) wrote:

Alternatively you could also just install it locally on the master and then scp it over to the slave.

You mean scp the /opt/intel directory as a whole? That works? What about the MAC-bound license?

Wait, so if that is true, under the normal procedure, when does it ask me for the password of the slave machine? Also, this is executed with sudo on this machine, not on the slave. So how will it install in its /opt/ ?

Michael_S
Employee
170 Views

Hello,

BTW, you should install the tools as root when you are installing into a location (such as the default) where "normal" users do not have write permissions out of the box. The user that exists on both machines would later be used to run MPI applications across both nodes.

Yes, you can simply copy the installation directory over to the second node in your case, even though the license is node-locked. The reason is that on the second node you want to use the runtimes only, and these do not require a license at all.
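
The copy step could be sketched as below. `copy_install` and the `DRY_RUN` switch are made up for illustration; `/opt/intel` is the installation directory from this thread, and the account used on the second node is assumed to have write access to the destination.

```shell
# Copy the installation tree to the second node with scp (sketch only).
# With DRY_RUN=1 the command is printed instead of executed.
copy_install() {
    host=$1
    src=${2:-/opt/intel}       # installation directory on the master
    dest_parent=${3:-/opt}     # parent directory on the second node
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "scp -r $src $host:$dest_parent/"
    else
        scp -r "$src" "$host:$dest_parent/"
    fi
}
```

Running `DRY_RUN=1 copy_install hostname-slave` first shows what would be executed. Note that `scp -r` follows symbolic links while traversing the tree, so a tool that preserves them, such as `rsync -a`, may be the better choice if the installation contains many links.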

Best regards,

Michael

Vishnu
Novice
170 Views

Yes, I am running it with the `sudo` command so that the installation can write to /opt/.

I think I will follow your suggestion of copying /opt/intel. Thanks!
