Intel® MPI Library

Installation Problems with Intel Parallel Studio XE 2016 Update 1 Cluster Edition for Linux*

Mesut_K_
Beginner

Hi,

I easily installed a local copy of Intel Parallel Studio on a single machine using the install GUI (install_GUI.sh), and it works properly on that machine.

Then I decided to try installing it with the cluster option. I have two Ubuntu (14.04.1) Linux machines. I followed the procedure in the Installation Guide (https://software.intel.com/sites/default/files/managed/b6/26/Install_Guide_0.pdf), but the sshconnectivity script does not work.

My machines.LINUX file contains:

#cluster nodes
#****************************
# master node
# cahitarf (argefiz04)
# hostIP: 10.7.3.174
Fizik-bilg04-Linux
#----------------------------
# computing nodes
# azizsancar (argefiz03)
# hostIP: 10.7.3.173
sancar-linux

 

Below is the output of the script (./sshconnectivity.exp ../machines.LINUX):

nubagamma@Fizik-bilg04-Linux:~/Desktop/intel_compilers/parallel_studio_xe_2016_update1$ ./sshconnectivity.exp ../machines.LINUX 
Enter your user password: *
Re-enter your user password: *
spawn /bin/sh
ssh-keygen -t rsa
sh-4.3$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/nubagamma/.ssh/id_rsa): 
/home/nubagamma/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/nubagamma/.ssh/id_rsa.
Your public key has been saved in /home/nubagamma/.ssh/id_rsa.pub.
The key fingerprint is:
27:04:bb:f6:08:bb:f7:3a:39:57:08:38:b5:f3:d0:e2 nubagamma@Fizik-bilg04-Linux
The key's randomart image is:
+--[ RSA 2048]----+
|      .          |
|      .o         |
|     o.o.        |
|    o *o.        |
|    .oo*S..      |
|     +Eooo.      |
|    . ....       |
|     .= .        |
|    ...*.        |
+-----------------+
sh-4.3$ cat ~/.ssh/*.pub >> ~/.ssh/authorized_keys
sh-4.3$ chmod go-rwx ~/.ssh/authorized_keys
sh-4.3$ cat ~/.ssh/*.pub >> ~/.ssh/authorized_keys.exp8.nubagamma
sh-4.3$ chmod go-w ~/../nubagamma
sh-4.3$ ssh Fizik-bilg04-Linux
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.19.0-49-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

10 packages can be updated.
4 updates are security updates.

Last login: Mon Feb 22 11:53:13 2016 from localhost
-bash: /opt/intel/bin/compilervars.sh: No such file or directory
nubagamma@Fizik-bilg04-Linux:~$ ssh -n sancar-linux ls -aC ~/.ssh
nubagamma@sancar-linux's password: 
Permission denied, please try again.
nubagamma@sancar-linux's password: 
Permission denied, please try again.
nubagamma@sancar-linux's password: 
Permission denied (publickey,password).
nubagamma@Fizik-bilg04-Linux:~$ 1
scp ~/.ssh/authorized_keys.exp8.nubagamma sancar-linux:~/.ssh/authorized_keys.exp8
1: command not found
nubagamma@Fizik-bilg04-Linux:~$ scp ~/.ssh/authorized_keys.exp8.nubagamma sancar-linux:~/.ssh/authorized_keys.exp8
nubagamma@sancar-linux's password: 
Permission denied, please try again.
nubagamma@sancar-linux's password: 
Permission denied, please try again.
nubagamma@sancar-linux's password: 
Permission denied (publickey,password).
lost connection
1
nubagamma@Fizik-bilg04-Linux:~$ 1
1: command not found
nubagamma@Fizik-bilg04-Linux:~$ ssh -n sancar-linux grep -f ~/.ssh/authorized_keys.exp8 ~/.ssh/authorized_keys
nubagamma@sancar-linux's password: 
Permission denied, please try again.
nubagamma@sancar-linux's password: 
Permission denied, please try again.
nubagamma@sancar-linux's password: 
Permission denied (publickey,password).
nubagamma@Fizik-bilg04-Linux:~$ 1
ssh -n sancar-linux "cat ~/.ssh/authorized_keys.exp8 >> ~/.ssh/authorized_keys"
1: command not found
nubagamma@Fizik-bilg04-Linux:~$ ssh -n sancar-linux "cat ~/.ssh/authorized_keys.exp8 >> ~/.ssh/authorized_keys"
nubagamma@sancar-linux's password: 
Permission denied, please try again.
nubagamma@sancar-linux's password: 
Permission denied, please try again.
nubagamma@sancar-linux's password: 
Permission denied (publickey,password).
nubagamma@Fizik-bilg04-Linux:~$ 1
ssh -n sancar-linux chmod go-rwx ~/.ssh/authorized_keys
1: command not found
nubagamma@Fizik-bilg04-Linux:~$ ssh -n sancar-linux chmod go-rwx ~/.ssh/authorized_keys
nubagamma@sancar-linux's password: 
Permission denied, please try again.
nubagamma@sancar-linux's password: 

Then I learned that Intel Parallel Studio only needs passwordless ssh connectivity between the cluster nodes. I set this up manually and tested it, and everything works. However, install_GUI.sh still gives me the following error:

Login failed
The following nodes are not accessible using the ssh command:
Fizik-bilg04-Linux sancar-linux
Installation will proceed with only the accessible nodes. 


Invalid cluster description file
File /home/nubagamma/Desktop/intel_compilers/machines.LINUX does not contain accessible nodes. Please check your cluster description file.

What am I doing wrong?

Any help would be greatly appreciated.

Mesut

TimP
Honored Contributor III

If you wish to install this way, you may need to set up passwordless ssh for root as well as for your user account.

Note that update 2 is available (but it probably makes no difference to the question you asked).
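A minimal sketch for checking this before rerunning the installer, assuming machines.LINUX lists one hostname per line with `#` comments (as in the file above) and assuming that a non-interactive `ssh -o BatchMode=yes <node> true` is a reasonable stand-in for what the installer needs (its exact check is not shown in this thread). The script name check_ssh.sh is hypothetical. Run it once as your normal user and once with sudo, so that root's keys are tested as well. BatchMode makes ssh fail instead of prompting for a password, so any node that still asks for a password (or whose host key is not yet known) is reported as FAILED.

```shell
#!/bin/sh
# check_ssh.sh (hypothetical name): report which nodes in a cluster description
# file can be reached with non-interactive, key-based ssh.
# Usage: ./check_ssh.sh machines.LINUX        (as your user)
#        sudo ./check_ssh.sh machines.LINUX   (as root)

# Print hostnames from the file: strip '#' comments and whitespace, drop empty lines.
list_nodes() {
    sed -e 's/#.*//' -e 's/[[:space:]]//g' "$1" | grep -v '^$'
}

# Run a trivial remote command. BatchMode=yes disables password prompts, so this
# succeeds only if key-based login works and the host key is already known.
check_node() {
    if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null; then
        echo "OK      $(id -un) -> $1"
    else
        echo "FAILED  $(id -un) -> $1"
        return 1
    fi
}

# Check every node (including the local master node) and fail if any check fails.
check_all() {
    status=0
    for node in $(list_nodes "$1"); do
        check_node "$node" || status=1
    done
    return "$status"
}

# Only run the checks when a file argument is given (lets the functions be sourced).
if [ "$#" -gt 0 ]; then
    check_all "$1"
    exit $?
fi
```

With the machines.LINUX above, both Fizik-bilg04-Linux and sancar-linux should print OK for the user and for root before the cluster installation is attempted.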

Mesut_K_
Beginner

Thanks for the help; it works. I have now installed the compiler on both cluster nodes.

To test the cluster nodes, I tried a simple Fortran code with nested loops (see the link below). I compiled the source code with:

ifort -parallel hello.f

I expected to see 8 busy cores, since each cluster node has 4 cores, but unfortunately that does not happen.

https://drive.google.com/file/d/0Bz7SvZzNdW5-OWJMUTJRTnFDT2M/view?usp=sharing

What am I missing this time? 

Sorry if this has already been answered! I am short on time and eager to see whether I can get parallel cluster computing working.

Thanks.

Gergana_S_Intel
Employee

When you compile your application with the -parallel flag, all the compiler will do is attempt to parallelize your loops by automatically enabling OpenMP threading, which runs on the current machine only. That basically means that, if your code is suitable (which it is), the compiler will create a binary that, when executed, spawns a number of OpenMP threads equal to the number of logical cores on your machine (4 in your case). And it's doing exactly that: when you look at your CPU utilization, all cores are running at 100%.

What you expected was to use the cores on *both* cluster nodes simultaneously. That's only possible if your application is a distributed-memory application (the most common approach is the Message Passing Interface, MPI). There is, unfortunately, no magical compiler switch that will make your application MPI-enabled.

Since you have the Parallel Studio XE suite installed, you should also have access to the Intel MPI Library. We provide some sample code you can use to test things in your environment. Just check <install_dir>/impi/5.1.2.150/test and you'll see 4 sample Hello World files. You can compile your favorite one the same way you compiled your hello.f code, except that now you have to use the Intel MPI compiler wrappers, which will link the appropriate MPI libraries for you.

# set up your environment, assuming tools are installed in /opt/intel/
$ source /opt/intel/impi/5.1.2.150/bin64/mpivars.sh

# compile your code
$ mpiifort /opt/intel/impi/5.1.2.150/test/test.f

Once you have an executable, you can run it across both nodes.  I would suggest taking a look at our online Intel MPI Getting Started Guide.
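As a minimal sketch of that last step, assume the mpivars.sh environment from above is sourced on the node you launch from, that ./a.out (built with mpiifort) exists at the same path on both nodes, that passwordless ssh works between them, and that each node has 4 cores. Under those assumptions, the host list can be taken from the same machines.LINUX file and passed to mpirun's `-hosts` option together with `-n` (total ranks) and `-ppn` (ranks per node). The script name run_mpi.sh is hypothetical; check the Getting Started Guide for the options your Intel MPI version supports.

```shell
#!/bin/sh
# run_mpi.sh (hypothetical name): launch ./a.out on every node in a cluster
# description file with Intel MPI's mpirun.
# Usage: ./run_mpi.sh machines.LINUX [ranks_per_node]   (default: 4)

# Hostnames from the file: strip '#' comments and whitespace, drop empty lines.
list_nodes() {
    sed -e 's/#.*//' -e 's/[[:space:]]//g' "$1" | grep -v '^$'
}

# Join hostnames with commas, the format mpirun's -hosts option takes.
host_list() {
    list_nodes "$1" | paste -sd, -
}

run_on_cluster() {
    ppn="${2:-4}"                                      # MPI ranks per node
    nnodes=$(list_nodes "$1" | wc -l | tr -d ' ')      # number of nodes in the file
    np=$((nnodes * ppn))                               # total ranks, e.g. 2 * 4 = 8
    mpirun -n "$np" -ppn "$ppn" -hosts "$(host_list "$1")" ./a.out
}

# Only launch when a file argument is given (lets the functions be sourced).
if [ "$#" -gt 0 ]; then
    run_on_cluster "$@"
    exit $?
fi
```

With the two-node machines.LINUX above and the default of 4 ranks per node, this runs `mpirun -n 8 -ppn 4 -hosts Fizik-bilg04-Linux,sancar-linux ./a.out`, i.e. 8 MPI processes, 4 on each node, which is the 8-busy-core picture expected earlier in the thread.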

If you have any questions, let me know.

~Gergana
