Microway Application Note 16

Introduction

SSH: Installation and Configuration

NIS Server and Client Configuration

NFS Server and Client Configuration

PBS Installation and Configuration

Introduction

   This application note describes in detail how to set up, configure, and use a cluster with SSH (secure shell), an NIS server and clients, an NFS server and clients, and PBS (the Portable Batch System). For simplicity we assume a cluster consisting of a single master and a single node; scaling from one node to N nodes is straightforward. The cluster described here is based on dual Pentium 4 processor motherboards, and the OS is initially installed from Red Hat Linux 7.1, which installs the Linux kernel 2.4.2smp. All computers in the cluster are connected by Fast Ethernet (100 Mbps).

   Refer to Microway Application Note 17 to set up synchronization of the CMOS RTC and the OS system clock across the cluster. This is essential for NFS and NIS stability. Alternatively, enabling the Time service achieves the same result. This is done in the /etc/inetd.conf file on all hosts (do not forget to restart inetd after changing its configuration file). However, using xntpd is more reliable.

   Also create a user account on each computer in the cluster. Here the account is called TEST, with the same login name on every machine. The master is set up with Level 1 RAID on the same hard disk for the /root, /home, and /boot volumes during the Red Hat 7.1 configuration stage; this in no way affects the configuration or operation of the NIS server or NFS server on the master.

SSH: Installation and Configuration 

   By default Red Hat 7.1 installs support for version 2.0 of the SSH protocol. The protocol is implemented by OpenSSH, which is free, open source software. The following packages are needed and are installed as RPMs during the Red Hat 7.1 installation process: openssh 2.5.2p2-5, openssh-clients 2.5.2p2-5, openssh-askpass 2.5.2p2-5, and openssh-askpass-gnome 2.5.2p2-5. The openssh-server package is required only on machines that run an SSH server, that is, machines that must accept incoming SSH connections. The OpenSSH daemon uses the configuration file /etc/ssh/sshd_config. The default configuration file is set to use the best security level. RSH is disabled, and the main commands are ssh, scp, slogin, and sftp. SSH can, however, be configured to use RSH and the corresponding RSH commands with either the ~/.rhosts or the ~/.shosts file. This is not the preferred way to use SSH, as it compromises the secure environment. In this case the highest level of security possible is used, with DSA keys.

   OpenSSH has the advantage that it automatically forwards the DISPLAY variable to the client machine. In other words, if you are running the X Window System on your local machine and you log in to a remote machine using the ssh command, a program that you execute on the remote machine and that requires X is displayed on your local machine. This is convenient if you prefer graphical system administration, cluster administration, and debugging tools but do not have physical access to your server or node.

   The first step is to set up the hosts file in /etc on all the computers in the cluster. Figure 1 below shows the details of this file on the master. This assumes that the Ethernet cards are working and configured properly with their respective IP addresses, netmasks, and broadcast addresses.

Figure 1: The /etc/hosts file in the master.
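As a rough illustration, an /etc/hosts file for a two-machine cluster of this kind might look like the sketch below. The host names PUTLI_MASTER and PUTLI_SLAVE and the localdomain suffix come from this note; the 192.168.1.x addresses are placeholders assumed for the example, not the addresses of the original cluster.

```text
# /etc/hosts (sketch; IP addresses are assumed placeholders)
127.0.0.1      localhost.localdomain      localhost
192.168.1.1    PUTLI_MASTER.localdomain   PUTLI_MASTER
192.168.1.2    PUTLI_SLAVE.localdomain    PUTLI_SLAVE
```

The same entries would be present on every machine so that each host resolves the others identically.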

   The first step in configuring ssh is to accept the host key fingerprint of each remote machine. This is done by logging in to each node, one at a time, as root and, in this case, as the user TEST. The first time you ssh to a remote machine (for example, at the shell prompt you could type: ssh PUTLI_SLAVE.localdomain), you will see a message asking if you want to continue connecting. Enter yes, and this will add the server to your list of known hosts. Next you will see a prompt asking for your password on the remote machine. After entering your password you will be at the shell prompt of the remote machine. However, when you exit and try to log in again you will be asked to provide your password again. To prevent this, generate a DSA key pair. Since we are using version 2.0 of the SSH protocol, the RSA key pair is not needed. To generate the DSA key pair, type the following command at a shell prompt:

  ssh-keygen -t dsa

Accept the default file location of ~/.ssh/id_dsa. You will then be asked to enter a passphrase. If you are the root user it is advisable to use a passphrase, one that is different from your login password and much longer. If you are an ordinary user like TEST, just press Enter to leave the passphrase empty so that you are never prompted for it. Change the permissions of your .ssh directory using the command chmod 755 ~/.ssh. Copy the contents of ~/.ssh/id_dsa.pub to ~/.ssh/authorized_keys2 on the machine to which you want to connect. If the file ~/.ssh/authorized_keys2 already exists, append the contents of id_dsa.pub to it (for example with cat and >>). Figure 2 below shows the authorized_keys2 file of the master.

 

Figure 2: /root/.ssh/authorized_keys2 file on the master for root. A similar file is present on the slave for root.

From this we see that the root user on PUTLI_SLAVE can connect directly as root to PUTLI_MASTER. The user TEST can generate a DSA key pair in the same way, with the id_dsa.pub file then being in the /home/TEST/.ssh/ directory.
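The step of adding a public key to authorized_keys2 can be sketched as a small shell helper. This is only an illustration under stated assumptions: add_pubkey is a hypothetical function name (it is not part of OpenSSH), and it assumes the public key file has already been copied to the target machine by some other means.

```shell
#!/bin/sh
# Sketch: append a public key to an authorized_keys2 file only if it is not
# already listed. add_pubkey is a hypothetical helper, not an OpenSSH tool.
add_pubkey() {
    pubkey_file=$1    # a copy of the connecting user's id_dsa.pub
    auth_file=$2      # e.g. $HOME/.ssh/authorized_keys2

    [ -r "$pubkey_file" ] || return 1

    # Create the .ssh directory if needed, using the 755 mode from the text.
    auth_dir=$(dirname "$auth_file")
    mkdir -p "$auth_dir" && chmod 755 "$auth_dir" || return 1
    touch "$auth_file" || return 1

    # -F: fixed strings, -x: whole-line match, -q: quiet, -f: patterns from file.
    if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}
```

For example, if root's id_dsa.pub from PUTLI_SLAVE had been copied to the master as /tmp/slave_root.pub (a made-up path), running add_pubkey /tmp/slave_root.pub /root/.ssh/authorized_keys2 on the master would add the key once, and running it again would leave the file unchanged.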

If, as root or as a user, you do use a passphrase, the ssh-agent utility can be configured to save your passphrase so that you do not have to enter it each time you initiate an ssh, scp, or sftp connection. For this the package openssh-askpass 2.5.2p2-5 (if you are not using X-windows) or openssh-askpass-gnome 2.5.2p2-5 (for GNOME) is used. In the first case, at the shell prompt type:

   exec /usr/bin/ssh-agent $SHELL

Then type the command

   ssh-add

and enter your passphrase. When you log out your passphrase will be forgotten. You must execute these two commands each time you log in to a virtual console or open a terminal window.

If you are using GNOME and do not have a ~/.Xclients file, you can run switchdesk (as root or as a user) to create it. In your ~/.Xclients file, edit the following line:

  exec $HOME/.Xclients-default

Change the above line to read

   exec /usr/bin/ssh-agent $HOME/.Xclients-default

Open the GNOME Control Center and go to Session => Startup Programs. Click Add and enter /usr/bin/ssh-add in the Startup Command text area. Set its priority to a number higher than that of any existing command to ensure that it is executed last. A good priority number for ssh-add is 70 or higher. The higher the priority number, the lower the priority. If you have other programs listed, this one should have the lowest priority. Click OK to save your settings, and exit the GNOME Control Center. Log out and then back in to GNOME (that is, restart X). A dialog box should now appear asking for your passphrase, and from this point on you will not be prompted for it.

If you have set up SSH correctly you can run guname remotely on the node from the master as shown in Figure 3.

Figure 3: Test to check SSH configuration. Note the remote call to an X-window utility guname.

NIS Server and Client Configuration 

   The Red Hat 7.1 installation automatically installs and configures the precompiled NIS software, including the ypbind daemon. As soon as the ypbind daemon is running, the system becomes an NIS client. The other client programs include ypwhich, ypcat, yppoll, and ypmatch. An NIS server also needs the ypserv program. In this case the master is also the NIS server, in the YP domain named PUTLI_WORKGROUP. This name must be unique and must differ from any other domain name on the network. Under Red Hat 7.1 the ypserv daemon uses the securenets file for access control. The other option is to use tcp_wrappers; however, tcp_wrappers has a memory leak with some configuration files and is therefore not preferred. Before the ypserv daemon is started, the yp.conf file and the securenets file must be set up. Figures 4 and 5 below show the settings in these files for the NIS server on the master.

Figure 4: Settings in the yp.conf file for the NIS server on the master.

Figure 5: Settings in the securenets file for the NIS server on the master.
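For orientation, the two files might contain entries along the following lines. This is a sketch, not a copy of the original settings: the 192.168.1.0 subnet is an assumed placeholder, and the securenets file is commonly found at /var/yp/securenets, which may differ on your installation.

```text
# /etc/yp.conf (sketch): bind the NIS domain to the server on the master
domain PUTLI_WORKGROUP server PUTLI_MASTER.localdomain

# securenets (sketch): one "netmask network" pair per line
# always allow access from the local host
255.0.0.0        127.0.0.0
# assumed cluster subnet; replace with your own
255.255.255.0    192.168.1.0
```

Restricting securenets to the loopback network and the cluster subnet means that ypserv answers only requests that originate inside the cluster.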

Once the two files are set, make sure that the portmapper is running. The command

  rpcinfo -u localhost ypserv

must produce output like that shown in Figure 6.

Figure 6: Check for ypserv's running status.

Now the NIS database can be generated. The command for this, run on the master, is:

   /usr/lib/yp/ypinit -m

The slave nodes and the master can now be configured as NIS clients. Make sure that the yp.conf and securenets files described above for the master are also present on every slave node. Before ypbind is added to the startup files it must be tested. This is done as follows:

Figure 7: Test for ypbind on the master and slave node. Note both udp and tcp usage for ypbind and portmapper.

As a final verification of the NIS installation on the cluster use the ypcat and ypmatch commands both on the master and the nodes.

The details are shown below. If the contents of the NIS password file are not available as shown in Figure 8, then something is wrong, and you need to backtrack to either the NIS client installation or the NIS server installation.

Figure 8: Test for the NIS Server and Client installation using ypcat and ypmatch.
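The ypmatch check can also be scripted, which is convenient when the same test has to be repeated on every node. The sketch below wraps ypmatch in a hypothetical helper named nis_has_user; the helper name is an assumption made for this example, while ypmatch and the passwd map are the ones named above.

```shell
#!/bin/sh
# Sketch: succeed only if the NIS passwd map returns an entry for the user.
# nis_has_user is a hypothetical helper built on the ypmatch client program.
nis_has_user() {
    user=$1
    # ypmatch prints the matching passwd line; it fails if the key is absent
    # or if no NIS server is bound.
    entry=$(ypmatch "$user" passwd 2>/dev/null) || return 1
    case $entry in
        "$user":*) return 0 ;;
        *)         return 1 ;;
    esac
}
```

On a correctly configured client, nis_has_user TEST && echo "NIS lookup works" would print the message, and ypcat passwd would list the whole map.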

NFS Server and Client Configuration 

   The master is the NFS server. All the nodes can mount the directories that the NFS server exports. There are three main configuration files to edit to set up an NFS server: /etc/exports, /etc/hosts.allow, and /etc/hosts.deny. Figures 9, 10, and 11 show these three files respectively. These files are empty on the NFS clients. The exports file is critical, as the directories available for mounting are specified here. As a rule, hosts.deny must be set to deny ALL for each of the NFS daemons (lockd, mountd, rquotad, and statd) and for the portmapper. The hosts.allow file can then grant selective access. This ensures the highest safety and stability for the NFS mounts.

Figure 9: /etc/exports file on the NFS server.

Figure 10: /etc/hosts.allow file on the NFS server.

Figure 11: /etc/hosts.deny file on the NFS server.
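A minimal sketch of the three files, following the deny-all-then-allow rule described above, is given here. The exported directory /home and the host names come from this note; the 192.168.1.0 subnet and the rw export option are assumptions chosen for the example.

```text
# /etc/exports (sketch): export /home read-write to the slave
/home   PUTLI_SLAVE.localdomain(rw)

# /etc/hosts.deny (sketch): refuse everyone by default
portmap: ALL
lockd:   ALL
mountd:  ALL
rquotad: ALL
statd:   ALL

# /etc/hosts.allow (sketch): then admit the assumed cluster subnet
portmap: 192.168.1.0/255.255.255.0
lockd:   192.168.1.0/255.255.255.0
mountd:  192.168.1.0/255.255.255.0
rquotad: 192.168.1.0/255.255.255.0
statd:   192.168.1.0/255.255.255.0
```

After editing /etc/exports on a running server, exportfs -ra rereads the file so that the change takes effect without a reboot.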

NFS depends on the portmapper daemon, which must be started first. It is located in /sbin or /usr/sbin. Of the five daemons, lockd is called by nfsd on demand; statd and the other daemons need to be started explicitly. This is already configured in the startup scripts installed during the Red Hat 7.1 installation. Finally, verify that NFS is running by using rpcinfo -p. This is shown in Figure 12.

Figure 12: Using rpcinfo to verify that the NFS server is working correctly on the master and the NFS client is working correctly on the slave. Note that NFS versions 2 and 3 are being used.

To use a machine as an NFS client, the portmapper must be running on that machine, and to use file locking, lockd and statd must be running on both the client and the server. Figure 13 shows how the directory /home on the master is mounted on /mnt on the slave and then unmounted. To have NFS file systems mounted at boot time, edit the /etc/fstab file in the same way as for local file systems that must be mounted at boot time. The only differences are that the file system type is set to nfs and that the dump and fsck order fields (the last two entries) must be set to zero. The soft or hard mount option can be given among the additional options. Use the command man fstab for more details. The block size (rsize and wsize) can also be set to optimize file transfer. It must be set properly in a production environment; otherwise poor NFS performance over the network can defeat the whole purpose of the NFS mount. See Microway Application Note 21 for more details on optimizing network performance.

Figure 13: Example of NFS mount /home of master in /mnt in slave.
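By hand, the mount and unmount would typically be done with mount PUTLI_MASTER:/home /mnt and umount /mnt on the slave. A boot-time equivalent in /etc/fstab might look like the sketch below. The hard and intr options and the rsize/wsize values of 8192 are assumptions used as a starting point, not tuned values from the original cluster.

```text
# /etc/fstab entry (sketch): device  mount point  type  options  dump  fsck
PUTLI_MASTER:/home   /mnt   nfs   rw,hard,intr,rsize=8192,wsize=8192   0 0
```

The two trailing zeros are the dump and fsck order fields, which the text above requires to be zero for NFS file systems.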

PBS Installation and Configuration 

   The version of PBS used is the free version from Veridian, which can be obtained as a tar.gz file or as an RPM. The RPM version is openpbs-2.3pl2-1.i386.rpm. To get this to work on a Red Hat 7.1 installation with the 2.4.2 Linux kernel, the tcl, tclx, and tk 8.3 RPMs are uninstalled and the corresponding 8.0 RPMs from Red Hat 6.2 are used. For the pros and cons of batch queuing, and for details of how PBS compares with Globus or Condor, for example, please refer to Microway Application Note 20.

   After installing the PBS RPM and rebooting the system, the server and the moms must be configured and the scheduling policy must be implemented. The RPM must be installed on all the computers in the cluster. The pbs_mom daemon must be running on each system where jobs are expected to execute. On the master, which is the PBS server in this case, the node list is given to the server in a file called nodes in the server's home directory. This file must be provided before the server is started and the database is created. Figure 14 below shows this file, which is in the PBS_HOME/server_priv directory (PBS_HOME being /usr/spool/PBS). Nodes can be marked as time-sharing nodes by appending :ts to the node name. Once the server has started and read the list of execution hosts from the nodes file, that file is redundant and the server database is used instead. To add, modify, or remove nodes from the server list, use the qmgr command, as described below.

Figure 14: The nodes file in the PBS_HOME/server_priv directory.
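A nodes file for this two-machine cluster could be as short as the sketch below, with one node name per line and the :ts suffix marking a time-sharing node. Whether the master itself is listed as an execution host, and whether short or fully qualified names are used, are assumptions made for this example.

```text
PUTLI_MASTER:ts
PUTLI_SLAVE:ts
```

Scaling to N nodes then means adding one line per additional node before the server is first started.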

All three PBS daemons (pbs_server, pbs_mom, and pbs_sched) must be run with the real and effective uid of root. Typically they are started from the system boot files (e.g. /etc/rc.local). The server, however, must be brought up by hand the first time and configured before it is put into service. Figure 15 gives a typical required server configuration dialog.

Figure 15: The required server setup for PBS. Note that qmgr is very tolerant to syntax errors.

Here we have one default queue called dque into which all jobs are submitted. There is a maximum of 4 open servers. Jobs can only be submitted from the hosts listed in acl_hosts. If the queue dque is not specified in the qsub shell script, the node on which the jobs are run is specified by default_node. The managers attribute defines which users at a specified host are granted batch system administrator privilege. Note that the above is the required list. Several other options, such as resources_default and resources_max, can be set. Refer to the PBS System Administration Manual for more details.

Now the pbs_mom daemon can be configured. The mom is started by default on each computer on which the PBS RPM was installed. It must be up in order to respond to the "are you there?" ping from pbs_server.

On each computer on which the mom is running, the configuration file called config (Figure 16) must also be present. Besides the clienthost option, the restricted option can be used; refer to the System Administration Manual for details. However, this is not essential unless you need to connect to the mom without using a privileged port. In such a case no control requests or resources from the config file are allowed; only queries are permitted.

Figure 16: The pbs_mom configuration file in the PBS_HOME/mom_priv directory called config.
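A minimal mom_priv/config file might contain just the two directives sketched here. The $clienthost line names the host that is allowed to contact the mom, assumed in this example to be the master; the $logevent mask of 0x1ff is an assumed verbose setting rather than a value taken from the original figure.

```text
$logevent 0x1ff
$clienthost PUTLI_MASTER.localdomain
```

The same file would be placed on every machine that runs pbs_mom.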

The last step is the queue configuration. Figure 17 shows the required dialog for this. Here we deal with execution queues, which is what dque is. The other option is to set up routing queues that move jobs to other queues. This may be necessary in a heterogeneous environment where, for example, a dedicated cluster exists for specific jobs. Note that the default FIFO scheduler is used here.
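The server attributes and the queue setup described above might be entered in qmgr roughly as follows. This is a sketch under assumptions, not a transcript of the original dialogs: the manager account, the acl_hosts pattern, and the choice of PUTLI_SLAVE as default_node are illustrative values. On a first installation the server is typically started once with pbs_server -t create before qmgr is run.

```text
# Input for qmgr, typed at the Qmgr: prompt or loaded with: qmgr < file
#
# Create and enable the execution queue.
create queue dque
set queue dque queue_type = Execution
set queue dque enabled = True
set queue dque started = True
#
# Server attributes described in the text.
set server managers = root@PUTLI_MASTER.localdomain
set server acl_hosts = *.localdomain
set server acl_host_enable = True
set server default_node = PUTLI_SLAVE
set server default_queue = dque
set server scheduling = True
```

The queue is created before default_queue is set so that the server attribute refers to a queue that already exists.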

Now reboot the computers in the cluster and check the PBS installation as described below.

First check the daemons running on the master and the slave nodes. Issue ps -A -l | more and check that the daemons required for the various components are up and running as described above for the master and the nodes. This is shown in Figure 17 (for the master) and in Figure 18 (for the slave node).

Figure 17: Daemons running on master: For SSH, NIS, NFS and PBS.

Figure 18: Daemons running on slave node: For SSH, NIS, NFS and PBS.
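Instead of reading the ps listing by eye, the check can be sketched as a small filter. The helper name missing_daemons is hypothetical, and the daemon names passed to it depend on the role of the machine; the names in the usage note are assumptions based on the components described in this note.

```shell
#!/bin/sh
# Sketch: report which expected daemons are absent from a process listing.
# missing_daemons is a hypothetical helper; it reads command names (one per
# line, as printed by "ps -A -o comm=") on standard input.
missing_daemons() {
    listing=$(cat)
    for daemon in "$@"; do
        # -x matches the whole line and -F disables regular expressions,
        # so "sshd" does not match a longer name that merely contains it.
        printf '%s\n' "$listing" | grep -qxF "$daemon" || echo "$daemon"
    done
}
```

On a slave node, for example, ps -A -o comm= | missing_daemons sshd portmap ypbind pbs_mom prints the name of every listed daemon that is not running and prints nothing when all of them are up.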

Next run xpbs with the -admin option to check the PBS installation. The graphical interface is shown below in Figure 19.

Figure 19: Graphical interface of xpbs run with the -admin option. Note the queue dque, and the update in INFO window.

Selecting detail in the Server window produces the output dialog shown in Figure 20.

Figure 20: Output dialog of server details.

Selecting detail in the Queue window produces the details of the queue dque. This is shown below in Figure 21.

Figure 21: Output dialog of queue dque details. Note the type of queue: Execution as opposed to Routing.

Next the submit job dialog (Figure 22) is opened. The destination queue can be selected. Either the job can be run locally, or on dque. The file staging option allows files to be staged in or out to and from the master and the nodes. Click the help button for more details.

Figure 22: The job submit and file staging dialog for PBS.

A user can run xpbs (without the -admin option) to submit jobs and xpbsmon to monitor them. Figure 23 shows the GUI for xpbsmon.

Figure 23: GUI for xpbsmon for user job monitoring.
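Jobs can also be submitted from the command line with qsub. The script below is a minimal sketch of a job for the dque queue; the job name hello_cluster and the single-node resource request are assumptions chosen for the example, and the fallbacks on the PBS variables exist only so that the script can also be run by hand outside the batch system.

```shell
#!/bin/sh
#PBS -N hello_cluster
#PBS -q dque
#PBS -l nodes=1
#PBS -j oe

# PBS_O_WORKDIR is set by PBS to the directory qsub was run from; the
# fallback lets the script also run by hand outside the batch system.
cd "${PBS_O_WORKDIR:-.}" || exit 1

# Record which machine executed the job.
job_host=$(uname -n)
echo "Job ${PBS_JOBID:-interactive} ran on $job_host"
```

If the script is saved as hello.sh (a made-up file name), qsub hello.sh queues it and qstat shows its state; with -j oe the standard output and standard error of the job are merged into a single output file.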

Together, SSH, NIS, NFS, and PBS form a very powerful cluster computing solution that not only eases job scheduling, file staging, and administration but also provides a unified programming environment (in large clusters) from the user's perspective. The user now has a single GUI from which to submit and monitor all jobs and stage files, without having to worry about logging in and out of the various nodes. Further, if a node is unavailable the PBS scheduler can automatically reroute a job. Job scheduling also enables many users to share the cluster's resources effectively, increasing productivity. PBS can also be configured for automatic load balancing on time-sharing hosts. For this and other options in PBS refer to the PBS documentation.


(Author: Nilay K. Roy.)