Shell Scripting for HPC Clusters, Part 1

Submitted by Igor on October 10, 2009 – 12:59 am

This is the first installment of a multipart guide for beginner Unix sysadmins supporting HPC clusters.

“For” and “While” Loop Constructs

The main challenge of supporting a Linux cluster is ensuring a homogeneous environment. Aside from small differences – primarily in network configuration – cluster nodes must be identical to achieve optimal performance and to simplify troubleshooting. Scripting is an important tool for administering any Unix system and it is particularly valuable for managing clusters.

“While” Loops

In a “while” loop, we set a variable to the number of the first cluster node and increment it by one with every iteration of the loop. This method works well if you need to access a consecutive range of nodes that are numbered without leading zeros (e.g. “node1” and not “node01”).

#!/bin/ksh
i=1
while [ $i -le 128 ]
do
	ssh node$i "hostname ; date"&
	(( i = i + 1))
done

In the above example, the variable $i is set to 1 and the script connects to node1 (node$i) and runs the hostname and date commands. The variable $i is then incremented by 1, the script connects to node2, and the steps repeat for as long as $i is less than or equal to (-le) 128, which is the total number of nodes in our cluster.

The following method can be used when node names use leading zeros or when there are gaps in the sequence. It reads the node names, one per line, from a text file.

cat nodelist.txt
node1
node2
node3
…
node128

#!/bin/ksh
# -n keeps ssh from reading the node list on stdin, which matters if the & is ever removed
while read nodename
do
	ssh -n $nodename "hostname ; date" &
done < nodelist.txt

“For” Loops

This method is best for accessing a small number of nodes, as it requires you to type every node number. This would not be the best way to access all 128 nodes in our test cluster.

#!/bin/ksh
for i in 1 2 3
do
	ssh node$i "hostname ; date "&
done

The following method is equivalent to the second “while” loop example above, as it also uses a text file containing node names.

cat nodelist.txt
node1 node2 node3 … node128

#!/bin/ksh
for nodename in `cat nodelist.txt`
do
	ssh $nodename "hostname ; date" &
done

It is recommended that you use the full path to ssh, rsh, scp, rcp, and similar commands. Enclose the commands to be executed on the remote host in double quotes and separate multiple commands with semicolons. The ampersand goes after the remote commands and outside the double quotes. Its purpose is to background the command for each node, so that the script does not hang on a single node that is down or otherwise inaccessible.
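Backgrounding every connection has two side effects: output from different nodes can interleave on the terminal, and the script may return before the remote commands finish. The following is a minimal sketch of one way to handle both, written in portable POSIX shell syntax that ksh should also accept. The names run_on_nodes, RUNNER, and LOGDIR are hypothetical and not part of the scripts in this guide, and the /usr/bin/ssh path is an assumption about your system.

```shell
#!/bin/sh
# RUNNER and LOGDIR are hypothetical knobs, not part of the original scripts.
# RUNNER defaults to ssh (path assumed); set RUNNER=echo for a dry run.
RUNNER=${RUNNER:-/usr/bin/ssh}
LOGDIR=${LOGDIR:-/tmp/cluster_logs}

# run_on_nodes "REMOTE COMMANDS" NODE...
run_on_nodes() {
    remote_cmd=$1
    shift
    mkdir -p "$LOGDIR" || return 1
    for node in "$@"
    do
        # One log file per node keeps parallel output from interleaving.
        $RUNNER "$node" "$remote_cmd" > "$LOGDIR/$node.log" 2>&1 < /dev/null &
    done
    # Block until every background job has finished.
    wait
}
```

A call such as run_on_nodes "hostname ; date" node1 node2 node3 would then leave one log file per node in $LOGDIR.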

To make it easier to control which nodes are being accessed by the loop, it is recommended to use a while loop that reads the names of the nodes from a text file. You can easily comment out any nodes you don’t want to access.

cat nodelist.txt
node1
#node2
node3
#node4
…
node128

#!/bin/ksh
# Skip lines that start with #
grep -v '^#' nodelist.txt | while read nodename
do
	ssh $nodename "hostname ; date" &
done
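A grep filter is enough for a simple list. If the list also picks up blank lines, indented entries, or notes after a node name, a small sed filter is more forgiving. This is only a sketch: active_nodes is a hypothetical helper name, and the sample list below is made up for illustration.

```shell
#!/bin/sh
# active_nodes [FILE]: print usable node names, one per line.
# Strips "#" comments (whole-line or trailing), surrounding whitespace,
# and empty lines. Reads stdin when no FILE is given.
active_nodes() {
    sed -e 's/#.*//' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d' "$@"
}

# Demonstration with a made-up list; prints node1, node3 and node4.
printf '%s\n' 'node1' '#node2' 'node3   # bad DIMM' '' '  node4' | active_nodes
```

In a real script the filter would feed the loop directly: active_nodes nodelist.txt | while read nodename, followed by the same do ... done body as above.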

If your node names use leading zeros (e.g. node001), you can still use the incremental while loop, but it gets a bit more complicated because the number of zeros depends on the size of the number. The following loop will access nodes node001 through node128.

#!/bin/ksh
i=1
while [ $i -le 128 ]
do
	if [ $i -lt 10 ]
	then
		ssh node00$i "hostname ; date"&
	elif [ $i -lt 100 ]
	then
		ssh node0$i "hostname ; date"&
	elif [ $i -lt 1000 ]
	then
		ssh node$i "hostname ; date"&
	fi
	(( i = i + 1))
done

In a situation like this it will probably be easier to just generate a list of nodes and save it as a text file to be used as input for the loop.

#!/bin/ksh
# The single > after "done" overwrites nodelist.txt, so reruns do not append duplicates
i=1
while [ $i -le 128 ]
do
	if [ $i -lt 10 ]
	then
		echo "node00$i"
	elif [ $i -lt 100 ]
	then
		echo "node0$i"
	elif [ $i -lt 1000 ]
	then
		echo "node$i"
	fi
	(( i = i + 1))
done > nodelist.txt
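The if/elif ladder above can usually be replaced by printf, which pads numbers with zeros directly. The sketch below assumes a three-digit width and the “node” prefix used in this guide. gen_nodelist is a hypothetical helper name, and the code sticks to POSIX shell syntax so it should run under ksh as well.

```shell
#!/bin/sh
# gen_nodelist FIRST LAST: print zero-padded node names, one per line.
# The "node" prefix and the 3-digit width (%03d) are assumptions.
gen_nodelist() {
    i=$1
    while [ "$i" -le "$2" ]
    do
        printf 'node%03d\n' "$i"
        i=$((i + 1))
    done
}

# Prints node008, node009, node010 and node011.
gen_nodelist 8 11
```

To build the full list in one go, redirect the output: gen_nodelist 1 128 > nodelist.txt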

Practical Loop Examples

When executing complex commands on remote servers, it is a good idea to put all commands into a script and then to put this script into a directory exported via NFS to all the nodes. You can also RCP/SCP or FTP/SFTP the script to each node before running it. This way you can write simple loops that will call on the script and execute it locally on each node.

Loop Example 1

We need to connect to nodes 1 through 128 to add the new file server IP and hostname to the /etc/hosts file. We also need to add a new NFS mount to each node to be mounted at boot time.

First, create a simple script, add_nfs_mount.ksh, that adds the file server name and IP to /etc/hosts, creates a mountpoint, adds the NFS mount to /etc/fstab, and mounts the new filesystem. Place this script into the shared directory /export/scripts, which is exported via NFS to all nodes.

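#!/bin/ksh

fileserver=nfsserver1
serverip=192.168.45.10

# Make the file server resolvable by name
echo "$serverip $fileserver" >> /etc/hosts

# -p: do not fail if the mountpoint already exists
mkdir -p /nfs_share1

# Mount the share at boot time
echo "${fileserver}:/share1 /nfs_share1 nfs intr,bg 0 0" >> /etc/fstab

mount /nfs_share1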

Since this script is in a directory accessible from all cluster nodes, all you need to do now is to write a simple loop that would execute this script on each node. Don’t forget to make the script executable: chmod +x /export/scripts/add_nfs_mount.ksh

#!/bin/ksh
i=1
while [ $i -le 128 ]
do
	ssh node$i "/export/scripts/add_nfs_mount.ksh"&
	(( i = i + 1 ))
done
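One caveat with add_nfs_mount.ksh: if the loop is run a second time, the script appends the same lines to /etc/hosts and /etc/fstab again. A possible guard, sketched here in POSIX shell syntax under the assumption that an exact line match is good enough, is to append only when the line is not already present. append_once is a hypothetical helper name.

```shell
#!/bin/sh
# append_once LINE FILE: append LINE to FILE unless an identical line exists.
# grep -x matches whole lines only, -F treats LINE as a fixed string.
append_once() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Intended use inside add_nfs_mount.ksh (values taken from the script above):
#   append_once "192.168.45.10 nfsserver1" /etc/hosts
#   append_once "nfsserver1:/share1 /nfs_share1 nfs intr,bg 0 0" /etc/fstab
```

The check only catches identical lines. An entry for the same host with different spacing or a different IP would still be added.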

Loop Example 2

There may be situations when you cannot mount an NFS share on all the nodes. An alternative is to use SCP or RCP to copy the script to the nodes and then execute it locally on each node. Let’s take a look at how this is done.

In this example we need to configure cluster nodes 1 through 128 to use US Eastern timezone and NTP. Let’s create the script /scripts/set_timezone.ksh

#!/bin/ksh
mv /etc/localtime /etc/localtime_orig
ln -sf /usr/share/zoneinfo/US/Eastern /etc/localtime
grep -v TIMEZONE /etc/sysconfig/clock > /tmp/clock
 
cat << EOF >> /tmp/clock
TIMEZONE="US/Eastern"
DEFAULT_TIMEZONE="US/Eastern"
EOF
 
mv /tmp/clock /etc/sysconfig/clock
/sbin/hwclock --systohc
 
cat << EOF > /etc/ntp.conf
server 192.168.12.12
driftfile /var/lib/ntp/drift/ntp.drift
EOF
 
/sbin/chkconfig ntp on

Now we need to create a loop to scp this script to nodes 1 through 128 and to execute it locally on each node.

#!/bin/ksh
i=1
while [ $i -le 128 ]
do
	scp /scripts/set_timezone.ksh node${i}:/tmp/
	ssh node$i "chmod +x /tmp/set_timezone.ksh ; /tmp/set_timezone.ksh"&
	(( i = i + 1 ))
done
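The loop above runs ssh even when the preceding scp failed, in which case the node either has no script or still has an old copy in /tmp. A minimal sketch of a guard follows. copy_then_run, COPY, and RUNNER are hypothetical names introduced for illustration. COPY and RUNNER default to scp and ssh, and can be pointed at other commands for a dry run.

```shell
#!/bin/sh
# COPY and RUNNER are hypothetical hooks; they default to scp and ssh.
COPY=${COPY:-scp}
RUNNER=${RUNNER:-ssh}

# copy_then_run NODE SCRIPT: copy SCRIPT to /tmp on NODE, run it only if the copy worked.
copy_then_run() {
    node=$1
    script=$2
    name=${script##*/}    # file name without the directory part
    if $COPY "$script" "$node:/tmp/" < /dev/null
    then
        $RUNNER "$node" "chmod +x /tmp/$name ; /tmp/$name" < /dev/null
    else
        echo "copy to $node failed, skipping" >&2
        return 1
    fi
}
```

Inside the while loop the two scp and ssh lines would then become a single call: copy_then_run node$i /scripts/set_timezone.ksh &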

Loop Example 3

Another way of putting a script on the cluster nodes is to use FTP/SFTP. In the following example we need to install an RPM package on each cluster node. The first step is to FTP the /tmp/package.rpm file to all the nodes.

#!/bin/ksh

ftp_user="mike"
ftp_pass="p@ssw0rd"

# Upload /tmp/package.rpm to /tmp on the node given as $1
ftpput() {
	{
		echo "open $1"
		echo "user $ftp_user $ftp_pass"
		echo "bin"
		echo "lcd /tmp"
		echo "cd /tmp"
		echo "put package.rpm"
		echo "quit"
	} | ftp -nvi -T 3
}

i=1
while [ $i -le 128 ]
do
	ftpput node$i
	(( i = i + 1 ))
done

The final step is easy. All we need to do is to SSH to each node and install the RPM.

#!/bin/ksh
i=1
while [ $i -le 128 ]
do
	ssh node$i "rpm -i /tmp/package.rpm"
	(( i = i + 1 ))
done
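The install loop above runs to completion without saying which nodes failed. One way to find out, sketched here in POSIX shell syntax, is to check the exit status of each remote command and print a summary at the end. run_and_report and RUNNER are hypothetical names. RUNNER defaults to ssh, which passes back the exit status of the remote command.

```shell
#!/bin/sh
# RUNNER is a hypothetical hook; it defaults to ssh.
RUNNER=${RUNNER:-ssh}

# run_and_report "REMOTE COMMANDS" NODE...
# Runs the command on each node in turn and lists the nodes where it failed.
run_and_report() {
    remote_cmd=$1
    shift
    failed=""
    for node in "$@"
    do
        if ! $RUNNER "$node" "$remote_cmd" < /dev/null
        then
            failed="$failed $node"
        fi
    done
    if [ -n "$failed" ]
    then
        echo "FAILED:$failed"
        return 1
    fi
    echo "OK: all nodes succeeded"
}
```

For the RPM example the call would look like run_and_report "rpm -i /tmp/package.rpm" node1 node2 node3, with the node names supplied by any of the loops or lists shown earlier.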

The second part of this guide – Searching, Replacing, Comparing – will be published next week. Stay tuned.
