Cluster Status client / server for Linux
JBR Sept 2007
I wanted a simple, flexible way to view the status of my small cluster (8 nodes + head) without writing a lot of code. Here is my adaptation of the dstat tool to report on an entire cluster. I am posting it here because the code shows several of Tcl's strengths. It is only about 3 weeks old, and the initial version was running in under a day.
There are 3 parts: the dstatd server, a modified version of the dstat utility, which collects and reports system statistics, and the cstat client, which displays the results to the user.
Dstat is a very useful GPL Python script that can display a wide range of system information from the /proc filesystem on Linux.
I have added a couple of options to the dstat program to make it more useful as a slave program.
dstatd is a daemon that launches dstat processes and reports the system status back to clients. One dstat process is launched for each active client, and the client requests the options that dstat should be launched with. This is not especially scalable, but it is very simple and flexible.
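The one-process-per-client idea can be sketched in a few lines of Tcl. This is not the dstatd source: the proc names (accept, launch, relay, finish), the port number, and the wire protocol (the client's first line carries the options) are all assumptions made for illustration.

```tcl
# Sketch only: one child process per connected client.
# "command" is the program to launch (dstat in the article).

proc accept {command sock addr port} {
    fconfigure $sock -buffering line
    # Assumed protocol: the client's first line holds the requested options.
    fileevent $sock readable [list launch $command $sock]
}

proc launch {command sock} {
    if {[gets $sock options] < 0} {
        # Client disconnected before sending anything.
        close $sock
        return
    }
    fileevent $sock readable {}
    # Start the child as a command pipeline and read its output.
    if {[catch {open |[concat $command $options] r} pipe]} {
        close $sock
        return
    }
    fconfigure $pipe -blocking 0 -buffering line
    fileevent $pipe readable [list relay $pipe $sock]
}

proc relay {pipe sock} {
    if {[gets $pipe line] >= 0} {
        # Forward one line; a failed write means the client has gone.
        if {[catch {puts $sock $line}]} { finish $pipe $sock }
    } elseif {[eof $pipe]} {
        finish $pipe $sock
    }
}

proc finish {pipe sock} {
    catch {close $pipe}
    catch {close $sock}
}

# To run as a server (hypothetical port):
#   socket -server [list accept dstat] 7000
#   vwait forever
```

Because each client gets its own child process, the server keeps no per-client state beyond the two channels. A real daemon would also need to validate the options it receives before passing them to a command pipeline.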
cstat is the client. It can show the cluster status on an ANSI terminal, write a log, or display a Tk strip plot. The cstat program requires TclX and a modified version of A simple slipchart.
Usage
    cstat
    cstat log
    cstat plot
Cstat reads its configuration from /etc/cstat.conf and from ~/.cstat. Each file is sourced as a Tcl script to set options for the program.
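Sourcing a config file means the file is ordinary Tcl, so no parser is needed. A minimal sketch of how this might look follows; loadConfig is a hypothetical helper, and the assumption that the per-user file is read second (so that it overrides the system file) is mine rather than something the cstat source is known to do.

```tcl
# Hypothetical helper: source every config file that exists, in order.
# Later files can overwrite variables set by earlier ones.
proc loadConfig {files} {
    set loaded {}
    foreach file $files {
        if {[file isfile $file] && [file readable $file]} {
            # Evaluate at global scope so "set options ..." creates globals.
            uplevel #0 [list source $file]
            lappend loaded $file
        }
    }
    return $loaded
}

# System-wide file first, then the per-user file if HOME is known.
set files [list /etc/cstat.conf]
if {[info exists ::env(HOME)]} {
    lappend files [file join $::env(HOME) .cstat]
}
loadConfig $files
```

The price of this convenience is that a config file can run arbitrary code, which is acceptable for a personal cluster tool but worth remembering elsewhere.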
cstat.conf:
set options { --color --noupdate -t -c -d -n -M app 5 }
set clients { piper mega00 mega01 mega02 mega03 mega04 mega05 mega06 mega07 }

# Tk plotter configuration
#
set colours { red green blue yellow cyan magenta orange purple lightblue }
set charts {
    "User CPU"    usr   0  {0.0 100.0  50.0}  1        average
    "Sys CPU"     sys   1  {0.0 100.0  50.0}  1        average
    "Idle CPU"    idle  2  {0.0 100.0  50.0}  1        average
    "Wait CPU"    wait  3  {0.0 100.0  50.0}  1        average
    "Disk Read"   read  6  {0.0 200.0 100.0}  1048576  sum
    "Disk Write"  writ  7  {0.0 200.0 100.0}  1048576  sum
    "Net In"      in    8  {0.0 100.0  50.0}  1048576  sum
    "Net Out"     out   9  {0.0 100.0  50.0}  1048576  sum
}
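Because charts is a flat Tcl list with six items per chart, it can be walked with a multi-variable foreach. The sketch below uses two rows from the config above. The field meanings are my reading of the data, not documented behaviour: title, dstat column name, column index, axis range, a divisor (1048576 looks like bytes to MiB), and how values are combined. The procs combine and chartValue are hypothetical.

```tcl
# Two rows copied from the config above.
set charts {
    "User CPU"   usr  0 {0.0 100.0  50.0} 1       average
    "Disk Read"  read 6 {0.0 200.0 100.0} 1048576 sum
}

# Combine a list of samples by the named method (assumed semantics).
proc combine {how values} {
    if {[llength $values] == 0} { return 0.0 }
    set total 0.0
    foreach v $values { set total [expr {$total + $v}] }
    switch -- $how {
        sum     { return $total }
        average { return [expr {$total / [llength $values]}] }
        default { error "unknown aggregate \"$how\"" }
    }
}

# Combine raw samples, then divide by the chart's scale factor.
proc chartValue {scale how samples} {
    return [expr {[combine $how $samples] / double($scale)}]
}

# Walk the flat list six items at a time.
foreach {title name column range scale how} $charts {
    puts [format "%-10s column %d, scale %d, %s" $title $column $scale $how]
}
```

Under this reading, CPU percentages would be averaged while disk and network byte counts would be summed and scaled.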
Here are a couple of screen shots:
Here the cluster is idle :(
Here's a plot while crunching image data.