cstat

Cluster status client/server for Linux

JBR Sept 2007

I wanted a simple, flexible way to view the status of my small cluster (8 nodes plus a head node) without writing a lot of code. Here is my adaptation of the dstat tool to report on an entire cluster. I posted it here because the code shows off several of Tcl's strengths. The code is only about three weeks old, and the initial version was running in under a day.

  • Dead simple client/server networking
  • Sourced program configuration
  • Easy gluing of text data
  • Simple graphics/UI with Tk
  • Dead simple "OO" programming in the small using $type-method proc dispatch.
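The "$type-method" dispatch in the last item can be sketched as follows. This is a minimal illustration, not code from cstat: the type names `term` and `log` and the procs `term-format`, `log-format` and `render` are hypothetical. Each "type" is just a name prefix, and its methods are ordinary procs named `<type>-<method>`. Because a Tcl `$name` variable reference stops at the hyphen, `$type-format` expands to the name of the matching proc.

```tcl
# Hypothetical "types": each one supplies its methods as plain procs
# named <type>-<method>. No OO extension is needed.
proc term-format {values} { return [join $values " | "] }
proc log-format  {values} { return [join $values ","] }

# Generic caller: $type-format is substituted to "term-format" or
# "log-format" at run time, which selects the method to call.
proc render {type values} {
    if {[info commands $type-format] eq ""} {
        error "no format method for type \"$type\""
    }
    return [$type-format $values]
}

puts [render term {12 3 85}]
puts [render log  {12 3 85}]
```

Adding a new output type then means writing one more set of `<type>-...` procs; the callers do not change.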

There are three parts: the dstatd server; a modified version of the dstat utility, which collects and reports system statistics; and the cstat client, which displays the results to the user.

Dstat is a very useful GPL Python script that can display a wide range of system information from the /proc filesystem on Linux.

I have added a couple of options to the dstat program to enhance its use as a slave program.

dstatd is a daemon that launches dstat processes and reports the system status back to clients. One dstat process is launched for each active client, using the dstat options that the client requests. This is not especially scalable, but it is very simple and flexible.
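A daemon of this kind takes only a few lines of Tcl. The following is a minimal sketch, not the actual dstatd: it assumes the client sends its dstat options as a single text line, and the proc names, the `::monitorCmd` variable and the use of port 0 are hypothetical. Each accepted connection gets its own command pipeline, and `fcopy` streams that process's output back to the client in the background. Because the options are passed straight to `open |...`, a client could also send pipeline redirections, so a server like this belongs on a trusted cluster network only.

```tcl
# Program launched for each client (hypothetical setting); kept in a
# variable so the sketch can be tried without dstat installed.
set ::monitorCmd dstat

# Called by the event loop for every new client connection.
proc accept {sock addr port} {
    fconfigure $sock -buffering line
    fileevent $sock readable [list readOptions $sock]
}

# Read the single line of dstat options, then start one process for this client.
proc readOptions {sock} {
    fileevent $sock readable {}
    if {[gets $sock options] < 0} {
        close $sock
        return
    }
    if {[catch {open |[concat [list $::monitorCmd] $options] r} pipe]} {
        puts $sock "error: $pipe"
        close $sock
        return
    }
    fconfigure $pipe -blocking 0
    # Copy the process output to the client in the background; finished
    # is called when the process exits or the client goes away.
    fcopy $pipe $sock -command [list finished $pipe $sock]
}

# Close both channels. A non-blocking pipe is closed without waiting for
# the child, which should exit on its next write to the closed pipe.
proc finished {pipe sock args} {
    catch {close $pipe}
    catch {close $sock}
}

# Port 0 lets the OS pick a free port; a real daemon would use a fixed one.
set server [socket -server accept 0]
```

As a standalone daemon, the script would end with `vwait forever` to enter the event loop.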

cstat is the client. It can show the cluster status on an ANSI terminal, write a log, or display a Tk strip plot. It requires TclX and a modified version of "A simple slipchart".
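Redrawing an ANSI terminal needs nothing more than two escape sequences: `ESC[H` moves the cursor to the top-left corner and `ESC[2J` clears the screen. A minimal sketch, assuming the latest line of dstat output from each host has already been collected into a dict; the `ansiFrame` proc and the sample rows are hypothetical, not cstat's code.

```tcl
# Build one screenful: cursor home and clear screen, then one row per host.
proc ansiFrame {rows} {
    set out "\033\[H\033\[2J"
    dict for {host line} $rows {
        append out [format "%-8s %s\n" $host $line]
    }
    return $out
}

# Hypothetical sample: the latest dstat line from two nodes.
set rows [dict create piper "  2   1  97   0" mega00 " 98   2   0   0"]
puts -nonewline [ansiFrame $rows]
flush stdout
```

Rebuilding the whole frame as one string and writing it at once keeps the display from flickering line by line.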

Usage

    cstat
    cstat log
    cstat plot
cstat configuration

cstat reads its configuration from /etc/cstat.conf and from ~/.cstat. These files are Tcl scripts that are sourced to set the program's options.
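Sourcing the two files takes only a few lines. This is a minimal sketch, not cstat's own code: it assumes the per-user file is read after the system one so that its settings win, and the `loadConfig` proc and the empty defaults are hypothetical. Sourcing at global level (`uplevel #0`) means a plain `set clients {...}` in the file sets a global variable.

```tcl
# Hypothetical defaults, overridden by whatever the config files set.
set options {}
set clients {}

# Source each readable file at global level, in order, so later files
# override earlier ones; missing files are silently skipped.
proc loadConfig {args} {
    foreach file $args {
        if {[file readable $file]} {
            uplevel #0 [list source $file]
        }
    }
}

set files [list /etc/cstat.conf]
if {[info exists ::env(HOME)]} {
    lappend files [file join $::env(HOME) .cstat]
}
loadConfig {*}$files
```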

cstat.conf:

 # dstat options that each client asks dstatd to launch dstat with
 set options { --color --noupdate -t  -c -d -n -M app 5 }

 # Hosts to watch: the head node and the eight compute nodes
 set clients {
        piper
        mega00
        mega01
        mega02
        mega03
        mega04
        mega05
        mega06
        mega07
 }

 # Tk plotter configuration
 #
 set colours {
        red
        green
        blue
        yellow
        cyan
        magenta
        orange
        purple
        lightblue
 }

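 # Each row: title, dstat column, column index, axis settings,
 # divisor (1048576 converts bytes to MB), and how the values from
 # all nodes are combined (average or sum)
 set charts {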
        "User CPU"   usr  0 {0.0 100.0  50.0} 1         average
        "Sys  CPU"   sys  1 {0.0 100.0  50.0} 1         average
        "Idle CPU"   idle 2 {0.0 100.0  50.0} 1         average
        "Wait CPU"   wait 3 {0.0 100.0  50.0} 1         average
        "Disk Read"  read 6 {0.0 200.0 100.0} 1048576   sum
        "Disk Write" writ 7 {0.0 200.0 100.0} 1048576   sum
        "Net In"     in   8 {0.0 100.0  50.0} 1048576   sum
        "Net Out"    out  9 {0.0 100.0  50.0} 1048576   sum
 }
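Each `charts` row has six fields, so the whole table can be walked with a single `foreach`. The sketch below shows one way the `sum` and `average` entries could combine a column across nodes; it is an illustration under assumptions, not cstat's code. It assumes each node's latest sample is a list of numbers in dstat column order (matching the indices in the table), treats the fifth field as a divisor, and ignores the axis field. The procs `combine` and `chartValues` are hypothetical.

```tcl
# Combine one column across all nodes' samples, scaled by divisor.
proc combine {samples index divisor mode} {
    if {[llength $samples] == 0} { return 0.0 }
    set total 0.0
    foreach sample $samples {
        set total [expr {$total + [lindex $sample $index] / double($divisor)}]
    }
    switch -- $mode {
        sum     { return $total }
        average { return [expr {$total / [llength $samples]}] }
        default { error "unknown mode \"$mode\"" }
    }
}

# Walk the six-field chart rows and return a title/value list.
proc chartValues {charts samples} {
    set result {}
    foreach {title field index axis divisor mode} $charts {
        # field and axis are not needed for the combined value
        lappend result $title [combine $samples $index $divisor $mode]
    }
    return $result
}

# Hypothetical samples from two nodes, columns in dstat order.
set samples {
    {10  5 80 5 0 0 1048576 0 2097152 0}
    {30 15 50 5 0 0 1048576 0       0 0}
}
set demo {"User CPU" usr 0 {0.0 100.0 50.0} 1 average
          "Disk Read" read 6 {0.0 200.0 100.0} 1048576 sum}
puts [chartValues $demo $samples]
```

Averaging suits percentages such as CPU load, while summing suits throughput, where the cluster total is what matters.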

Here are a couple of screenshots:

  Here the cluster is idle :(

http://www.cfa.harvard.edu/~john/cstat/cstat.png

  Here's a plot while crunching image data.

http://www.cfa.harvard.edu/~john/cstat/cstatplot.png

