Version 4 of TkChatistics

Updated 2008-05-16 19:25:34 by PT

TR - This is fun! Since TkChat logs are plain Tcl files whose lines can be evaluated as commands, we can use them to calculate some statistics: so-called TkChatistics. The simplest case is to count the number of posts per person for a given day. The following code does this (assuming the corresponding log file is in the directory where the script is run):

 proc m {time author msg} {
        #
        # reads one line of TkChat log and accumulates author's post count
        #
        # time -> a timestamp
        # author -> author of post
        # msg -> message posted
        #
        # This procedure uses a global array 'count' to collect the data.
        # Each element in the array has the author as key
        # and the number of posts as the value.
        #
        global count
        if {$author eq ""} return
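         if {$author eq "ijchain"} {
                 # posts relayed by the bridge: skip status lines starting with "*"
                 if {[string index $msg 0] eq "*"} return
                 # the real author is the leading "<nick>" part of the message
                 set index [string first > $msg]
                 # no ">" found: the real author cannot be determined, so skip
                 if {$index < 0} return
                 set author [string range $msg 0 $index]
         }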
        if {! [info exists count($author)]} {
                set count($author) 1
        } else {
                incr count($author) 1
        }
 }


 proc tkChatistics {mode} {
        #
        # print statistics 
        #
        # mode -> the statistics type. Can be
        #           dayCount -> number of posts per person a day
        #           ... -> whatever you come up with!
        #
        # This procedure uses the global array 'count' for extracting
        # the data to print.
        #
        global count
        switch -- $mode {
                dayCount {
                         set max 0
                         # start with an empty list so an empty log does not raise an error
                         set mylist {}
                        # convert data to a nested list:
                        foreach item [array names count] {
                                lappend mylist [list $count($item) $item]
                                if {$count($item) > $max} {set max $count($item)}
                        }

                        # scale
                        set scale 5
                        set fieldWidth [expr {$max/$scale}]

                        foreach item [lsort -index 0 -integer $mylist] {
                                set num [lindex $item 0]
                                set author [lindex $item 1]
                                set stars [string repeat * [expr {$num/$scale}]]
                                puts "[format %${fieldWidth}s $stars] $num $author"
                        }
                }
                default {puts "Not implemented. You have to write this yourself ..."}
        }
 }

 ############### main code ###############
 array unset count
 source 2008-05-14.tcl
 tkChatistics dayCount

This is the output:

                           1 <suchenwi>
                           1 <tcleval>
                           1 dkf_Away
                           1 <CIA-37>
                           1 <nhalkic>
                           2 miguel
                           2 <erider>
                           3 stever
                           3 teo
                           4 rmax
                           4 <yeeling>
                           4 <jccampbell>
                           4 rfoxmich
                         * 6 <Setok>
                         * 6 nem
                         * 6 <NewHandFromCN>
                         * 6 <RockShox>
                         * 6 gerry
                         * 7 <hat1>
                         * 8 jdc
                        ** 10 steveo
                        ** 10 aku
                        ** 10 de
                        ** 11 <hat0>
                        ** 12 gwlester
                        ** 13 Zarutian
                        ** 14 drh
                       *** 15 emiliano
                       *** 17 kostix
                       *** 18 <Intel4004>
                      **** 20 <Phantom-X>
                      **** 23 <aspect>
                     ***** 27 <my007ms>
                     ***** 29 stu
                    ****** 33 GPS
                    ****** 33 jccampbell
                    ****** 34 jenglish
                   ******* 35 <miniK0bo>
                   ******* 37 arjen
                   ******* 38 suchenwi
                   ******* 38 dgp
                  ******** 43 mjanssen
                  ******** 43 colin
                 ********* 45 <gpolo>
                 ********* 47 <Mazzachre>
                 ********* 49 tclguy
                 ********* 49 <peterc>
             ************* 66 patthoyts
           *************** 76 dkf_
         ***************** 85 dkf
    ********************** 113 ∆∆
 ************************* 126 kbk

Quite an interesting distribution. The "<>" around some names indicates that these people were connected via a bridge and were not using TkChat.
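
To try the counting logic without a real log file, here is a minimal sketch that evaluates a few made-up log lines. The sample lines, the timestamp format, and the names countPost and demoCount are assumptions for illustration only; real logs are read with source and call m as shown above. The sketch relies on incr creating missing variables, which needs Tcl 8.5 or later.

```tcl
# Hypothetical stand-in for 'm', so the sketch is self-contained.
proc countPost {time author msg} {
    global demoCount
    if {$author eq ""} return
    if {$author eq "ijchain"} {
        # Bridged posts look like "<nick> text"; status lines start with "*".
        if {[string index $msg 0] eq "*"} return
        set end [string first > $msg]
        if {$end < 0} return
        # Keep the angle brackets so bridged users stay recognisable.
        set author [string range $msg 0 $end]
    }
    # Tcl 8.5+: incr initialises a missing array element to 0 first.
    incr demoCount($author)
}

# Made-up log content (assumed format); each line is one command call.
set sampleLog {
    countPost 20080514T08:00:00 kbk {good morning}
    countPost 20080514T08:00:05 ijchain {<aspect> hi kbk}
    countPost 20080514T08:00:09 ijchain {*** aspect leaves}
    countPost 20080514T08:01:00 kbk {hello aspect}
}

array unset demoCount
eval $sampleLog
parray demoCount
```

The status line from the bridge is ignored, the bridged post is credited to "<aspect>", and the two direct posts are credited to kbk.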


Ideas

The next step would be to add fetching the logs directly from tclers.tk and doing some more comprehensive statistics, like ...

  • average number of posts per day per person
  • relation of number of posts to size of posts
  • print the data with real plots using Plotchart, BLT, whatever
  • ...
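
As a starting point for the second idea, the following sketch tracks both the number of posts and the total message length per author, then prints the average length. The names tally, avgLength, posts and chars are hypothetical, the sample messages are made up, and Tcl 8.5 or later is assumed (for incr on missing array elements).

```tcl
# Accumulate post count and total characters per author.
proc tally {author msg} {
    global posts chars
    incr posts($author)
    incr chars($author) [string length $msg]
}

# Average message length for one author, as a floating-point number.
proc avgLength {author} {
    global posts chars
    return [expr {double($chars($author)) / $posts($author)}]
}

array unset posts
array unset chars

# Made-up sample data; a real run would call tally from the log command.
tally kbk {hello}
tally kbk {good morning}
tally dkf {hi}

foreach author [lsort [array names posts]] {
    puts [format "%-10s %3d posts, %.1f chars/post" \
        $author $posts($author) [avgLength $author]]
}
```

To use it with real logs, one option would be to call tally from within m, passing the author and msg arguments through.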

So, enhance and have fun! (Now back to work ...)

NEM: You could string trim the user names to remove punctuation, as some of the usernames are the same (e.g. I assume dkf and dkf_ are both dkf!)
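
A minimal sketch of that suggestion, assuming that trimming "<", ">" and "_" is the desired normalisation (normalizeAuthor is a hypothetical helper name). Note that this also merges bridged and direct users with the same nick, and that it only strips characters at the ends of a name, so a variant like dkf_Away is left unchanged.

```tcl
# Strip bridge brackets and leading/trailing underscores from a user name.
proc normalizeAuthor {author} {
    return [string trim $author "<>_"]
}

foreach name {dkf dkf_ <suchenwi> suchenwi dkf_Away} {
    puts "$name -> [normalizeAuthor $name]"
}
```

Calling the helper on the author inside m, just before the count is incremented, would fold these variants into one entry.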

PT: the bridge actually accumulates statistics as a running total as well, so I get something like the following (the numbers are the count of lines posted):

  bin/print_stats | head -12
 35890 suchenwi
 30040 dkf
 29849 [email protected]
 26165 Colin
 22871 mjanssen
 21287 kbk
 20538 GPS
 20397 dgp
 19591 stevel
 16938 hypnotoad
 15757 miguel
 14113 patthoyts

Category Community