Version 0 of Frequency calculation

Updated 2002-08-01 07:48:31

Richard Suchenwirth 2002-08-01 - In Solving cryptograms, amazing ways of automated decryption were shown, and CL asked for accessories, e.g. for computing character and digram (2-character sequence) frequencies. I could not resist such a little challenge, so here goes.

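 proc freq12 string {
     # Returns a pairlist: each character or digram, followed by its
     # relative frequency.
     # A "-" is assumed before the string and appended after it, so the
     # first and last characters each take part in two digrams as well.
     set last -
     set n 0
     foreach char [split $string- ""] {
         incr n
         inc a($char)
         inc a($last$char)
         set last $char
     }
     set res {}
     foreach i [lsort [array names a]] {
         # every count is divided by n, the number of characters
         # (including the appended "-")
         lappend res $i [expr {$a($i)*1./$n}]
     }
     set res
 }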
 proc inc {varName {amount 1}} {
    # create the variable if it does not exist yet, then increment it
    upvar 1 $varName var
    if {![info exists var]} {set var 0}
    incr var $amount
 }
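
The inc helper is needed because, in the Tcl of 2002, incr raised an error on a variable that did not exist. Since Tcl 8.5, incr creates the variable itself, and dict incr does the same for dictionary keys. A minimal sketch of the same counting under that assumption (Tcl 8.5 or later) follows; the name freq12d is hypothetical and not part of the original page.

```tcl
# Sketch only: same counting as freq12, but with a dict instead of an
# array plus the inc helper. Requires Tcl 8.5 or later (assumption).
proc freq12d string {
    set counts [dict create]
    set last -
    set n 0
    foreach char [split $string- ""] {
        incr n
        # dict incr creates a missing key with value 0 before adding 1
        dict incr counts $char
        dict incr counts $last$char
        set last $char
    }
    set res {}
    foreach key [lsort [dict keys $counts]] {
        lappend res $key [expr {[dict get $counts $key] / double($n)}]
    }
    return $res
}
```

It should return the same keys in the same order as freq12. The number of digits printed depends on the Tcl version, so the values may look longer than in the session shown on this page.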

 % freq12 "TCL-IS-A-SCRIPTING-LANGUAGE"
 - 0.178571428571 -A 0.0357142857143 -I 0.0357142857143 -L 0.0357142857143 
 -S 0.0357142857143 -T 0.0357142857143 A 0.107142857143 A- 0.0357142857143
 AG 0.0357142857143 AN 0.0357142857143 C 0.0714285714286 CL 0.0357142857143
 CR 0.0357142857143 E 0.0357142857143 E- 0.0357142857143 G 0.107142857143 
 G- 0.0357142857143 GE 0.0357142857143 GU 0.0357142857143 I 0.107142857143
 IN 0.0357142857143 IP 0.0357142857143 IS 0.0357142857143 L 0.0714285714286
 L- 0.0357142857143 LA 0.0357142857143 N 0.0714285714286 NG 0.0714285714286
 P 0.0357142857143 PT 0.0357142857143 R 0.0357142857143 RI 0.0357142857143
 S 0.0714285714286 S- 0.0357142857143 SC 0.0357142857143 T 0.0714285714286
 TC 0.0357142857143 TI 0.0357142857143 U 0.0357142857143 UA 0.0357142857143
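
For cryptogram work, the most frequent characters or digrams are usually what one looks at first, while the pairlist above is sorted by key. The following is a small sketch of how such a pairlist could be ranked. The proc name topfreq and its parameters len and count are assumptions made for illustration, not part of the original code.

```tcl
# Sketch only: pick the most frequent keys of a given length from a
# pairlist such as the one freq12 returns.
#   len   - 1 for single characters, 2 for digrams
#   count - how many entries to return
proc topfreq {pairlist {len 1} {count 5}} {
    set rows {}
    foreach {key f} $pairlist {
        # keep only keys of the requested length
        if {[string length $key] == $len} {
            lappend rows [list $key $f]
        }
    }
    # sort on the frequency (second element of each row), highest first
    set rows [lsort -real -decreasing -index 1 $rows]
    lrange $rows 0 [expr {$count - 1}]
}

# A small hand-made pairlist, so the example does not depend on freq12:
puts [topfreq {- 0.5 -A 0.1 A 0.3 B 0.2 AB 0.1} 1 2]  ;# prints {- 0.5} {A 0.3}
```

With freq12 loaded, topfreq [freq12 $text] 2 10 would give the ten most frequent digrams of $text.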

Arts and crafts of Tcl-Tk programming