[Richard Suchenwirth] - In Huffman coding, characters (or other data items)
are represented as bit sequences of varying length, so that the most
frequent items have the shortest bit sequences. This way, storage
requirement is reduced compared to fixed-length bit sequences, if the
frequency distribution is appropriate for the input data.

Encoding Huffman data is dirt simple with ''[string] map''. Decoding
could have been implemented by a decision tree of 25 nested [if]s (given
the overly simple 26 character alphabet that I use - no digits or
special chars, not even blank can be represented), but I expect such a
construct to be badly maintainable, and it has the disadvantage that the
data is hidden in the program structure. Therefore I decided for the
slightly wasteful approach of using [regexp] to determine and chop off matching
prefixes (this is possible because Huffman trees match the Fano
condition that no code is the prefix of another code). It is still not
noticably slow for short test words (150..200 msec on my P200 box, for
ten-letter HELLOWORLD), and you can replace the default map
by one that better fits your purposes, even at call time.

Finally, the resulting bit strings are of course longer than the input,
as every bit uses one character (this is an educational toy after all).
For practical use, one could use ''[binary] format'' to turn bit strings to
unreadable byte sequences, or map sequences of six bits to one printable
character. Such strings would be both (simply) encrypted and loss-free
compressed (trailing zeros added in the final conversion would be
ignored on decoding, because no code consists of all zeros - I had to
change Y from 00000 in Huffman's original map to 000001 to obtain that
added safety).
}
 proc huffman {mode data {map {}}} {
    if {$map==""} {
        set map {
          E 100 T 001 A 1111 O 1110 N 1100 R 1011 I 1010 S 0110 H 0101
          D 11011 L 01111 F 01001 C 01000 M 00011 U 00010 G 00001
          Y 000001 P 110101 W 011101 B 011100 V 1101001 K 110100011
          X 110100001 J 110100000 Q 1101000101 Z 1101000100
        }
    }
    switch -- $mode {
        e - encode {
            regsub -all {[^A-Z]} [string toupper $data] "" data
            string map $map $data
        }
        d - decode {
            set res ""
            while {[string length $data]} {
                puts $data/$res
                foreach {out in} $map {
                    if {[regexp ^$in\(.*) $data -> data]} {
                        append res $out
                    }
                }
            }
            set res
        }
        default {error "usage: huffman (encode|decode) data"}
    }
 }

----
[Arts and crafts of Tcl-Tk programming]