[Richard Suchenwirth] - In Huffman coding, characters (or other data items) are represented as bit sequences of varying length, so that the most frequent items have the shortest bit sequences. This way, storage requirement is reduced compared to fixed-length bit sequences, if the frequency distribution is appropriate for the input data. Encoding Huffman data is dirt simple with ''[string] map''. Decoding could have been implemented by a decision tree of 25 nested [if]s (given the overly simple 26 character alphabet that I use - no digits or special chars, not even blank can be represented), but I expect such a construct to be badly maintainable, and it has the disadvantage that the data is hidden in the program structure. Therefore I decided for the slightly wasteful approach of using [regexp] to determine and chop off matching prefixes (this is possible because Huffman trees match the Fano condition that no code is the prefix of another code). It is still not noticably slow for short test words (150..200 msec on my P200 box, for ten-letter HELLOWORLD), and you can replace the default map by one that better fits your purposes, even at call time. Finally, the resulting bit strings are of course longer than the input, as every bit uses one character (this is an educational toy after all). For practical use, one could use ''[binary] format'' to turn bit strings to unreadable byte sequences, or map sequences of six bits to one printable character. Such strings would be both (simply) encrypted and loss-free compressed (trailing zeros added in the final conversion would be ignored on decoding, because no code consists of all zeros - I had to change Y from 00000 in Huffman's original map to 000001 to obtain that added safety). } proc huffman {mode data {map {}}} { if {$map==""} { set map { E 100 T 001 A 1111 O 1110 N 1100 R 1011 I 1010 S 0110 H 0101 D 11011 L 01111 F 01001 C 01000 M 00011 U 00010 G 00001 Y 000001 P 110101 W 011101 B 011100 V 1101001 K 110100011 X 110100001 J 110100000 Q 1101000101 Z 1101000100 } } switch -- $mode { e - encode { regsub -all {[^A-Z]} [string toupper $data] "" data string map $map $data } d - decode { set res "" while {[string length $data]} { puts $data/$res foreach {out in} $map { if {[regexp ^$in\(.*) $data -> data]} { append res $out } } } set res } default {error "usage: huffman (encode|decode) data"} } } ---- [Arts and crafts of Tcl-Tk programming]