[Lars H]: I suspect whoever added [[LZ78 Compression]] to [New Pages] really meant to link to [LZ77 Compression], but since it now exists it might as well get some content. The following is a Tcl implementation of LZ78 encoding of the data in a file: proc LZ78_encode {F} { set old "" set Dict() 0 set res [list] while {![eof $F]} { set ch [read $F 1] if {[info exists Dict($old$ch)]} then { append old $ch } else { lappend res $Dict($old) $ch set Dict($old$ch) [array size Dict] set old "" } } if {[string length $old]} then {lappend res $Dict($old)} return $res } The idea is to build a dictionary of character sequences and only output a "token number" for those phrases that are in this dictionary. The longest matching phrase found in the dictionary is chosen. The list element after a token is always a character. The dictionary (the Dict array) is extended with a new entry for every old token + 1 character sequence that hasn't been seen before. As for compression rate of the above: the 92659 bytes file tclObj.c gets encoded as a list of 31863 elements. Achieving actual compression thus also requires finding a good binary format for encoding this list. But there are also ways of improving the LZ78 algorithm, which are used in LZW compression. ---- [Category Compression]