Version 3 of Mapping words to integers

Updated 2004-02-16 09:36:31

if 0 {Richard Suchenwirth 2004-02-15 - On Integer range generator, SS asked for a way to enumerate a range of words, like

 {a..z aa..az ba ..}

Such enumeration must of course be based on an alphabet - the example shows that it's supposed to be [a-z] in this case.

At first glance, the task looks like conversion to or from base N, where N is the length of the alphabet. However, zero gets in the way: if "a" were 0, "aa" would be 00, which as an integer is the same as 0, so "a" and "aa" could not be told apart. So I let "a" count as 1, up to "z" as 26 -- and "aa" is then (a)*26+(a), 1*26+1 = 27. Unlike in normal base conversion,

 az = (a)*26+(z), 1*26+26 = 52 (in place of 2*26+0)

and so on. The code below takes care of this "zero-less" base conversion by adding 1 to each letter's index in word2int, and by decrementing by 1 before each division step in int2word. The empty string "" maps to 0, and 0 maps back to "", which makes sense for a "zero element" of a set of strings.
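
The arithmetic above can be tried out in isolation. The following is only a sketch: digits2int is a hypothetical helper, not part of the code on this page, and it takes the digits as plain numbers instead of letters.

```tcl
# Hypothetical helper, not part of the page's code: evaluate a list of
# digits, most significant first, in positional notation with the given base.
proc digits2int {digits base} {
    set n 0
    foreach d $digits {
        set n [expr {$n * $base + $d}]
    }
    return $n
}

puts [digits2int {1 1}  26]   ;# "aa" with a=1: prints 27
puts [digits2int {1 26} 26]   ;# "az" with z=26: prints 52
puts [digits2int {2 0}  26]   ;# ordinary base 26 would write 52 as 2*26+0
puts [digits2int {2 1}  26]   ;# "ba": prints 53
```

The zero-less scheme uses the digits 1..26 where ordinary base 26 uses 0..25, so every integer greater than 0 still gets exactly one spelling.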

One trivial practical use of this is the names of Excel spreadsheet columns, which, with [A-Z] as alphabet, just follow the same pattern - but their little world ends at column 256, or "IV"... My experiments show that with the [a-z] alphabet, strings of up to 13 letters can be correctly transformed to integers. Above that size, strange errors may happen:

 % int2word [word2int kaaaaaaaaaaaaa]
 coscxanchbfnjk

So best check the string length early in word2int. I expect the limit to be related to the length of the alphabet, but haven't researched this in detail yet. }
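
The 13-letter limit seen in the experiments is consistent with the range of a signed 64-bit integer, assuming that is the width the interpreter uses for wide(). The largest 13-letter word is still below 2**63, while many 14-letter words are not. A result that has wrapped around is not necessarily negative, so a sign test alone may not notice it. The sketch below computes the longest safe word length for a given alphabet size; maxWordLength is a hypothetical helper, not part of the code on this page.

```tcl
# Hypothetical helper, not part of the page's code.
# Assumption: integers are signed 64-bit, so the largest value is 2**63 - 1.
proc maxWordLength {la} {
    set max 9223372036854775807   ;# 2**63 - 1
    set top 0                     ;# value of the largest word of length $len
    set len 0
    # test before multiplying, so the check itself cannot overflow
    while {$top <= ($max - $la) / $la} {
        set top [expr {$top * $la + $la}]
        incr len
    }
    return $len
}

puts [maxWordLength 26]
```

For the 26-letter alphabet this prints 13, in line with the experiments. A larger alphabet gives a shorter limit, a smaller one a longer limit.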

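 proc word2int {word {alphabet ""}} {
    if {$alphabet eq ""} {set alphabet [a-z]}
    set la [llength $alphabet]
    set i [expr {wide(0)}]
    foreach c [split $word ""] {
        # letters count from 1, so shift the 0-based list index by one
        set d [expr {[lsearch -exact $alphabet $c] + 1}]
        if {$d == 0} {error "character \"$c\" is not in the alphabet"}
        set i [expr {$i*$la + $d}]
    }
    # only catches an overflow that happens to wrap to a negative value
    if {$i<0} {error "word $word too long to fit in integer"}
    set i
 }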
 proc int2word {int {alphabet ""}} {
    if {$alphabet eq ""} {set alphabet [a-z]}
    set word ""
    set la [llength $alphabet]
    while {$int > 0} {
        # shift to a 0-based digit first: this is the "zero-less" step
        incr int -1
        set word  [lindex $alphabet [expr {$int % $la}]]$word
        set int   [expr {$int/$la}]
    }
    set word
 }
 proc a-z {} {list a b c d e f g h i j k l m n o p q r s t u v w x y z}

# Testing:

 proc must {cmd res} {
    if {[set r [eval $cmd]] ne $res} {error "$cmd -> $r, not $res"}
 }
 must {word2int a}  1
 must {word2int z}  26
 must {word2int aa} 27
 must {word2int az} 52
 must {word2int ba} 53
 must {int2word 1}  a
 must {int2word 26} z
 must {int2word 27} aa
 must {int2word 52} az
 must {int2word 53} ba
 must {int2word [word2int suchenwi]} suchenwi

#--------------------- Systematic testing in a loop

 for {set i 1} {$i<10000} {incr i} {
    if {[word2int [int2word $i]] != $i} {
        error "$i: [int2word $i] / [word2int [int2word $i]]"
    }
 }

This can also be used for a simple encryption, e.g. by multiplying the intermediate integer to encode and dividing it again to decode:

 % int2word [expr {2*[word2int hello]}]
 pjxyd
 % int2word [expr {[word2int pjxyd]/2}]
 hello
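
The enumeration asked for at the top of the page, {a..z aa..az ba ..}, follows from the same arithmetic: convert both end words to integers, count from one to the other, and convert each value back. The sketch below is one way to do it. wordrange is a hypothetical helper, the alphabet is fixed to a..z, and the two conversions are inlined so that the block runs on its own.

```tcl
# Hypothetical helper, not part of the page's code: return the list of
# words from $from to $to (inclusive) over the alphabet a..z.
proc wordrange {from to} {
    set alphabet {a b c d e f g h i j k l m n o p q r s t u v w x y z}
    set la [llength $alphabet]
    # word -> integer, letters counting from 1 (same scheme as word2int)
    foreach var {lo hi} word [list $from $to] {
        set n 0
        foreach c [split $word ""] {
            set n [expr {$n * $la + [lsearch -exact $alphabet $c] + 1}]
        }
        set $var $n
    }
    # integer -> word for every value in the range (same scheme as int2word)
    set result {}
    for {set i $lo} {$i <= $hi} {incr i} {
        set word ""
        set n $i
        while {$n > 0} {
            incr n -1
            set word [lindex $alphabet [expr {$n % $la}]]$word
            set n [expr {$n / $la}]
        }
        lappend result $word
    }
    return $result
}

puts [wordrange x ad]   ;# prints: x y z aa ab ac ad
```

Building the whole list at once keeps the sketch short; for long ranges a loop that handles one word at a time would use less memory.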

Arts and crafts of Tcl-Tk programming