## Mapping words to integers

Richard Suchenwirth 2004-02-15 - On Integer range generator, SS asked for a way to enumerate a range of words, like

` {a..z aa..az ba ..}`

Such enumeration must of course be based on an alphabet - the example shows that it's supposed to be [a-z] in this case.

At first glance, the task appears similar to conversion from or to baseN, where N is the length of the alphabet. However, the zero element is somehow special: if "a" were 0, "aa" would be 00, which for an integer is the same. So I let "a" count as 1, up to "z" as 26 -- and "aa" is then (a)*26+(a), 1*26+1 = 27. Different from normal base conversion,

` az = (a)*26+(z), 1*26+26 = 52 (in place of 2*26+0)`

, and so on. The code below takes care of this "zero-less" base conversion by adding 1 in word2int and subtracting 1 in int2word. The empty string "" maps to 0 and back, which makes sense for a "zero element" of a set of strings. Come to think of it, in "words" over an alphabet all characters are equal, while in integers (at any base) leading zeroes are redundant. So zero is a special case, and the other characters occur in the range 1..N, whereas baseN conversion uses digits in the range 0..(N-1). If zero is "" and part of the rendering alphabet, we are thrown back to the time before zero as a visible digit was invented (in India) - because nobody could see it :)
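To see why plain base conversion does not work here, a small sketch may help. The proc name base26 is made up for this illustration and is not part of the code on this page; it treats "a" as the digit 0, so leading "a"s vanish and distinct words collide:

```tcl
# Hypothetical helper for illustration only: ordinary base-26 with a=0.
proc base26 {word} {
    set n 0
    foreach c [split $word ""] {
        # digit value = position of the letter in the alphabet, starting at 0
        set n [expr {$n * 26 + [string first $c abcdefghijklmnopqrstuvwxyz]}]
    }
    return $n
}
puts [list [base26 a] [base26 aa] [base26 b] [base26 ab]]  ;# → 0 0 1 1
```

With digits running from 1 to 26 instead, every word gets its own integer.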

One trivial practical use of this is the names of Excel spreadsheet columns, which, with [A-Z] as alphabet, just follow the same pattern - but their little world ends at column 256, or "IV"... My experiments show that with the [a-z] alphabet, strings of up to 13 letters can be correctly transformed to integers. Above that size, strange errors may happen:

```
% int2word [word2int kaaaaaaaaaaaaa]
coscxanchbfnjk
```

So best check the string length early in word2int. I expect the limit to be related to the length of the alphabet, but haven't researched this in detail yet.
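The limit most likely comes from the 64-bit range of wide integers combined with the alphabet size: the largest word of length n over N letters maps to N + N^2 + ... + N^n, and that sum has to stay below 2^63. The following is only a sketch under that assumption (signed 64-bit arithmetic); the proc name maxWordLength is hypothetical. It avoids overflowing itself by comparing before multiplying:

```tcl
# Hypothetical helper: longest word length for which EVERY word over an
# alphabet of the given size still fits into a signed 64-bit integer.
proc maxWordLength {size} {
    if {$size < 2} {error "alphabet needs at least 2 characters"}
    set limit 9223372036854775807   ;# 2**63 - 1
    # total*size + size <= limit  <=>  total <= (limit - size) / size
    set threshold [expr {($limit - $size) / $size}]
    set total 0    ;# value of the largest word of the current length
    set len 0
    while {$total <= $threshold} {
        set total [expr {$total * $size + $size}]
        incr len
    }
    return $len
}
puts [maxWordLength 26]  ;# → 13
```

For 26 letters this gives 13, which agrees with the experiments above. Some 14-letter words near the start of the range still fit, but "kaaaaaaaaaaaaa" does not. As far as I know, from Tcl 8.5 on expr switches to arbitrary-precision integers when needed, so there the overflow should not occur at all.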

Note the cute side-effect that the name of the alphabet-listing procedure a-z looks, when called (in brackets), exactly like the corresponding regular expression. And make sure if you change that, say to [a-z0-9], to provide a proc with that name... Even if it looks like a RE, it is just a function call.

```
proc word2int {word {alphabet ""}} {
    if {$alphabet eq ""} {set alphabet [a-z]}
    set i [expr {wide(0)}]
    foreach c [split $word ""] {
        # zero-less digit: position in the alphabet plus 1
        # (-exact, so that characters like * or ? are not taken as glob patterns)
        set i [expr {$i*[llength $alphabet]+[lsearch -exact $alphabet $c]+1}]
    }
    # catches only those overflows that wrap around to a negative value
    if {$i<0} {error "word $word too long to fit in integer"}
    set i
}
proc int2word {int {alphabet ""}} {
    if {$alphabet eq ""} {set alphabet [a-z]}
    set word ""
    set la [llength $alphabet]
    while {$int > 0} {
        incr int -1 ;# shift 1..N down to the list indices 0..N-1
        set word  [lindex $alphabet [expr {$int % $la}]]$word
        set int   [expr {$int/$la}]
    }
    set word
}
proc a-z {} {list a b c d e f g h i j k l m n o p q r s t u v w x y z}
# Testing:
proc must {cmd res} {
    if {[set r [eval $cmd]] ne $res} {error "$cmd -> $r, not $res"}
}
must {word2int a}  1
must {word2int z}  26
must {word2int aa} 27
must {word2int az} 52
must {word2int ba} 53
must {int2word 1}  a
must {int2word 26} z
must {int2word 27} aa
must {int2word 52} az
must {int2word 53} ba
must {int2word [word2int suchenwi]} suchenwi
#--------------------- Systematic testing in a loop
for {set i 1} {$i<10000} {incr i} {
    if {[word2int [int2word $i]] != $i} {
        error "$i: [int2word $i] / [word2int [int2word $i]]"
    }
}
```
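The remark about Excel column names can be checked with a small stand-alone sketch. The procs col2int and int2col are hypothetical names, not part of the code above; they hard-wire the [A-Z] alphabet through character codes (assuming ASCII, where A is 65) instead of taking an alphabet list:

```tcl
# Hypothetical helpers: the same zero-less conversion, fixed to A..Z.
proc col2int {col} {
    set n 0
    foreach c [split $col ""] {
        scan $c %c code                       ;# character code of the letter
        set n [expr {$n * 26 + $code - 64}]   ;# A=1 ... Z=26
    }
    return $n
}
proc int2col {n} {
    set col ""
    while {$n > 0} {
        incr n -1
        set col [format %c [expr {65 + $n % 26}]]$col
        set n [expr {$n / 26}]
    }
    return $col
}
puts [int2col 256]  ;# → IV
puts [col2int IV]   ;# → 256
```

With the general procs, passing a list of the capital letters as the alphabet argument should give the same results.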

This can also be used for a simple encryption, e.g. by multiplying and later dividing the intermediate integers, or by adding and later subtracting a well-known integer:

```
% int2word [expr 2*[word2int hello]]
pjxyd
% int2word [expr [word2int pjxyd]/2]
hello
% int2word [expr [word2int message] + [word2int secretkey]]
sepwymlmd
% int2word [expr [word2int sepwymlmd] - [word2int secretkey]]
message
```

Note however that for addition, message and key should be of about the same length, otherwise the prefix of the longer one will show unhidden, like the "se" above in "sepwymlmd", or more evidently:

```
% int2word [expr [word2int longmessage] + [word2int key]]
longmesslmd
% int2word [expr [word2int word] + [word2int verylongkey]]
veryloodzxc
```

Not different from when you add a long and a short integer...