Richard Suchenwirth 2004-02-15 - On Integer range generator, SS asked for a way to enumerate a range of words, like
{a..z aa..az ba ..}
Such enumeration must of course be based on an alphabet - the example shows that it's supposed to be [a-z] in this case.
At first glance, the task appears similar to conversion from or to baseN, where N is the length of the alphabet. However, the zero element is somehow special: if "a" were 0, "aa" would be 00, which for an integer is the same. So I let "a" count as 1, up to "z" as 26 -- and "aa" is then (a)*26+(a), 1*26+1 = 27. Different from normal base conversion,
az = (a)*26+(z), 1*26+26 = 52 (in place of 2*26+0)
, and so on. The code below takes care of this "zero-less" base conversion, by adding 1 in word2int, or decrementing by 1 in int2word. The empty string "" maps to resp. from 0, which makes sense for a "zero element" of a set of strings. Come to think, in "words" over an alphabet, all characters are equal, while in integers (at any base), leading zeroes are redundant. So zero is a special case, and the other characters can occur in a 1..N range, while with baseN conversion, the characters used are in the 0..(N-1) range. If zero is "" and part of the rendering alphabet, we would be thrown back before the time zero as visible digit was invented (in India) - because nobody could see it :)
One trivial practical use of this is the names of Excel spreadsheet columns, which, with [A-Z] as alphabet, just follow the same pattern - but their little world ends at column 256, or "IV"... My experiments show that with the [a-z] alphabet, strings of up to 13 letters can be correctly transformed to integers. Above that size, strange errors may happen:
% int2word [word2int kaaaaaaaaaaaaa] coscxanchbfnjk
So best check the string length early in word2int. I expect the limit to be related to the length of the alphabet, but haven't researched this in detail yet.
Note the cute side-effect that the name of the alphabet-listing procedure a-z looks, when called (in brackets), exactly like the corresponding regular expression. And make sure if you change that, say to [a-z0-9], to provide a proc with that name... Even if it looks like a RE, it is just a function call.
proc word2int {word {alphabet ""}} { if {$alphabet eq ""} {set alphabet [a-z]} set i [expr {wide(0)}] foreach c [split $word ""] { set i [expr {$i*[llength $alphabet]+[lsearch $alphabet $c]+1}] } if {$i<0} {error "word $word too long to fit in integer"} set i } proc int2word {int {alphabet ""}} { if {$alphabet eq ""} {set alphabet [a-z]} set word "" set la [llength $alphabet] while {$int > 0} { incr int -1 set word [lindex $alphabet [expr {$int % $la}]]$word set int [expr {$int/$la}] } set word } proc a-z {} {list a b c d e f g h i j k l m n o p q r s t u v w x y z} # Testing: proc must {cmd res} { if {[set r [eval $cmd]] ne $res} {error "$cmd -> $r, not $res"} } must {word2int a} 1 must {word2int z} 26 must {word2int aa} 27 must {word2int az} 52 must {word2int ba} 53 must {int2word 1} a must {int2word 26} z must {int2word 27} aa must {int2word 52} az must {int2word 53} ba must {int2word [word2int suchenwi]} suchenwi #--------------------- Systematic testing in a loop for {set i 1} {$i<10000} {incr i} { if {[word2int [int2word $i]] != $i} { error "$i: [int2word $i] / [word2int [int2word $i]]" } }
This can also be used for a simple encryption, e.g. by multiplying resp. dividing the intermediate integers, or adding/subtracting a well-known integer:
% int2word [expr 2*[word2int hello]] pjxyd % int2word [expr [word2int pjxyd]/2] hello % int2word [expr [word2int message] + [word2int secretkey]] sepwymlmd % int2word [expr [word2int sepwymlmd] - [word2int secretkey]] message
Note however that for addition, message and key should be of about the same length, othe====== rwise the prefix of the longer will show unhidden, like the "se" above in "sepwymlmd", or more evident:
% int2word [expr [word2int longmessage] + [word2int key]] longmesslmd % int2word [expr [word2int word] + [word2int verylongkey]] veryloodzxc
Not different from when you add a long and a short integer...
See also Mapping words to floats
This code found a surprising use in Brute force meets Goedel, where int2word could produce RPN programs from their Goedel number