Mapping words to floats

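if 0 {Richard Suchenwirth 2004-03-31 - Here are functions to map between words (strings over the alphabet a-z plus space, of moderate length: up to 10 characters for full precision) and floating-point numbers between 0 and 1 that give the word's position in lexicographic order. Each character is treated as one base-27 digit after the point, with space = 0 and a..z = 1..26. Case is ignored, any other character counts as a space, and trailing spaces do not change the value. A double's 53-bit mantissa holds just over 11 such digits (53 log 2 / log 27 ≈ 11.1), hence the length limit. This could be used, for instance, in custom collation orders.}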

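 proc sfrac string {
    # Map a word to a fraction in [0,1) that sorts like the word:
    # each character is one base-27 digit (space = 0, a..z = 1..26).
    set res 0.0
    # Ignore case; anything outside a-z is treated as a space.
    regsub -all {[^a-z ]} [string tolower $string] " " s
    set abc {" " a b c d e f g h i j k l m n o p q r s t u v w x y z}
    set labc [llength $abc]
    set div [expr {double($labc)}]
    foreach char [split $s ""] {
       # The k-th character contributes digit/27**k.
       set res [expr {$res+[lsearch -exact $abc $char]/$div}]
       set div [expr {$div*$labc}]
    }
    set res
 }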

#-- The inverse function:

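 proc fracs x {
    # Inverse of sfrac: read off the base-27 digits of x (0 <= x < 1),
    # always returning 11 characters, trailing spaces included.
    set abc {" " a b c d e f g h i j k l m n o p q r s t u v w x y z}
    set labc [llength $abc]
    set res ""
    while {[string length $res]<11} {
       # The integer part of x*27 is the next digit...
       set i [expr {int($x*$labc)}]
       append res [lindex $abc $i]
       # ...and the fractional part holds the remaining digits.
       set x [expr {$x*$labc-$i}]
    }
    set res
 }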

if 0 {Testing, and usage examples:

 % sfrac hell
 0.303787250137
 % sfrac hello
 0.303788295513
 % sfrac helloa
 0.303788298094
 % sfrac world
 0.873365337165
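These numbers are base-27 fractions. Writing d(space) = 0, d(a) = 1, ..., d(z) = 26, a word w1 w2 ... wn maps to the sum below; for "hell" (h = 8, e = 5, l = 12) it is exactly 161445/531441, which matches the first result above.

```latex
\mathrm{sfrac}(w_1 w_2 \cdots w_n) = \sum_{k=1}^{n} \frac{d(w_k)}{27^{k}},
\qquad
\mathrm{sfrac}(\text{hell}) = \frac{8}{27} + \frac{5}{27^{2}} + \frac{12}{27^{3}} + \frac{12}{27^{4}} = \frac{161445}{531441} \approx 0.303787250137
```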

This is close to the end of the alphabetical order:

 % sfrac zzzzz
 0.999999930308
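The value approaches 1 but never reaches it: a run of n z's is a geometric series that sums to exactly 1 - 27^-n.

```latex
\mathrm{sfrac}(\text{zzzzz}) = \sum_{k=1}^{5} \frac{26}{27^{k}} = 1 - 27^{-5} = 1 - \frac{1}{14348907} \approx 0.999999930308
```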

...and the empty string is the beginning:

 % sfrac ""
 0.0

Now try converting back:

 % sfrac "hello world"
 0.303788297767
 % fracs 0.303788297767
 hello wormy

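Oops... only "hello wor" survived. This has to do with tcl_precision: at its old default of 12, the printed result keeps only 12 significant digits, fewer than the double holds, and fracs magnifies the dropped part by a factor of 27 at every step. (Since Tcl 8.5 the default is 0, which prints the shortest string that reads back as the same double.) Setting it to 17 shows the full value: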

 % set tcl_precision 17
 17
 % sfrac "hello world"
 0.30378829776699118
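A back-of-the-envelope check of how many characters each representation can carry, assuming one base-27 digit per character: n decimal digits are worth n log 10 / log 27 base-27 digits, and a 53-bit mantissa is worth 53 log 2 / log 27. The helper name digits27 is made up for this sketch and is not part of the original code.

```tcl
# Hypothetical helper (not from the original page): how many base-27
# digits, i.e. characters, fit into ndigits digits of the given base.
proc digits27 {ndigits base} {
    expr {$ndigits * log($base) / log(27)}
}

# Compare the two tcl_precision settings with the double's mantissa.
foreach {label n b} {
    "tcl_precision 12" 12 10
    "tcl_precision 17" 17 10
    "53-bit mantissa"  53 2
} {
    puts [format "%-17s %5.2f characters" $label [digits27 $n $b]]
}
```

This gives roughly 8.4, 11.9 and 11.1 characters. So a 12-digit print is only good for about eight characters (nine happened to survive above), and although 17 printed digits pin down the double exactly, the double itself holds only a little over eleven.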

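But even at maximum precision, the eleventh (last) character cannot be reconstructed. Each step of fracs multiplies any rounding error by 27, so after eleven steps an error of a single unit in the last place of the double has grown to about a third of a letter. The final "d" also sits exactly on a digit boundary (nothing follows it), so the slightest shortfall makes int() truncate it to "c":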

 % fracs 0.30378829776699118
 hello worlc
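One way to make the round trip exact, assuming Tcl 8.5 or later for arbitrary-precision integers, is to keep the word as an integer numerator over 27**11 and divide only when a float is really needed, along the lines of the integer mapping mentioned below. This swaps float arithmetic for integer arithmetic; the names wnum and numw and the fixed length of 11 are made up for this sketch and are not the code of that page.

```tcl
# Hypothetical helpers (not from the original page), Tcl 8.5+.
# wnum: a word of up to $len characters becomes the integer N such that
# N/27**len is its position; no rounding happens in either direction.
proc wnum {string {len 11}} {
    set abc {" " a b c d e f g h i j k l m n o p q r s t u v w x y z}
    regsub -all {[^a-z ]} [string tolower $string] " " s
    # Pad with spaces (digit 0), then cut to exactly $len characters.
    set s [string range "$s[string repeat " " $len]" 0 [expr {$len-1}]]
    set n 0
    foreach char [split $s ""] {
        set n [expr {$n*27 + [lsearch -exact $abc $char]}]
    }
    return $n
}

# numw: the inverse, peeling base-27 digits off the least significant end.
proc numw {n {len 11}} {
    set abc {" " a b c d e f g h i j k l m n o p q r s t u v w x y z}
    set res ""
    for {set k 0} {$k < $len} {incr k} {
        set res [lindex $abc [expr {$n % 27}]]$res
        set n [expr {$n / 27}]
    }
    return $res
}
```

 % numw [wnum "hello world"]
 hello world

For words of up to 11 characters the integers sort like the words; a float in [0, 1) can still be had with expr {double([wnum $w]) / 27**11} when one is needed.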

See also Mapping words to integers

