Richard Suchenwirth - A harmless question on news:comp.lang.tcl, whether Tcl has an equivalent to C's isdigit(), led to the following code, which checks whether a string is a well-formed English number.
```tcl
proc isnumber s {
    set re ^((one|two|three|four|five|six|seven|eight|nine|ten
    append re |eleven|twelve|thir|fif|teen|twen|ty|forty|fifty
    append re |eighty|hundred|thousand|and
    append re {)[ -]?)+$}
    regexp $re $s
}
isnumber {nineteen hundred and seventy-six} ;# 1
isnumber twenty-six                         ;# 1
isnumber twenty-something                   ;# 0
```
Dan Smart commented:
Hmm,
```tcl
isnumber {teen thousand and ty nine}                              ;# 1
isnumber teen                                                     ;# 1
isnumber ty                                                       ;# 1
isnumber fif                                                      ;# 1
isnumber {one and thousand and two and fif and thir and hundred}  ;# 1
```
Ooops - I guess it's back to the drawing board...
It was. Here's version 0.2, written at midnight:
```tcl
proc en2num s {
    array set dic {
        zero 0 one 1 two 2 three 3 four 4 five 5 six 6 seven 7 eight 8 nine 9
        ten 10 eleven 11 twelve 12 thirteen 13 fifteen 15 eighteen 18
        twenty 20 thirty 30 forty 40 fifty 50 eighty 80 score 20
        hundred 100 thousand 1000 million 1000000 millions 1000000
    }
    regsub -all " and |-" [string trim $s] " " s
    set res [list] ;# will become the translation to math
    foreach i [split $s] {
        if {$i eq ""} continue ;# skip empty fields caused by repeated blanks
        if {[info exists dic($i)]} {
            if {$dic($i)>=1000 && [llength $res]} {
                regsub 000 $res "" res ;# will multiply by 1000 later
                set res "($res)"
            }
            if {($dic($i)>99 || $i eq "score") && [llength $res]} {
                lappend res *
            } else {
                lappend res +
            }
            lappend res $dic($i)
        } elseif {[regexp {^(.+)teen$} $i -> t] && [info exists dic($t)]} {
            lappend res + [expr {$dic($t)+10}]
        } elseif {[regexp {^(.+)ty$} $i -> t] && [info exists dic($t)]} {
            lappend res + [expr {$dic($t)*10}]
        } else {
            return -code error "$s is not a number: $i"
        }
    }
    expr $res
}
proc en:isnum s {expr {![catch {en2num $s}]}}
```
This version is a parser: it extracts the value of an English number, if possible, by building up an arithmetic expression word by word and finally evaluating it with expr. It correctly rejects all of Dan's test cases, and even supports some outdated formats (backwards compatible ;-)
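The least obvious step is the "will multiply by 1000 later" trick for scale words. The standalone sketch below replays only that step: the starting string is, by a hand trace of the proc, what en2num has accumulated for one million two hundred thousand at the moment it reaches the final word thousand. Stripping one "000" and wrapping the group in parentheses lets the trailing "* 1000" scale everything read so far, without losing the million.

```tcl
# Expression en2num has built from "one million two hundred"
# just before it reads the word "thousand" (hand-traced)
set res {(+ 1) * 1000000 + 2 * 100}

# On a scale word >= 1000, en2num removes the first "000" in the
# expression and parenthesizes it; the "* 1000" appended next then
# multiplies the whole group, turning the inner 1000 back into 1000000
regsub 000 $res "" res
set res "($res)"
lappend res * 1000

puts $res          ;# ((+ 1) * 1000 + 2 * 100) * 1000
puts [expr $res]   ;# 1200000
```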
```tcl
en2num {four score and seven} ;# 87
en2num {one and twenty}       ;# 21
```
but it still accepts more than it should, so it acts like a language-driven adding machine:
```tcl
en2num {one two three} ;# 6
en2num {fifty fifty}   ;# 100
```
To fix this, one would need a set of slots for ones, tens, and hundreds, each of which could be filled at most once, and a way to shift these slots up for thousands, millions, ...
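A minimal sketch of that slot idea, under assumptions of my own: the proc name en2num_strict, its error messages, and the exact rules are hypothetical and not from the page. Within a group below 1000, the tens and ones slots can each be filled once (a teen fills both); hundred must follow a multiplier from 1 to 99 and frees the tens and ones slots again; thousand and million close the group and must appear in strictly decreasing order. The word "and" is simply skipped, archaic forms like four score are not handled, and the in operator needs Tcl 8.5 or later.

```tcl
# Hypothetical slot-based parser (illustration only, not the page's code)
proc en2num_strict s {
    array set ones  {one 1 two 2 three 3 four 4 five 5 six 6 seven 7 eight 8 nine 9}
    array set teens {ten 10 eleven 11 twelve 12 thirteen 13 fourteen 14
                     fifteen 15 sixteen 16 seventeen 17 eighteen 18 nineteen 19}
    array set tens  {twenty 20 thirty 30 forty 40 fifty 50
                     sixty 60 seventy 70 eighty 80 ninety 90}
    array set scales {thousand 1000 million 1000000}
    # split on whitespace and hyphens
    set words [regexp -all -inline {[^\s-]+} $s]
    if {$words eq "zero"} {return 0}
    set total 0      ;# sum of groups already closed by a scale word
    set group 0      ;# value of the group being built
    set filled {}    ;# slots used in the current group
    set lastScale 0  ;# 0 means no scale word seen yet
    foreach w $words {
        if {$w eq "and"} continue
        if {[info exists ones($w)]} {
            if {"ones" in $filled} {error "ones slot already filled: $w"}
            incr group $ones($w)
            lappend filled ones
        } elseif {[info exists teens($w)]} {
            if {"tens" in $filled || "ones" in $filled} {error "tens slot already filled: $w"}
            incr group $teens($w)
            lappend filled tens ones   ;# a teen occupies both slots
        } elseif {[info exists tens($w)]} {
            if {"tens" in $filled || "ones" in $filled} {error "tens slot already filled: $w"}
            incr group $tens($w)
            lappend filled tens
        } elseif {$w eq "hundred"} {
            if {"hundred" in $filled || $group < 1 || $group > 99} {error "misplaced hundred"}
            set group [expr {$group * 100}]
            set filled {hundred}       ;# tens and ones are free again
        } elseif {[info exists scales($w)]} {
            if {$group == 0} {error "$w without a multiplier"}
            if {$lastScale && $scales($w) >= $lastScale} {error "scales out of order: $w"}
            incr total [expr {$group * $scales($w)}]
            set lastScale $scales($w)
            set group 0
            set filled {}
        } else {
            error "not a number word: $w"
        }
    }
    if {$total + $group == 0} {error "empty input"}
    return [expr {$total + $group}]
}
```

With these rules, en2num_strict {one two three} and en2num_strict {fifty fifty} raise errors, while en2num_strict {nineteen hundred and seventy-six} returns 1976 and en2num_strict {one million two hundred thousand} returns 1200000.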
For the opposite direction, translating numbers into natural languages, see also the Bag of number/time spellers.