Version 6 of Extract Numbers From a String

Updated 2022-10-03 09:54:43 by {William Giddings}

WJG 2022-10-01: A quick snippet on extracting a list of numbers from a string.

proc extractNumbers {str} {
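        set res ""
        set a 0        ;# 1 while inside a numeric sequence; set here so a leading "," or "." cannot hit an unset variable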
        foreach c [split $str ""] {
                if { [string is integer $c] } {
                        set a 1
                        append res $c
                } elseif { $c eq "," || $c eq "." } {
                        if {$a} { append res $c }
                } else {
                        set a 0
                        append res " "
                }
        }
        return [string trim $res]
}

pyk 2022-10-02: What about numbers like the one in "this .13 here?"? Over at Regular Expression Examples there's a regexp command to extract numbers from a string.

WJG 2022-10-02: Firstly, is ".13" a number? I'm not interested in numbers per se, but in extracting sub-strings. The examples that I've seen and the examples that I actually understand are, respectively, many and none. I'm sure that there's some arcane regexp "formula" to extract a series of numeric substrings from a passage of text, but it takes a better man than me to work that one out.
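For comparison, here is one possible regexp-based sketch of the kind of "formula" discussed above. It is not the command from the Regular Expression Examples page. The proc name extractNumbersRe and the pattern are assumptions made for illustration. The pattern assumes that a number is an optional sign, then digits with optional comma-separated groups of three and an optional decimal part, or else a bare fraction such as ".13".

```tcl
# Hypothetical helper (not from the original page): extract numeric
# substrings with a single regular expression.
proc extractNumbersRe {str} {
    # [-+]?          optional sign
    # \d+            leading digits
    # (?:,\d{3})*    optional thousands groups such as ",000"
    # (?:\.\d+)?     optional decimal part such as ".234"
    # |\.\d+         or a bare fraction such as ".13"
    set pattern {[-+]?(?:\d+(?:,\d{3})*(?:\.\d+)?|\.\d+)}
    # -all -inline returns every match as a list; the groups are
    # non-capturing, so only the full matches are returned
    return [regexp -all -inline -- $pattern $str]
}

puts [extractNumbersRe "a b +100 d f -200 h l 1,000 xd 100,000,000.234, and 34."]
puts [extractNumbersRe "this .13 here?"]
```

Because a comma or full stop is only matched when digits follow it, trailing sentence punctuation is never captured. Unlike the character-by-character approach, this pattern does not treat "/" or "^" as infixes, so an input such as "1/25" comes back as two separate items.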

WJG 2022-10-03: Made some changes to the above procedure to allow for sub-string prefixes (+-) and infixes (.,/^). Since a numeric sequence could end a clause, and so be followed by a comma or full stop as sentence punctuation, any such trailing characters are removed from the result.

proc extractNumbers {str} {
        
        set buff ""
        set res ""
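        set lc ""
        set a 0        ;# 1 while inside a numeric sequence; set here so a leading infix cannot hit an unset variable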
        
        set pf "-+"                ;# number sequence prefixes 
        set if ".,/ ^"        ;# number sequence infixes
        
        # parse the string character by character
        foreach c [split $str ""] {
                # respond to integers
                if { [string is integer $c] } {
                        set a 1        ;# toggle START of integer sequence 
                        if {[string first $lc $pf] != -1 } { append buff $lc } 
                        append buff $c
                } elseif { [string first $c $if] != -1 } { 
                        if {$a} { append buff $c }
                } else {
                        set a 0 ;# toggle END of integer sequence
                        append buff " "
                }
                # keep tally for potential prefixes
                set lc $c
        }
        
        # remove trailing sentence punctuation and reformat list
        foreach item $buff { lappend res [string trimright $item $pf$if] }
        
        return $res
}

So, using a couple of gibberish test strings gives:

puts [extractNumbers "a b +100 d f -200 h l 1,000 xd 100,000,000.234, and 34."] 
+100 -200 1,000 100,000,000.234 34

puts [extractNumbers "1/25 3.123^4 10^6"]
1/25 3.123^4 10^6