Version 7 of String occurrences

Updated 2018-02-18 11:41:51 by WJG

WJG 2013-03-16: Needed a simple way of counting the occurrences of one string in another. I was surprised to find that the standard Tcl string command had no offering.

proc string_occurrences {needleString haystackString} {

    set j [string first $needleString $haystackString 0]
    if {$j == -1} {return 0}

    set i 0
    while {$j != -1 } {
        set j [string first $needleString $haystackString [incr j]]
        incr i
    }

    return $i
}

SEH How about:

 llength [regexp -all -inline (?=$needleString) $haystackString]

AMG: This gets into trouble when $needleString contains regular expression metacharacters, all of which will need to be preceded by backslashes. Try this:

regsub -all {[][{}()\\^$*+?.]} $needleString {\\&} needleString

PYK: Another option is ycl::string::delimit:

package require ycl::string
namespace import [yclprefix]::string::delimit

set data "one and two and three and four andand five"
delimit $data string and format count
# 5

WJG 2018-02-18 I've recently revisted this proc and made some simple changes to enable me to retrieve the locations of the needle string in the haystack. If I was working in a language which uses the roman writing system, ie. words delimited by whitespace then the haystack string could be treated like a list. This approach is not helpful when working in Chinese or Japanese, for example, where words are not delimited in the writing system and so just blend into each other. Single characater searches are not much use either as Chinese or Japanese can have multiple syllables. So,following the lsearch model above, we can add an -inline switch to the string_occurrences proc too.

proc string_occurrences {args} {

    set opt [lindex $args 0]
    set needleString [lindex $args end-1]
    set haystackString [lindex $args end]

    set j [string first $needleString $haystackString 0]
    if {$j == -1} {return 0}

        append res $j

    set i 0
    set d [string length $needleString]
    while {$j != -1 } {
        set j [string first $needleString $haystackString [incr j]]
        incr i $d
        if {$j != -1} { lappend res $j }
    }

    if { $opt eq "-inline" } { return $res }
    return $i
}