Finding a sublist

if 0 {

Richard Suchenwirth 2004-06-04: In the Tcl chatroom I saw a request for finding the position of a sublist in a list, so that

lsubl {a b c a d e} {a d}

would return 3. A number of famous algorithms (only not to me) were mentioned: Knuth-Morris-Pratt, Rabin-Karp, Boyer-Moore-Gosper. I'm more the practical kind of guy, so I decided just to give it a try, like this:

  • Find all possible starting positions of the sublist
  • Check whether they comply; return the first to match, or -1 if none

Here's my code:

}

 proc lsubl {list sublist} {
     set sl [llength $sublist]
     if {!$sl} {return 0}
     foreach i [lsearch -all $list [lindex $sublist 0]] {
         if {[lrange $list $i [expr {$i+$sl-1}]] eq $sublist} {
             return $i
         }
     }
     return -1
 }
 #Tests:
 proc ? {c e} {catch $c r; if {$r ne $e} {puts "$c->$r, expected $e"}}

 ? {lsubl {a b c a d e} {a d}}  3
 ? {lsubl {a b c a d e} {a b}}  0
 ? {lsubl {a b c a d e} {a f}} -1
 ? {lsubl {a b c a d e} {}}     0 ;# empty list is everybody's sublist

if 0 {

Could some algorithm-knowers comment, please? :)

FW: Hmm, algorithms for this task are probably so many many because it's pretty much the same idea as finding the index of a substring, which of course is very common. For your one-char items

 proc lsubs {list sublist} {string first [join $sublist ""] [join $list ""]}

works.

RS: Clever indeed - thanks! (But the question was of course for general list elements...)

Lars H 2004-06-03: Because everything is a string, string first can do this with varying length list elements as well. Consider

 proc lsubs {list sublist} {
    set L [lrange $list 0 end]
    set S [lrange $sublist 0 end]
    set charindex [string first " $S " " $L "]
    if {$charindex < 0} then {return -1}
    return [llength [string range $L 0 [expr {$charindex - 1}]]]
 }

I admit this wasn't quite as easy as I first thought it was, though. There's the $charindex < 0 case to take care of, and in particular it was necessary to add the spaces around the list and sublist to prevent matches against parts of list elements (this had the side effect of no longer finding the empty list, but that is kind of special anyway). Furthermore some bugs remain -- in particular it may incorrectly find the sublist if it is a sublist of an element of the list -- but I think it will at least error in those cases (llength applied to non-list).

jcw 2004-06-03: Here's a Boyer-Moore version:

  proc lsame {a b {o 0}} {
    set n [llength $a]
    while {[incr n -1] >= 0} {
      if {[lindex $a $n] ne [lindex $b [expr {$n+$o}]]} { return 0 }
    }
    return 1
  }

  proc lsubl {l s} {
    set n [llength $s]
    if {$n == 0} { return 0 }
    for {set i 0} {$i < $n} {incr i} {
      set o([lindex $s $i]) [expr {$n-$i-1}]
    }
    set i 0
    set m [llength $l]
    while {$i < $m} {
      set c [lindex $l $i]
      if {![info exists o($c)]} {
        incr i $n
      } elseif {$o($c) > 0} {
        incr i $o($c)
      } else {
        set k [expr {$i-$n+1}]
        if {[lsame $s $l $k]} { return $k }
        incr i
      }
    }
    return -1
  }