MS Here is a proc that compares two strings that contain whole lines - where a line is ended by a configurable single- or multi-char EOL symbol.
findOverlap finds if andhow many of the last lines in $str1 match the first lines in $str2.
This might be used (as in Tcl chatroom snaphost history (2)) to build up the history of files that where:
The proc is useful whether new lines are added at the bottom or the top of the file.
The first version below is correct; the second is lighter/faster, but suffers the stutterbug: if the first line in $str2 appears more than once in $str1, it may under some circumstances report some old lines as new. However, it never loses data, i.e., it does never report a new line as old.
# # This proc finds the overlap of two strings; it returns a list # of two indices: # . the end of the last line in $str1 not present in $str2 # . the start of the first line in $str2 not present in $str1 # | # $str1: ----------- # $str2: ---------- # | # The line separator EOL is "\n" per default, but it can be set # to any single or multicharacter value. # # The proc assumes that both $str1 and $str2 consist of full lines, # including the ending EOL proc findOverlap {str1 str2 {EOL "\n"}} { # shorten $str1 to be no longer than $str2 - it is unneeded # and makes faster searches. A long $str2 does not matter ... set len1 [string length $str1] set len2 [string length $str2] set str1 [string range $str1 end-$len2 end] set first2 [string range $str2 0 [string first $EOL $str2]] append first2 [string range $EOL 1 end] set firstLen [string length $first2] while {1} { # Find if/where $first2 is present in $str1 set index [string first $first2 $str1] if {$index == -1} { set len 0 break } # remove the start of $str1, check if the end # matches the start of $str2. This rechecks # the first line, which isn't really needed. set str1 [string range $str1 $index end] set len [string length $str1] if {[string equal -length $len $str1 $str2]} { break } # no match; remove the first line from $str1, restart set str1 [string range $str1 $firstLen end] } # $len is the length of the match set last1 [expr {$len1 - $len - 1}] list $last1 $len }
Can't this be done simply with the regexp engine? I did not find the way, yet ...
This version is more lightweight, but suffers from the stutterbug: if the first line in $str2 appears more than once in $str1, it may under some circumstances report some old lines as new.
However, it never loses data, i.e., it does never report a new line as old.
proc findOverlap {str1 str2 {EOL "\n"}} { set len1 [string length $str1] set first2 [string range $str2 0 [string first $EOL $str2]] append first2 [string range $EOL 1 end] set firstLen [string length $first2] set index [string last $first2 $str1] if {$index == -1} { set len 0 set index [expr {$len1 - 1}] } else { set len [expr {$len1 - $index}] incr index -1 } # $len is the length of the match list $index $len }