AM (23 december 2004) Here is a little puzzle for the days around Christmas and the Gregorian New Year:
Given a piece of text (where all whitespace has been replaced by single spaces, to make it less ambiguous), such as the description of the Endekalogue, are there any substrings of given length (longer than, say, 10 characters) that appear at least twice?
Requirements:
The best solutions are either:
Rewards for each category: honorary mention on the Wiki
Add your solutions below!
Disclaimer: I have no other intention with this contest than the contest itself - it just struck me that I would not know how to solve it using [regexp], but I considered the use of regexps for about 10 seconds ....
adavis Does this fit the bill?...
# Text to be checked for repeated substrings. set testtext "well then, has this got some repeated strings? this is a test string" # Length of required substrings minus one. set findlength 3 for {set i 0} {$i <= [expr {[string length $testtext] - $findlength}]} {incr i} { set teststring [string range $testtext $i [expr {$i + $findlength}]] if {[catch {incr result($teststring)}]} {set result($teststring) 1} } parray result
...I've used short text and substring for clarity.
AM Yes, seems a winner sofar (23 december 10:10 GMT) :)
AM I should have thought a bit longer about regexps, here is a solution with a single pattern:
set testtext "..." ;# As above regexp {(.{3,}).*\1} $text => str puts "$str"
gives: " th" after thinking very hard ... Not the longest, but at least one that is repeated.
(As for speed:
[AM] solution: 10387172 microseconds per iteration [adavis] solution: 4946 microseconds per iteration
)
Peter Spjuth: I tried the variation of finding the longest.
Inspired by Arjen's regexp I did this very slow thingy:
set minlength 4 set start 0 while {[regexp -indices -start $start "(.{$minlength,}).*\\1" $testtext => ixs]} { foreach {si ei} $ixs break set minlength [expr {$ei - $si + 2}] set start [expr {$si + 1}] } puts \"[string range $testtext $si $ei]\"
A faster one:
# Find the longest repeated substring proc findLongestSubstring {str {minlen 5}} { set len [string length $str] set first 0 set last -1 for {set t 0 ; set u [expr {$minlen - 1}]} {$u < $len} {incr u} { if {[string first [string range $str $t $u] $str [expr {$u + 1}]] >= 0} { set first $t set last $u } else { incr t } } string range $str $first $last }
DKF: Hmm, why is the regexp version so slow? This experiment is a little better, but still not great...
for {set i 0;set len 0} {$i+$len < [string length $testtext]} {incr i} { regexp -start $i {(.+).*\1} $testtext -> matched if {[string length $matched] > $len} { puts >>$matched<< set len [string length $matched] } }