Simple Text Widget Sort

WJG (05/Feb/2006] I've been working on wordlists recently and needed a simple sorter. Here's what I came up with.

 #---------------
 # sort.tcl
 #---------------
 # Created by William J Giddings, 2006.
 #
 # Sort list of words contained within a text widget.
 #
 # Description:
 # -----------
 # Copy the words to a tcl list, sort them and the
 # re-insert into original text.
 #
 # Usage:
 # -----
 # See demo proc for example 
 #---------------

 # set some test switches
 set DEMO(sort) yes

 #----------------
 # sort text widget wordlist
 #----------------
 proc sort {{w .txt}} {
  set str [$w get 1.0 end]
  if {$str==""} { return}

  # get the size of the list
  foreach {last tmp} [split [$w index end-1c] .] {}

  # build a tcl list of the lines of text
  set lst {} 

  for {set i 1} {$i<=$last} {incr i} {
    set str [$w get $i.0 "$i.0 lineend"]
    # trim away unecessary whitespace
    set str [string trim $str]
    if {$str==""} {continue}
    set lst [lappend lst $str]
  }

  # sort it
  set lst [lsort -dictionary $lst]

  # insert it back into the widget
  $w delete 1.0 end
  set i 0
  for {set j 0} {$j<[llength $lst]} {incr j} {
    $w insert end [lindex $lst $j]\n
  }
  # trim away the last \n
  $w delete "$j.0 lineend" end

 }

 #----------------
 # demo 
 #----------------
 proc demo {} { 
  pack [text .txt] -fill both -expand 1
  .txt insert end "Zebra\nWildebeast\nParrot\nHamster\nBear\nAardvark"
  sort
 }

 if {$DEMO(sort)} {demo}

WJP (2006-02-05) Here's a version that is a bit simpler and more compact:

 proc sort {{w .txt}} {
    set str [$w get 1.0 end]
    if {$str==""} { return}
    $w delete 1.0 end
    foreach l [lsort -dictionary [split $str "\n"]] {
        if {$l != ""} {
            $w insert end [string trim $l]\n
        }
    }
 }

and here's a version that generates a sorted set (in which the entries are unique):

 proc UniqueSort {{w .txt}} {
    set str [$w get 1.0 end]
    if {$str==""} { return}
    $w delete 1.0 end
    set Previous "";
    foreach l [lsort -dictionary [split $str "\n"]] {
        set l [string trim $l]
        if {($l != "") && ($l != $Previous)} {
            $w insert end $l\n
        }
        set Previous $l;
    }
 }

Feb 6 2006 - MG offers an even simpler version of WJP's, which doesn't use the foreach (and runs, on my computer, a whole 16 microseconds faster! ;)

 proc sort2 {{w .txt}} {
    set str [$w get 1.0 end]
    if {$str==""} { return}
    $w delete 1.0 end
    $w insert end [string trim [join [lsort -dictionary [split $str \n]] \n] \n]
 }

2006-02-06 - WJP Nice, but the functionality isn't identical. My version trims each line, whereas MG's trims extra whitespace only from the beginning of the first line and end of the last. I would think that much of the time trimming each entry would be unnecessary, but it seems to be part of WJG's requirements.

MG Actually, because of the way it sorts, all the empty lines are at the beginning of the string (so the [string trim] removes them all; you could actually use [string trimleft] instead, and probably make it quicker still) - or at least that's the case when using WJG's example data with some extra empty lines thrown in. There may be some cases where something will sort before empty lines, though I don't believe so. So it should still have the effect that all empty lines are omitted.

RS: Just for simplicity, the test if {$str == ""} return can also be dropped - if the text is empty, neither deleting nor sorting will do it any harm... and lsort has a -unique switch that could be used if wanted.

WJP: Ah, I hadn't noticed "-unique". But I still think that MG's version doesn't do exactly what mine does. He's right about the blank lines, but what if there is extra whitespace at the beginning or end of non-blank lines? In that case, you've got to trim them individually. You can see this if you try it on input like this:

 set data [join [list pig dog cow {  cow  } {  dog  } {dog }] "\n"]

Of course, if you've got that kind of input you need to trim each line BEFORE the sort (as WJG does) so that the whitespace won't affect the sort, e.g.:

 proc sort3 {{w .txt}} {
    set str [$w get 1.0 end]
    $w delete 1.0 end
    foreach l [split $str "\n"] {
      lappend clean [string trim $l]
    }
     $w insert end [join [lsort -dictionary $clean] "\n"]
 }

MG seems to have somehow totally missed the fact that yours ran [string trim] on the entries before reinserting them - that is, indeed, a difference. My apologies :)

EKB Here's another way to trim each line:

 proc TrimEachLine {s} {
    regsub -all -line -- {^\s+|\s+$} $s {}
 }

 set s { This is
  a bunch 
 of lines with   
    either white    
  space or no
    whitespace at the
 start and end of
 the lines.

 }

 puts [TrimEachLine $s]

 # Optionally:
 # tk_messageBox -message [TrimEachLine $s]

So, can modify MG's as follows (I also got rid of the check for $str == "", since it was pointed out above that this is not necessary; didn't add "unique" because not sure whether you want to exclude duplicates):

 proc sort2 {w} {
    set str [$w get 1.0 end]
    regsub -all -line -- {^\s+|\s+$} $str {} str
    $w delete 1.0 end
    $w insert end [join [lsort -dictionary [split $str \n]] \n]
 }

EKB And here's a version that lets you pass arguments to lsort (so you can choose later to do unique or not):

 proc sorttxt {w args} {
    set str [$w get 1.0 end]
    regsub -all -line -- {^\s+|\s+$} $str {} str
    $w delete 1.0 end
    $w insert end [join [eval lsort $args [list [split $str \n]]] \n]
 }

 text .t
 pack .t -expand yes -fill both -side top

 button .b -text "Go!" -command {sorttxt .t -dictionary -unique}
 pack .b

Category Example