Version 10 of lrmdups

Updated 2005-03-31 07:51:00 by antirez

Procedure to remove duplicate elements from a list. The returned list will be sorted.

RS: You don't even need a procedure for this, because that's exactly what lsort -unique does:

 interp alias {} lrmdups {} lsort -unique

Testing this interp alias interactively with a tclsh:

 % lrmdups {foo bar grill bar foo}
 bar foo grill

SS: Well.. it depends. lsort -unique does it in O(NlogN), while the trivial algorithm to remove duplicated elements runs in O(N). So if your list is very very long lsort may not be an option. Also you may like to have a stable algorithm that will not alter the order of non-duplicated elements.

RS: OK, but that means the sortedness specification is removed... Here is the trivial algorithm:

 proc lrmdups list {
    set res {}
    foreach element $list {
       if {[lsearch -exact $res $element]<0} {lappend res $element}
    }
    set res
 }
 % lrmdups {foo bar grill bar foo}
 foo bar grill

SS: Oops, sorry Richard, I didn't noticed the The returned list will be sorted part. So what the author if this page wants is really lsort -unique. Still your algorithm is not O(N), but O(N^2). In order to have an O(N) algorithm you need to use a dictionary with O(1) SEARCH complexity in the average case to check if an element already exists or not, i.e. a Tcl array.

RS OK, can't resist a mini-challenge :)

 proc lrmdups list {
    array set a {}
    foreach element $list {set a($element) {}}
    lsort [array names a]
 }
 % lrmdups {foo bar grill bar foo}
 bar foo grill

That's nice, and if you remove the final lsort should have O(N) complexity indeed. Still it is not stable. The following is a stable version:

 proc lrmdups list {
     array set x {}
     set res {}
     foreach e $list {
         if {[info exists x($e)]} continue
         lappend res $e
         set x($e) {}
     }
     return $res
 }
 % lrmdups {foo bar grill bar foo}
 foo bar grill