Version 1 of Some ways to do set comparision

Updated 2003-04-06 15:28:49

In Tcl Gems Michael Schlenker saw a proc by Ed Suominen to do list comparision (set comparision would be more precise). This is related to Manipulating sets in tcl.

As setok said it seemed suboptimal, so i did some tests with alternatives:

So here are the rivals:

Ed's original:

 proc listcomp1 { list1 list2 out1Name out2Name } {

   ### Define empty lists in case one has no unique elements
   set out1 {}; set out2 {}

   ### Test each element of each list against all elements of other list

   foreach {i} $list1 {j} $list2 {

     # First, test for unique element in list1
     if { [ lsearch  -exact $list2 $i ] < 0 } { lappend out1 $i }
     # Then test for unique element in list2
     if { [ lsearch  -exact $list1 $j ] < 0 } { lappend out2 $j }
   }

   ### Put results in specified lists
   upvar $out1Name x
   set x $out1
   upvar $out2Name x
   set x $out2

   ### END LISTCOMP
   return
 }

My first try with presorted lists:

 proc listcomp2 { list1 list2 out1Name out2Name } {

   set out1 {}; set out2 {}

   set list1 [lsort -increasing [K $list1 [set list1 ""]]]
   set list2 [lsort -increasing [K $list2 [set list2 ""]]]
   foreach {i} $list1 {j} $list2 {

     if { [ lsearch  -sorted -exact $list2 $i ] < 0 } { lappend out1 $i }
     if { [ lsearch  -sorted -exact $list1 $j ] < 0 } { lappend out2 $j }
   }

   upvar #0 $out1Name x
   set x $out1
   upvar #0 $out2Name x
   set x $out2

   return
 }

My second try with an array:

 proc listcomp3 { list1 list2 out1Name out2Name } {

   set x [list]
   set y [list]

   # cache boolean representation in 0val and 1val
   set 0val [expr {0!=0}]
   set 1val [expr {1==1}]

   foreach item $list1 {
        set A($item) $0val
   }

   foreach item $list2 {
        if {[info exists A($item)]} {
            unset A($item)
        } else {
            set A($item) $1val
        }
   }
   foreach key [array names A] {
        if {$A($key)} {
            lappend x $key
        } else {
            lappend y $key
        }
   }

   upvar $out1Name B
   set B $x
   upvar $out2Name C
   set C $y

   return
 }

The third try with lists and lsort -unique:

 proc listcomp4 {list1 list2 outVar1 outVar2} { 

    set A [list]
    set B [list]

    set joined [lsort -unique [concat $list1 $list2]]

    foreach item $joined {
        if {[lsearch -exact $list1 $item] < 0 } {
            lappend A $item
        } else {
            lappend B $item
        }
    }

    upvar $outVar1 x
    set x $A
    upvar $outVar2 x
    set x $B

    return
 }

Fourth try modifiying the listcomp3:

 proc listcomp5 {list1 list2 outVar1 outVar2} { 

    set A [list]
    set B [list]

    if {[llength $list1] < [llength $list2]} {
        set short $list1
        set SA A
        set SB B
    } else {
        set short $list2
        set SA B
        set SB A
    }

    set joined [lsort -unique [concat $list1 $list2]]

    foreach item $joined {
        if {[lsearch -exact $short $item] != -1 } {
            lappend $SA $item
        } else {
            lappend $SB $item
        }
    }

    upvar $outVar1 x
    set x $A
    upvar $outVar2 x
    set x $B

    return
 }

Fifth try with a slightly modified lsort -unique and some temp lists:

 proc listcomp6 {list1 list2 outVar1 outVar2} {

    set A [list]
    set B [list]

    set joined [list]

    foreach item $list1 {
        lappend joined [list $item A]
    }
    foreach item $list2 {
        lappend joined [list $item B]
    }
    set joined [lsort -unique -index 0 [K $joined [set joined ""]]]

    foreach item $joined {
        lappend [lindex $item 1] [lindex $item 0]
    }

    upvar $outVar1 x
    set x $A
    upvar $outVar2 x
    set x $B
 }

Now the simple test suite:

 proc buildlist {n m} {
    set result [list]
    for {set i 0} {$i < $n} {incr i} {
        if {$i % $m} {
            lappend result $i
        }
    }
    return $result
 }

 proc buildlistR {n m} {
    set result [list]
    for {set i $n} {$i > 0} {incr i -1} {
        if {$i % $m} {
            lappend result $i
        }
    }
    return $result
 }

 proc timetest {} {
    set l1 [buildlist 1000 3]
    set l2 [buildlist 1000 2]

    puts "Lists have [llength $l1], [llength $l2] elements"
    set A [list]
    set B [list]

    puts "Starting"
    set t1 [time {listcomp1 $l1 $l2 A B} 100]
    puts "1 done"
    set t2 [time {listcomp2 $l1 $l2 A B} 100]
    puts "2 done"
    set t3 [time {listcomp3 $l1 $l2 A B} 100]
    puts "3 done"
    set t4 [time {listcomp4 $l1 $l2 A B} 100]
    puts "4 done"
    set t5 [time {listcomp5 $l1 $l2 A B} 100]
    puts "5 done"
    set t6 [time {listcomp6 $l1 $l2 A B} 100]

    puts "Timing elements"
    puts "Listcomp 1: $t1"
    puts "Listcomp 2: $t2"
    puts "Listcomp 3: $t3"
    puts "Listcomp 4: $t4"
    puts "Listcomp 5: $t5"
    puts "Listcomp 6: $t6"

    set l1 [buildlistR 1000 3]
    set l2 [buildlist 1000 2]

    puts "Lists have [llength $l1], [llength $l2] elements"
    set A [list]
    set B [list]

    puts "Starting"
    set t1 [time {listcomp1 $l1 $l2 A B} 100]
    puts "1 done"
    set t2 [time {listcomp2 $l1 $l2 A B} 100]
    puts "2 done"
    set t3 [time {listcomp3 $l1 $l2 A B} 100]
    puts "3 done"
    set t4 [time {listcomp4 $l1 $l2 A B} 100]
    puts "4 done"
    set t5 [time {listcomp5 $l1 $l2 A B} 100]
    puts "5 done"
    set t6 [time {listcomp6 $l1 $l2 A B} 100]

    puts "Timing elements"
    puts "Listcomp 1: $t1"
    puts "Listcomp 2: $t2"
    puts "Listcomp 3: $t3"
    puts "Listcomp 4: $t4"
    puts "Listcomp 5: $t5"
    puts "Listcomp 6: $t6"
 }

 Running this on my PII 350, ActiveTcl 8.4.2.0 gave the following results:

 (ascending sorted lists)
 Lists have 666, 500 elements
 Listcomp 1: 224236 microseconds per iteration
 Listcomp 2: 230076 microseconds per iteration
 Listcomp 3: 26275 microseconds per iteration
 Listcomp 4: 142584 microseconds per iteration
 Listcomp 5: 131349 microseconds per iteration
 Listcomp 6: 26383 microseconds per iteration

 (mixed sorted lists)
 Lists have 667, 500 elements
 Listcomp 1: 220062 microseconds per iteration
 Listcomp 2: 220435 microseconds per iteration
 Listcomp 3: 26218 microseconds per iteration
 Listcomp 4: 136292 microseconds per iteration
 Listcomp 5: 125391 microseconds per iteration
 Listcomp 6: 22591 microseconds per iteration

So the winner is listcomp6 using lsort -unique followed by listcomp3 using the array approach. Can anyone do faster than this? Probably. Like to see it.