Purpose: Definition and discussion of set functionality, especially the function to compute the symmetrical difference of two sets. Symmetrical difference being the set containing those elements which are in one of the sets, but not the other, the inverse of intersection.
Back to the Chart of proposed set functionality.
::setops::symdiff set1 set2
Arguments:
Result:
Implementation:
Several variants are possible:
Possible shortcuts:
Timing information:
Summary:
Variant 1.
proc ::setops::symdiff {a b} { if {[llength $a] == 0} { return $b } if {[llength $b] == 0} { return $a } set res {} set a [lsort $a] set b [lsort $b] while {1} { # Store lindex/0,1 in var, access later faster ? set n [string compare [lindex $a 0] [lindex $b 0]] if {$n == 0} { # A = B => element in both, so not in sym. difference set a [lrange $a 1 end] set b [lrange $b 1 end] } elseif {$n > 0} { # A > B, remove B, we are beyond the element. # This element in B is part of the result too. lappend res [lindex $b 0] set b [lrange $b 1 end] } else { # A < B, remove A, we are beyond the element. # This element in A is part of the result too. lappend res [lindex $a 0] set a [lrange $a 1 end] } if {[llength $a] == 0} { foreach e $b { lappend res $e } return $res } if {[llength $b] == 0} { foreach e $a { lappend res $e } return $res } } return $res }
Variant 2.
proc ::setops::symdiff {a b} { if {[llength $a] == 0} { return $b } if {[llength $b] == 0} { return $a } set res {} foreach e $a { set aa($e) . } foreach e $b { set ba($e) . } foreach e $a { if {[info exists aa($e)] != [info exists ba($e)]} { lappend res $e } } foreach e $b { if {[info exists aa($e)] != [info exists ba($e)]} { lappend res $e } } return $res }
Variant 3.
proc ::setops::symdiff {a b} { if {[llength $a] == 0} { return $b } if {[llength $b] == 0} { return $a } set res {} foreach e $a { set aa($e) . } foreach e $b { set ba($e) . } foreach e $a { if {![info exists ba($e)]} { lappend res $e } } foreach e $b { if {![info exists aa($e)]} { lappend res $e } } return $res }
Variant 4.
proc ::setops::symdiff {a b} { diff [union $a $b] [Intersect2 $a $b] }
Variant 5. - DKF
proc ::setops::symdiff {a b} { if {[llength $a] == 0} { set ret $b } elseif {[llength $b] == 0} { set ret $a } else { foreach key $a {set ary($key) "1"};# Assume input is set! foreach key $b {append ary($key) "2"} set ret {} foreach {key val} [array get ary] { if {[string length $val] == 1} { lappend ret $key } } } set ret }
Variant 6. - 5a (Numbers instead of strings)
proc ::setops::symdiff {a b} { if {[llength $a] == 0} { set ret $b } elseif {[llength $b] == 0} { set ret $a } else { foreach key $a {incr ary($key)} ; # Error, ary($key) foreach key $b {incr ary($key)} ; # possibly undefined set ret {} foreach {key val} [array get ary] { if {$val == 1} { lappend ret $key } } } set ret }
DKF - But this doesn't work, since incr barfs on an undefined variable. Pity...
AK - Found that out while testing too. Seems that I got bitten from a perlie (Perl initializes the var to 0 and then performs the ++). I'll leave it in as an example of a bug.
DKF: Of course, this will actually work on 8.5 as the semantics of incr got changed.
Variant 7 - 5b (uses ideas from this page)
proc ::setops::symdiff {a b} { if {[llength $a] == 0} { set ret $b } elseif {[llength $b] == 0} { set ret $a } else { foreach key $a {set ary($key) ""} set ret {} foreach key $b { if {[info exist ary($key)]} { unset ary($key) } { lappend ret $key } } foreach key [array names ary] {lappend ret $key} # AK ?speed: eval lappend ret [array names ary] } set ret }
This variant seems to be nearly as fast as 1 in my tests (unlike 5), but has the good property of never performing a sort, and thus never having any part which is O(n log n). Calculating the actual complexity of this variant is not easy, but it is certainly bounded in the worst case by O(3n) (assuming that the input lists are the same length but have no common elements and are not sorted.) - DKF
-- AK