Version 0 of SetOps, Symmetrical difference

Updated 1999-03-18 21:39:06

Purpose: Definition and discussion of set functionality, especially the function to compute the symmetrical difference of two sets. Symmetrical difference being the set containing those elements which are in one of the sets, but not the other, the inverse of intersection.


::setops::symdiff set1 set2


Arguments:

  • Exactly two arguments
  • Each argument represents one the sets to intersect.
  • The sets are given by value, as tcl-lists.

Result:

  • A list containing (set1 - set2) + (set2 - set1).

Implementation:

Several variants are possible:

  1. Like the first variant in SetOps, Intersection, but takes the result from a different branch. SetOps, Difference can use this technique too. Predicted performance O(n+2nlogn), n = length of longest set. Variant A.
  2. Transform both lists into an array, then scan them and check for existence of the chosen element in array 1 <> existence in array 2. Predicted performance: O(4n), n above. Variant B.
  3. Transform both lists into an array, then scan each and add all elements to the result array which is not in the array of the other list. Predicted performance: O(4n), n above. Variant C.
  4. Use fastest union, difference and intersection algorithms to compute (a+b) -(a*b). Predicted performance: O(2n+2n+2n)=O(6n). Variant D.

Possible shortcuts:

  • First argument is empty => Result is the second argument.
  • Second argument is empty => Result is the first argument.

Timing information:

Summary:

  • In the best case A is consistently faster than everything else. Unfortunately its worst-case behaviour is much slower than everything else too.
  • From fast to slow the order of variants is: D, C, B, A.
  • Before the invention of using the local namespace as implicit array, and the accompanying performance boost for "diff", "union" and "intersect" the simple implementation (D) was 2nd-best after C.

Variant A.

proc ::setops::symdiff {a b} {

    if {[llength $a] == 0} {
        return $b
    }
    if {[llength $b] == 0} {
        return $a
    }

    set res {}

    set a [lsort $a]
    set b [lsort $b]

    while {1} {
        # Store lindex/0,1 in var, access later faster ?
        set n [string compare [lindex $a 0] [lindex $b 0]]
        if {$n == 0} {
            # A = B => element in both, so not in sym. difference
            set a [lrange $a 1 end]
            set b [lrange $b 1 end]
        } elseif {$n > 0} {
            # A > B, remove B, we are beyond the element. 
            # This element in B is part of the result too.
            lappend res [lindex $b 0]
            set b [lrange $b 1 end]
        } else {
            # A < B, remove A, we are beyond the element.
            # This element in A is part of the result too.
            lappend res [lindex $a 0]
            set a [lrange $a 1 end]
        }
        if {[llength $a] == 0} {
            foreach e $b {
                lappend res $e
            }

            return $res
        }
        if {[llength $b] == 0} {
            foreach e $a {
                lappend res $e
            }

            return $res
        }
    }

    return $res

}


Variant B.

    proc ::setops::symdiff {a b} {
        if {[llength $a] == 0} {
            return $b
        }
        if {[llength $b] == 0} {
            return $a
        }

        set res {}

        foreach e $a {
            set aa($e) .
        }

        foreach e $b {
            set ba($e) .
        }

        foreach e $a {
            if {[info exists aa($e)] != [info exists ba($e)]} {
                lappend res $e
            }
        }

        foreach e $b {
            if {[info exists aa($e)] != [info exists ba($e)]} {
                lappend res $e
            }
        }

        return $res
    }

Variant C.

    proc ::setops::symdiff {a b} {
        if {[llength $a] == 0} {
            return $b
        }
        if {[llength $b] == 0} {
            return $a
        }

        set res {}

        foreach e $a {
            set aa($e) .
        }

        foreach e $b {
            set ba($e) .
        }

        foreach e $a {
            if {![info exists ba($e)]} {
                lappend res $e
            }
        }

        foreach e $b {
            if {![info exists aa($e)]} {
                lappend res $e
            }
        }

        return $res
    }

Variant D.

    proc ::setops::symdiff {a b} {
        diff [union $a $b] [Intersect2 $a $b]
    }

-- AK