SetOps, Difference

Purpose: Definition and discussion of set functionality, especially the function to compute the difference of two sets. Difference being the set containing those elements which are in the first set but not the second.

Back to the Chart of proposed set functionality.

: ::setops::diff set1 set2

Arguments:

Exactly two arguments
Each argument represents one the sets to intersect.
The sets are given by value, as tcl-lists.

Result:

A list containing the difference set1 - set2.

Implementation:

Several variants are possible:

Like the first variant in SetOps, Intersection, but takes the result from a different branch. SetOps, Symmetrical difference can use this technique too. Predicted performance O(n+2nlogn), n = length of longest set. Variant A.
Transforms set b into an array, then scan set a and place everything into the result not found in the array. Predicted performance O(2n), for n as above. Variant B.
Like (2), but using the local namespace as an implicit array. Predicted performance O(2n), n as above. Variant C.

Shortcuts to use in the code:

First argument is empty => Result is empty.
Second argument is empty => Result is the first argument.
An intermediate first argument is empty => Result is current collection.
An intermediate second argument is empty => Result is current result + tail of first argument.

Timing:

Summary:

C was the best of the 3 implementations.

Variant A.

 proc ::setops::diff {a b} {
    if {[llength $a] == 0} {
        return {}
    }
    if {[llength $b] == 0} {
        return $a
    }

    set res {}

    set a [lsort $a]
    set b [lsort $b]

    while {1} {
        # Store lindex/0,1 in var, access later faster ?
        set n [string compare [lindex $a 0] [lindex $b 0]]
        if {$n == 0} {
            # A = B => element in both, so not in difference
            set a [lrange $a 1 end]
            set b [lrange $b 1 end]
        } elseif {$n > 0} {
            # A > B, remove B, we are beyond the element. 
            set b [lrange $b 1 end]
        } else {
            # A < B, remove A, we are beyond the element.
            # This element in A is part of the result too.
            lappend res [lindex $a 0]
            set a [lrange $a 1 end]
        }
        if {[llength $a] == 0} {
            return $res
        }
        if {[llength $b] == 0} {
            foreach e $a {
                lappend res $a
            }

            return $res
        }
    }

    return $res
 }

Variant B.

    proc ::setops::diff {a b} {
        if {[llength $a] == 0} {
            return {}
        }
        if {[llength $b] == 0} {
            return $a
        }

        set res {}

        foreach e $b {
            set ba($e) .
        }

        foreach e $a {
            if {![info exists ba($e)]} {
                lappend res $e
            }
        }

        return $res
    }

Variant C.

 proc ::setops::diff {a b} {
    if {[llength $a] == 0} {
        return {}
    }
    if {[llength $b] == 0} {
        return $a
    }

    set res {}

    foreach $b {.} {break}

    foreach e $a {
        if {![info exists $e]} {
            lappend res $e
        }
    }

    return $res
 }

-- AK

Category Data Structure