SetOps, Union

Purpose: Definition and discussion of set functionality, especially the function to merge two or more sets.

Back to the Chart of proposed set functionality.


::setops::union set1 set2...

Arguments:

  • Any number of arguments, (including none).
  • Each argument represents one of the sets to merge.
  • The sets are given by value, as tcl-lists.

Result:

  • A list representing the union of all sets in the argument list.

Implementation:

Several variants are possible:

  • Add the list elements as keys to a hash-table, then extract the keys out of the array. Predicted performance O(nm). Removal of duplicates is done automatically. This is variant A.
  • Append the lists into a big list, then remove the duplicates. Removal of duplicates may use a sort followed by a simple linear scan, or a hash-table. Predicted runtimes are O(nm log nm) (sort+scan) and O(nm) (hash-table). n = length of longest lists, m = number of sets. These are variants B and C.
  • It is possible to use local namespace as implicit array, see Variant D in SetOps, Create. Predicted performance O(nm). This is Variant D.
Remark
Some shortcuts are possible, f.e. if none or only one set is given to the procedure.

Timing information:

Summary:

  • From fast to slow the order of variants is: A, B, C.
  • v7.6 does the tests faster than v8.0
  • Test system: Linux 2.0.33, gcc 2.7.2.3, libc 5.4.38, P2 133, 32M main memory

Update (1999 Mar 12):


Variant A.

    proc ::setops::union {args} {
        switch [llength $args] {
            0 {
                return {}
            }
            1 {
                return [lindex $args 0]
            }
            default {
                foreach set $args {
                    foreach e $set {
                        set tmp($e) .
                    }
                }
                return [array names tmp]
            }
        }
    }

Variant B.

    proc ::setops::union {args} {
        switch [llength $args] {
            0 {
                return {}
            }
            1 {
                return [lindex $args 0]
            }
            default {
                set tmp {}
                foreach set $args {
                    foreach e $set {
                        lappend tmp $e
                    }
                }
                # remove duplicates --
                # sort and scan. shortcut in case of empty or
                # single-element result.

                if {[llength $tmp] < 2} {
                    return $tmp
                }

                set tmp  [lsort $tmp]
                set last [lindex $tmp 0]
                set tmp  [lrange $tmp 1 end]
                set res  $last

                foreach e $tmp {
                    if {[string compare $e $last] != 0} {
                        lappend res $e
                        set last    $e
                    }
                }

                return $res
            }
        }
    }

Variant C.

    proc ::setops::union {args} {
        switch [llength $args] {
            0 {
                return {}
            }
            1 {
                return [lindex $args 0]
            }
            default {
                set tmp {}
                foreach set $args {
                    foreach e $set {
                        lappend tmp $e
                    }
                }
                # -W- remove duplicates --
                # hash out. shortcut in case of empty or
                # single-element result.

                if {[llength $tmp] < 2} {
                    return $tmp
                }

                foreach e $tmp {
                    set tmpa($e) .
                }

                return [array names tmpa]
            }
        }
    }

Variant D.

 proc ::setops::union {args} {
    switch [llength $args] {
        0 {
            return {}
        }
        1 {
            return [lindex $args 0]
        }
        default {
            foreach set $args {
                if {[llength $set] > 0} {
                    foreach $set {.} {break}
                }
            }

            unset args set
            info locals
        }
    }
 }

-- AK