Purpose: Definition and discussion of set functionality, especially the function to merge two or more sets. Back to the [Chart of proposed set functionality]. ---- : '''::setops::union''' ''set1 set2...'' ---- Arguments: * Any number of arguments, (including none). * Each argument represents one of the sets to merge. * The sets are given by value, as tcl-lists. Result: * A list representing the union of all sets in the argument list. ---- Implementation: Several variants are possible: * Add the list elements as keys to a hash-table, then extract the keys out of the array. Predicted performance O(nm). Removal of duplicates is done automatically. This is variant A. * Append the lists into a big list, then remove the duplicates. Removal of duplicates may use a sort followed by a simple linear scan, or a hash-table. Predicted runtimes are O(nm log nm) (sort+scan) and O(nm) (hash-table). n = length of longest lists, m = number of sets. These are variants B and C. * It is possible to use local namespace as implicit array, see Variant D in [SetOps, Create]. Predicted performance O(nm). This is Variant D. Remark: Some shortcuts are possible, f.e. if none or only one set is given to the procedure. ---- Timing information: * [SetOps, Union, Timing Script], * [SetOps, Union, Timing results, v7.6] * [SetOps, Union, Timing results, v8.0] Summary: * From fast to slow the order of variants is: A, B, C. * v7.6 does the tests faster than v8.0 * Test system: Linux 2.0.33, gcc 2.7.2.3, libc 5.4.38, P2 133, 32M main memory '''Update (1999 Mar 12):''' * Did the same test as for intersections, with much more datapoints. * In the long run the order of algorithms was '''D''', A, C, B, from best to worst. See below for the script and the raw data. * [SetOps, Union, Timing Script II] * [SetOps, Union, Results] * [SetOps, Union, Results, NEP Format] ---- Variant A. ====== proc ::setops::union {args} { switch [llength $args] { 0 { return {} } 1 { return [lindex $args 0] } default { foreach set $args { foreach e $set { set tmp($e) . } } return [array names tmp] } } } ====== ---- Variant B. ====== proc ::setops::union {args} { switch [llength $args] { 0 { return {} } 1 { return [lindex $args 0] } default { set tmp {} foreach set $args { foreach e $set { lappend tmp $e } } # remove duplicates -- # sort and scan. shortcut in case of empty or # single-element result. if {[llength $tmp] < 2} { return $tmp } set tmp [lsort $tmp] set last [lindex $tmp 0] set tmp [lrange $tmp 1 end] set res $last foreach e $tmp { if {[string compare $e $last] != 0} { lappend res $e set last $e } } return $res } } } ====== ---- Variant C. ====== proc ::setops::union {args} { switch [llength $args] { 0 { return {} } 1 { return [lindex $args 0] } default { set tmp {} foreach set $args { foreach e $set { lappend tmp $e } } # -W- remove duplicates -- # hash out. shortcut in case of empty or # single-element result. if {[llength $tmp] < 2} { return $tmp } foreach e $tmp { set tmpa($e) . } return [array names tmpa] } } } ====== ---- Variant D. ====== proc ::setops::union {args} { switch [llength $args] { 0 { return {} } 1 { return [lindex $args 0] } default { foreach set $args { if {[llength $set] > 0} { foreach $set {.} {break} } } unset args set info locals } } } ====== ---- -- AK <>Data Structure