Purpose: Definition and discussion of set functionality, especially the function to compute the intersection of two or more sets.
Back to the Chart of proposed set functionality.
Arguments:
Result:
Implementation:
I assume that intersection of multiple sets is finally reduced to intersection of two sets. Several variants to implement this are possible:
Possible shortcuts:
Timing
Summary:
Variant A.
proc ::setops::Intersect2 {a b} { if {[llength $a] == 0} { return {} } if {[llength $b] == 0} { return {} } set res {} set a [lsort $a] set b [lsort $b] while {1} { # Store lindex/0,1 in var, access later faster ? set n [string compare [lindex $a 0] [lindex $b 0]] if {$n == 0} { # A = B => element in intersection lappend res [lindex $a 0] set a [lrange $a 1 end] set b [lrange $b 1 end] } elseif {$n > 0} { # A > B, remove B, we are beyond the element. set b [lrange $b 1 end] } else { # A < B, remove A, we are beyond the element. set a [lrange $a 1 end] } if {[llength $a] == 0} { return $res } if {[llength $b] == 0} { return $res } } return $res }
Variant B.
proc ::setops::Intersect2 {a b} { if {[llength $a] == 0} { return {} } if {[llength $b] == 0} { return {} } set res {} foreach e $a { set aa($e) . } foreach e $b { set ba($e) . } foreach e $a { if {[info exists aa($e)] && [info exists ba($e)]} { lappend res $e } } foreach e $b { if {[info exists aa($e)] && [info exists ba($e)]} { lappend res $e } } return $res }
MS: this algorithm returns a list where all elements appear twice.
Variant C.
proc ::setops::Intersect2 {a b} { if {[llength $a] == 0} { return {} } if {[llength $b] == 0} { return {} } set res {} if {[llength $a] < [llength $b]} { foreach e $b { set check($e) . } foreach e $a { if {[info exists check($e)]} { lappend res $e } } } else { foreach e $a { set check($e) . } foreach e $b { if {[info exists check($e)]} { lappend res $e } } } return $res }
Variant D.
proc ::setops::Intersect2 {a b} { if {[llength $a] == 0} { return {} } if {[llength $b] == 0} { return {} } set res {} if {[llength $a] < [llength $b]} { foreach $b {.} {break} foreach e $a { if {[info exists $e]} { lappend res $e } } } else { foreach $a {.} {break} foreach e $b { if {[info exists $e]} { lappend res $e } } } return $res }
MS: this algorithm will fail if $a contains an element 'b', as the first pass removes the list.
This algorithm pollutes the namespace of the proc with variable names taken from the list. It may unexpectedly fail when the list contains values that happen to be the variable names chosen above. --BvW
-- AK
I found this solution to be faster with optimisations in lsearch for -sorted
proc lintersect { a b } { set a [lsort $a] set result {} foreach element $b { if { [lsearch -sorted -increasing $a $element] != -1 } { lappend result $element } } return $result }
--Bvw