[NEM] 2008-06-10: Whilst implementing some sample [trie] code, I happened to use a naive implementation of a [queue], and was quite shocked at just how much of a performance penalty I had to pay: code that should have taken less than a second (even milliseconds) took almost a minute to complete! This inspired me to do some investigation of various implementations of simple data structures such as [stacks and queues]. Here I present a number of different implementations of a basic [stack] data structure and compare them for performance and memory usage.

**Framework**

First, here's a little module framework to hide away some of the boilerplate. You can skip this bit: a module is just a [namespace ensemble] and a method is just a proc with some sugar for [upvar].

======
package require Tcl 8.5

# Simple modules
proc module {name body} {
    namespace eval ::$name $body
    namespace eval ::$name { namespace ensemble create }
}
proc cons {name args} {
    uplevel 1 [list proc $name $args { info level 0 }]
}
# A replacement for proc that exports the given command from the module.
# Also supports &foo syntax in parameters for automatically [upvar]ing
# variable references.
proc method {name params body} {
    set cmd "upvar 1"
    foreach p $params {
        set p [lindex $p 0]
        if {[string match &* $p]} {
            append cmd " \${$p} " [string range $p 1 end]
        }
    }
    set body "$cmd\n$body"
    uplevel 1 [list proc $name $params $body]
    uplevel 1 [list namespace export $name]
}
======

**Implementations**

***Naive Stack Implementation***

First, here is the most basic stack implementation, which is probably how most people would first try to implement a stack in Tcl. The stack is simply a list: pushing an element does an [lappend], and popping does an [lindex] to get the last element followed by an [lrange] to truncate the stack list. As we will see, that [lrange] is an absolute killer for performance.

======
module SimpleStack {
    method init &s { set s [list] }
    method push {&s elem} { lappend s $elem }
    method pop &s {
        set r [lindex $s end]
        set s [lrange $s 0 end-1]
        return $r
    }
    method empty? &s { expr {[llength $s] == 0} }
}
======

''[Lars H], 2008-06-10: Well, there's little need to be quite so naive — a trivial [K]-combinator style rewrite of the above appears to restore linear complexity:''

======
module SimpleStack2 {
    method init &s { set s [list] }
    method push {&s elem} { lappend s $elem }
    method pop &s {
        set r [lindex $s end]
        set s [lreplace $s [set s end] end] ; # Make sure [lreplace] operates on unshared object
        return $r
    }
    method empty? &s { expr {[llength $s] == 0} }
}
======

[NEM]: Not just restores, but actually appears to be the fastest. I should have remembered [K], given that it came up in almost exactly the same kind of situation. Looking at the source of [Tcl_ListObjReplace], it seems that in the unshared case, where we are just deleting a single element from the end, this should be guaranteed to be an O(1) operation, bringing the "pop all" operation back to O(N).
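To make the calling convention concrete, here is a quick sketch (my own addition, not part of the original benchmarks) of how the resulting ensemble commands are used; `st` is just an arbitrary variable name in the caller's scope:

======
SimpleStack2 init st
SimpleStack2 push st a
SimpleStack2 push st b
puts [SimpleStack2 pop st]     ;# prints b
puts [SimpleStack2 pop st]     ;# prints a
puts [SimpleStack2 empty? st]  ;# prints 1
======

All of the modules below share this interface, which is what the benchmark harness at the end of the page relies on.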
***Maintain an index into a list***

A natural adaptation of the above is to avoid the [lrange] by instead maintaining an index pointer into the stack list. The stack is grown as needed on push, but we never shrink it. Instead we just adjust an index that points to the top of the stack.

======
module IndexStack {
    # Store index as first element, rest are real elements
    method init &s { set s [list 0] }
    method push {&s elem} {
        set idx [lindex $s 0]
        if {[incr idx] >= [llength $s]} {
            lappend s $elem
        } else {
            lset s $idx $elem
        }
        lset s 0 $idx
    }
    method pop &s {
        set idx [lindex $s 0]
        set head [lindex $s $idx]
        incr idx -1
        lset s 0 $idx
        return $head
    }
    method empty? &s { expr {[lindex $s 0] == 0} }
}
======

***Compacting Indexed Stack***

Here is a version that compacts the stack list whenever we are using less than half of it. This adds a bit of amortized overhead to each pop call, but can save quite a bit of space for stacks that vary greatly in size.

======
# A version of IndexStack that also compacts the stack if it grows small
module CompactingIndexStack {
    method init &s { IndexStack init s }
    method push {&s elem} { IndexStack push s $elem }
    method pop &s {
        set r [IndexStack pop s]
        # Check to see if worth compacting
        set len [llength $s]
        if {$len > 10 && [lindex $s 0] < ($len/2.0)} {
            # Yes...
            set s [lrange $s 0 [lindex $s 0]]
        }
        return $r
    }
    method empty? &s { IndexStack empty? s }
}
======

***Nested List (cons-cell) Stack***

Another approach would be to use a linked-list representation for the stack, as found in many [functional programming] languages. Here we present two approaches: one using nested lists as cons cells, and one using arrays. First, the nested-list representation:

======
# An implementation using nested lists as cons cells
module ConsStack {
    cons Nil
    cons Pair a b
    method init &s { set s [Nil] }
    method push {&s elem} { set s [Pair $elem $s] }
    method pop &s {
        switch -exact [lindex $s 0] {
            Nil  { error "empty" }
            Pair {
                set r [lindex $s 1]
                set s [lindex $s 2]
                return $r
            }
        }
    }
    method empty? &s { expr {[llength $s] == 1} }
}
======

The implementation is almost as clear as the original simple version, but the performance is much better. You have to be careful though, as Tcl isn't great at handling deeply nested structures. For instance, changing the implementation of empty? to the following quickly blows the C stack, causing a core dump (generating the string rep of a deeply nested list is a bad idea, it seems).

======
method empty? &s { expr {$s eq [Nil]} }
======

***Arrays as cons-cells***

The final version here is similar to the previous one, but here we use arrays as mutable cons cells. This is closer to a [Lisp] implementation, but is cumbersome to create in Tcl and needs some housekeeping code:

======
# An implementation using arrays as cons cells
module ArConsStack {
    variable id 0
    proc Nil {} { return "" }
    proc Pair {a b} {
        variable id
        set name ::ArConsStack::cell[incr id]
        upvar #0 $name cell
        set cell(h) $a
        set cell(t) $b
        return $name
    }
    proc car t {
        upvar #0 $t cell
        return $cell(h)
    }
    proc cdr t {
        upvar #0 $t cell
        return $cell(t)
    }
    method init &s { set s [Nil] }
    method push {&s elem} { set s [Pair $elem $s] }
    method pop &s {
        if {$s eq ""} { error "empty" }
        set r [car $s]
        set s2 [cdr $s]
        # Cleanup
        unset $s
        set s $s2
        return $r
    }
    method empty? &s { expr {$s eq ""} }
}
======
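As a quick check of that housekeeping (my own example, not from the original page; the stack variable name `st` is arbitrary), each pushed element should allocate one array in the `::ArConsStack` namespace, and each pop should unset it again:

======
ArConsStack init st
ArConsStack push st x
ArConsStack push st y
puts [llength [info vars ::ArConsStack::cell*]]   ;# 2 - one cell array per pushed element
ArConsStack pop st
ArConsStack pop st
puts [llength [info vars ::ArConsStack::cell*]]   ;# 0 - pop has unset each cell
======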
If you have any more clever implementations, feel free to add them here.

**Benchmarks**

We use the following simple test framework to both unit test and benchmark the implementations. It's pretty basic but gives an idea.

''— [Lars H]: I hacked '''test''' to allow varying the number of repetitions. For one thing, 100000 times took so long that the first time I ran it I thought it was stuck in an infinite loop. Also, you need to vary the number of repetitions to determine the complexity exponent.''

======
proc benchmark {name body} {
    set time [lindex [uplevel 1 [list time $body]] 0]
    puts [format "%-10s : %10d us" $name $time]
}
proc assert {msg expr} {
    if {![uplevel 1 [list expr $expr]]} { error "ASSERT FAILED: $msg" }
}
proc test {stack {repetitions 100000}} {
    puts "\nTesting $stack $repetitions"
    # Basic unit/sanity checks
    $stack init s
    foreach x {a b c d e} { $stack push s $x }
    foreach x {e d c b a} {
        set y [$stack pop s]
        assert "$x == $y" {$y eq $x}
    }
    assert "stack empty" {[$stack empty? s]}
    unset s
    # Benchmark - create a big stack then pop every element at once
    $stack init s
    benchmark push {
        for {set i 0} {$i < $repetitions} {incr i} { $stack push s $i }
    }
    benchmark pop {
        while {![$stack empty? s]} { $stack pop s }
    }
    puts "Final size: [string bytelength $s] bytes"
    # Now test with push/pops interleaved
    unset s
    benchmark push/pop {
        $stack init s
        for {set i 0} {10*$i < $repetitions} {incr i} {
            for {set j 0} {$j < 10} {incr j} { $stack push s $j }
            for {set j 9} {$j >= 0} {incr j -1} {
                set h [$stack pop s]
                assert "$h == $j" {$h == $j}
            }
        }
    }
    puts "Final size: [string bytelength $s] bytes"
}

test SimpleStack
test SimpleStack2
test IndexStack
test CompactingIndexStack
test ConsStack
test ArConsStack
======

Which gives the following results (2GHz Intel Core Duo MacBook):

======
Testing SimpleStack
push       :     200875 us
pop        :   70945579 us
Final size: 0 bytes
push/pop   :     910583 us
Final size: 0 bytes

Testing SimpleStack2
push       :     201996 us
pop        :     303392 us
Final size: 0 bytes
push/pop   :     887688 us
Final size: 0 bytes

Testing IndexStack
push       :     245049 us
pop        :     272153 us
Final size: 588891 bytes
push/pop   :     914710 us
Final size: 21 bytes

Testing CompactingIndexStack
push       :     428960 us
pop        :     694956 us
Final size: 13 bytes
push/pop   :    1369530 us
Final size: 11 bytes

Testing ConsStack
push       :     346980 us
pop        :     464432 us
Final size: 3 bytes
push/pop   :    1196018 us
Final size: 3 bytes

Testing ArConsStack
push       :     696860 us
pop        :     671755 us
Final size: 0 bytes
push/pop   :    1669332 us
Final size: 0 bytes
======

Putting everything into a table:

%| `Module` | push (µs) | pop (µs) | Final size after pop-all (bytes) | push/pop (µs) | Final size after push/pop (bytes) |%
&| `SimpleStack` | 200875 | 70945579 | 0 | 910583 | 0 |&
&| `SimpleStack2` | 201996 | 303392 | 0 | 887688 | 0 |&
&| `IndexStack` | 245049 | 272153 | 588891 | 914710 | 21 |&
&| `CompactingIndexStack` | 428960 | 694956 | 13 | 1369530 | 11 |&
&| `ConsStack` | 346980 | 464432 | 3 | 1196018 | 3 |&
&| `ArConsStack` | 696860 | 671755 | 0 | 1669332 | 0 |&

As we can see, the performance of SimpleStack is utterly abysmal, taking over 70 seconds to pop 100000 elements! However, for smallish stacks it is perfectly adequate and performs about as well as the rest of the implementations.

The IndexStack is a clear winner in terms of speed, but wastes quite a bit of memory if the stack ever grows large. A separate compacting stage, called every now and again, could deal with this (e.g. compact whenever less than half the list is being used), for a small amortized cost in performance. That is the approach taken by CompactingIndexStack, although its overhead seems quite high compared to the linked-list stack that follows it.

The two cons-cell versions perform adequately. The ConsStack is a touch slower than the indexed version, while being able to shrink the stack efficiently as needed. The big problem, though, is the ease with which a heavily nested data structure can cause Tcl to crash! Be careful not to generate string reps you don't need!
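To make that warning concrete, here is a small illustration of my own (not part of the benchmarks above). The module only ever uses list operations such as [llength] and [lindex], which never need the string representation, but anything that forces the string rep of the whole stack has to recurse once per nesting level:

======
ConsStack init s
for {set i 0} {$i < 100000} {incr i} { ConsStack push s $i }
ConsStack empty? s                          ;# safe: only [llength] is used, no string rep
# puts $s                                   ;# don't: generating the string rep recurses
                                            ;#        ~100000 levels deep in C
while {![ConsStack empty? s]} { ConsStack pop s }
======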
The ArConsStack performs adequately too, but suffers a bit of a performance overhead compared to lists.

[DKF]: The problem with `SimpleStack` is precisely its quadratic performance behaviour (which totally dominates anything relating to reference management).

[NEM]: Indeed. My initial code that prompted this investigation was using the idiom

======
set xs [lassign $xs x]
======

as a simple in-place queue, which also suffers from the same quadratic behaviour. Incidentally, the [tcllib] versions of both [stack] and [queue] suffer from this O(N²) implementation (using [lreplace] instead of [lrange]).

[AK]: That will change.

[AK]: I should further note that the tcllib packages have some methods, like ''peek'', which are difficult to implement in some of the nicer representations. Incidentally, we only have benchmarks for sets and trees in the struct module. It would be good to put these up for stacks, and derive something for queues as well.

''([DKF]: I've reflected the table about a diagonal axis to make it easier to read.)''

----

!!!!!!
%| [Category Data Structure] | [Category Performance] |%
!!!!!!