Implementing FIFO queues

Originally posted to comp.lang.tcl, 6 Aug 2003 <[email protected]>

Someone asked:

I want to implement a circular queue in Tcl where the setter function writes to the bottom of the queue and the getter reads them from the top of the queue.

As previously mentioned, tcllib has a FIFO queue package that you could use, but this is an interesting exercise in Tcl data structure design so it's worthwhile looking at how to build one from scratch.

The usual way to implement a FIFO queue in "traditional" languages is with a circular linked list or with a regular linked list and an extra "tail" pointer. This approach doesn't translate into Tcl very well; pointers and anonymous dynamically allocated nodes aren't a good fit with Tcl where "everything is a string".

But Tcl does have a number of other useful building blocks: associative arrays and lists are the main ones. Let's see what we can build out of these.

The simplest approach is simply to represent the queue as a list:

        proc qinit {qvar} {
            upvar 1 $qvar Q
            set Q [list]
        }
        proc qput {qvar elem} {
            upvar 1 $qvar Q
            lappend Q $elem
        }
        proc qget {qvar} {
            upvar 1 $qvar Q
            set head [lindex $Q 0]
            set Q [lrange $Q 1 end]
            return $head
        }
        proc qempty {qvar} {
            upvar 1 $qvar Q
            return [expr {[llength $Q] == 0}]
        }

The above implementation of qput is efficient, since lappend has amortized O(1) runtime, but qget is problematic. lrange $Q 1 end is O(N), so in the worst case an algorithm using this queue implementation would take O(N^2) time.

We can improve on this by borrowing an idea from Hood-Melville queues: split the queue into two pieces, "L" and "R". qput adds elements to R, and qget takes them from L; if L is empty, move the contents of R onto L. Instead of actually removing elements from L, we just keep track of the index of the next item; the left half of the queue is empty when this index reaches the end of the list.

        proc qinit {qvar} {
            upvar 1 $qvar Q
            set Q(i) 0
            set Q(l) [list]
            set Q(r) [list]
        }

        proc qput {qvar elem} {
            upvar 1 $qvar Q
            lappend Q(r) $elem
        }

        proc qget {qvar} {
            upvar 1 $qvar Q
            if {$Q(i) >= [llength $Q(l)]} {
                set Q(l) $Q(r)
                set Q(r) [list]
                set Q(i) 0
            }
            set head [lindex $Q(l) $Q(i)]
            incr Q(i)
            return $head
        }

        proc qempty {qvar} {
            upvar 1 $qvar Q
            return [expr {$Q(i) >= [llength $Q(l)] && [llength $Q(r)] == 0}]
        }

Now qput and qget both run in O(1) time, and the space usage is at most a constant factor more than the naive implementation using a single list.

--Joe English

KPV 2003-08-06: Having left and right queues seems overly complicated to me. Why not just have a head pointer that points to the current head of the queue. The head would get initialized to 0 and the queue is empty when the head is equal to the length of the queue. qput is still just an lappend and qget is just a lindex and an increment.

JE That also has good time complexity, but the space complexity is worse since elements taken from the head of the queue are never freed. Worst-case, the size of the queue grows without bound.

I just recently wrote such a beast when I needed to do a shortest-path search via a breadth first search. One key requirement for the BFS is that queue must not be destroyed--it is needed for walking back along the shortest path.

Here's code that implements this idea:

 proc q'init {qvar} {
    upvar 1 $qvar Q
    set Q(q) [list]
    set Q(h) 0
 }
 proc q'put {qvar elem} {
    upvar 1 $qvar Q
    lappend Q(q) $elem
 }
 proc q'get {qvar} {
    upvar 1 $qvar Q
    set head [lindex $Q(q) $Q(h)]
    incr Q(h)
    return $head
 }
 proc q'empty {qvar} {
    upvar 1 $qvar Q
    return [expr {[llength $Q(q)] == $Q(h)}]
 }

AMG: Here's a queue implementation that uses [namespace ensemble]. Also it tries to give [lrange] an unshared queue object so it can work in-place and avoid unnecessary copying. It's a bit more flexible in that it allows specifying an initial value for the queue, and any number of elements can be enqueued at once.

namespace eval queue {
    namespace ensemble create -subcommands {create put get empty} 
    proc create {queueVar args} {
        upvar 1 $queueVar queue
        set queue $args
    }
    proc put {queueVar args} {
        upvar 1 $queueVar queue
        lappend queue {*}$args
    }
    proc get {queueVar} {
        upvar 1 $queueVar queue
        set head [lindex $queue 0]
        set queue [lrange $queue[set queue ""] 1 end]
        return $head
    }
    proc empty {queueVar} {
        upvar 1 $queueVar queue
        expr {![llength $queue]}
    }
}

Example usage:

% queue create foo a b c
a b c
% queue empty foo
0
% queue put foo d e f
a b c d e f
% while {![queue empty foo]} {puts [queue get foo]}
a
b
c
d
e
f
% queue empty foo
1

Performance testing:

% set q [lrepeat 1000000 x]; time {queue get q} 1000
 2759.502 microseconds per iteration
% set q [lrepeat 1000000 x]; time {qget q} 1000
32072.98 microseconds per iteration
% set q [lrepeat 1000 x]; time {queue get q} 1000
    8.483 microseconds per iteration
% set q [lrepeat 1000 x]; list; time {qget q} 1000
   17.357 microseconds per iteration

I tried using [lassign] instead of [lrange], but it was far slower. I think it was making a copy despite being passed an unshared object. I also tried [lreplace] instead of [lrange], but the two had identical performance.

Using $queue[set queue ""] is a clear performance win, but it still takes substantially more time to get an element from a long queue than a short one. This is because of the memory move used to delete the first element. Changing the code to reverse the queue order fixes this problem:

proc qget {qvar} {
    upvar 1 $qvar Q
    set head [lindex $Q end]
    set Q [lrange $Q 0 end-1]
    return $head
}
proc ::queue::get queueVar {
    upvar 1 $queueVar queue
    set head [lindex $queue end]
    set queue [lrange $queue[set queue ""] 0 end-1]
    return $head
}

% set q [lrepeat 1000000 x]; time {queue get q} 1000
    7.425 microseconds per iteration
% set q [lrepeat 1000000 x]; time {qget q} 1000
32813.389 microseconds per iteration
% set q [lrepeat 1000 x]; time {queue get q} 1000
    7.597 microseconds per iteration
% set q [lrepeat 1000 x]; time {qget q} 1000
   16.465 microseconds per iteration

Now the queue length has no measurable impact on the timing of [queue get]. It still affects [qget] majorly, since it copies.

Of course, fixing the performance of [queue get] simply moved the problem to [queue put]! Only one of the two can have O(1) time, at least when using a simple linear array as backing store. Linked lists allow for constant time enqueuing and dequeuing, but locality of reference suffers.

$var[set var ""] may have tremendous performance benefits, but it's clumsy and counterintuitive. I suggest this alternate formulation:

proc take {varName} {
    upvar 1 $varName var
    return $var[set var ""]
}
# example:
set queue [lrange [take queue] 0 end-1]

although it really should be bytecoded. See Bytecoded K for code that can be adapted.

EPSJ: I would suggest using arrays for queues. The access (put/get) is a little slower, but it is always O(1). See example bellow:

namespace eval queue {
    namespace ensemble create -subcommands {create put get empty size} 
    proc create {queueVar args} {
        upvar 1 $queueVar queue
        
        set queue(in_idx) 0
        set queue(out_idx) 0
        if {[llength $args]} {
            put queue $args
        }
    }
    proc put {queueVar args} {
        upvar 1 $queueVar queue
        set queue($queue(in_idx)) $args
        if { [incr queue(in_idx)]==2147483647} { set queue(in_idx) 0}
        return
    }
    proc get {queueVar} {
        upvar 1 $queueVar queue
        set result {}
        if {($queue(out_idx)!=$queue(in_idx))} {
            set result $queue($queue(out_idx))
            unset queue($queue(out_idx))
            if { [incr queue(out_idx)]==2147483647} { set queue(out_idx) 0}
        }
        return $result
    }
    proc empty {queueVar} {
        upvar 1 $queueVar queue
        expr {$queue(out_idx)==$queue(in_idx)}
    }
    proc size {queueVar} {
        upvar 1 $queueVar queue
        set result [expr {$queue(in_idx)-$queue(out_idx)}]
        expr {($result<0) ? 2147483647 + $result : $result}
    }
}

% queue create q
% time {queue put q x} 1000000
1.602769 microseconds per iteration
% time {queue get q} 1000000
1.604104 microseconds per iteration

NEM 2018-10-27 Actually you can have a list-based queue that exhibits O(1) (amortized) time complexity for both push and pop operations, by using a trick from Purely Functional Data Structures. The key is to realise that both push and pop are O(1) when they operate on the end of a list, rather than at the beginning. Inserting or removing from the beginning requires copying the entire rest of the list, while inserting or removing from the end (in Tcl's implementation) usually only requires adjusting some length counters. In most functional languages, which favour linked lists, the reverse is true -- inserting and removing from the head of the list is quick, while doing so from the end is O(n).

The particular trick we will use here is to use two lists to represent the queue, one in normal order and one that is reversed. This trick has been known in functional programming lore since at least 1980 . When pushing a new element, we append it to the normal list. When popping an element we take it from the end of the reversed list. Therefore we are always operating on the end of a list, and get nice O(1) performance. If the reversed queue is empty when we attempt to pop, then we take the normal list, reverse it, and use that - setting the normal list to empty. This is an O(n) operation, but as we only have to do it every n operations, we end up with overall O(1) amortized cost. If we call the normal queue in and the reversed queue out then we have the invariant that the full queue is always represented by:

list {*}[lreverse $out] {*}$in

The downside is that some pop operations do take O(n) time, which can be a problem if you care about worst-case latency. But in many cases this is acceptable, and it is of course much better than every pop operation taking O(n) time. (Note that exactly the same occurs when appending to a Tcl list: some operations will take longer as Tcl has to reallocate memory to extend the length of the list, and the same amortized analysis applies).

The code is quite simple, apart from the usual tricks to play with reference counts:

namespace eval queue {
    namespace export create empty put get size tolist
    namespace ensemble create

    proc create args {
        dict create out [list] in $args
    }

    proc put {queueVar args} {
        upvar 1 $queueVar queue
        dict lappend queue in {*}$args
    }

    proc get {queueVar} {
        upvar 1 $queueVar queue
        if {[llength [dict get $queue out]] == 0} {
            dict set queue out [lreverse [dict get $queue in]]
            dict set queue in [list]
        }
        set out [dict get $queue out]
        dict set queue out [list]
        set ret [lindex $out end]
        dict set queue out [lreplace $out [set out end] end]
        return $ret
    }

    proc size queue {
        expr {[llength [dict get $queue in]] + [llength [dict get $queue out]]}
    }

    proc empty queue { expr {[size $queue] == 0} }

    proc tolist queue { 
        list {*}[lreverse [dict get $queue out]] {*}[dict get $queue in]
    }
}

Some example timings:

% set q [queue create {*}[lrepeat 1000000 x]]; time { queue get q } 1000
4.91954 microseconds per iteration
% set q [queue create]; time { queue put q x } 1000000
1.137679108 microseconds per iteration
% set q [queue create {*}[lrepeat 1000000 x]]; time { queue get q } 1000000
1.125611442 microseconds per iteration