Simulating a server system

Arjen Markus (21 august 2002) Simulating discrete events can be useful in a variety of situations, one of them being the analysis of the performance of a server system. Suppose that you have a server system and this receives a certain amount of requests per minute (on average of course). Then important parameters could be:

The mean time a request has to wait
The mean length of the request queue
The maximum time a request has to wait
The maximum length of the request queue

For any but the simplest systems (requests that always take the same amount of time to be handled, for instance) a rigourous, mathematically exact, analysis will be difficult or impossible. Suppose that there are two types of requests, one requiring the full attention of the system, so that no other requests can be handled at the same time. An exact analysis would be difficult to say the least.

The method presented below in the form of a Tcl script is simple and straightforward, but could be the basis of a more elaborate model of the server system of your choice.

It assumes:

Events are received at a statistically constant rate of k events per unit of time.
The events arrive indepedently
All events require the same time T to be handled
The number of events that can be handled during one unit of time is constant, N, reflecting the number of threads or child processes taking care of the actual job.

The first two assumptions lead to the so-called Poisson distribution:

   P(x=a) = (1/k) exp(-a/k)

where P(x=a) is the chance that "a" events arrive in the given time interval.

The uniform distribution returned by the rand() function of [expr] can be turned into such a Poisson distribution by the following formula:

   a = int( k * ln(1-r) )

where r is a random number between 0 an 1.

The output parameters are:

Total number of events received during the simulation
Mean time for an event to be handled
Mean length of the queue

Per time interval:

A certain number of events are received and added to the queue. The parameters associated with each event are the time of arrival and the time still needed for processing.
The first N in the queue are handled (in the order of storage), that is the time still needed is reduced by 1.
If an event has been processed completely, the statistics are calculated.

The first part is to check that the (rather naive implementation) of the Poisson distribution is okay:

 set ::poisson_mean 3.0    ;# Not too many events per unit of time

 proc poisson {} {
    expr {int(-$::poisson_mean*log(1.0-rand()))}
 }

 proc checkPoisson {mean} {
    set ::poisson_mean $mean
    set sum  0.0
    set sum2 0.0
    set nosteps 100000
    for { set i 0 } { $i < $nosteps } { incr i } {
       set p    [poisson]
       set sum  [expr {$sum+$p}]
       set sum2 [expr {$sum+$p*$p}]
    }
    puts "$mean\t[expr {$sum/$nosteps}]\t[expr {$sum2/$nosteps}]"
 }

 foreach mean {0.1 0.3 1.0 3.0 10.0} {
    checkPoisson $mean
 }

The results are not very encouraging - as expected - but that is not important for the moment:

   Given mean    Mean       Mean of squares
   0.1           6e-005     6e-005
   0.3           0.03688    0.03688
   1.0           0.58732    0.58732
   3.0           2.52629    2.52633
   10.0          9.45861    9.45861

Ideally, the measured mean and mean of squares should be equal to the given parameter.

From this we learn two things:

A more sophisticated approximation of the Poisson distribution is required (see below!)
We need to use values above 1 to make it somewhat interesting

Well, now for the modelling itself:

 # Set the global variables:
 # - the statistical results and state variables
 # - the model parameters
 #
 proc initialise {} {
    set ::nosteps     10000 ;# Number of intervals to simulate
    set ::noevents        0 ;# Count of number of events received
    set ::duration        0 ;# Total/mean life time of events
    set ::queue_length    0 ;# Total/mean queue length
    set ::nohandled       0 ;# Number of events handled

    set ::queue          {} ;# Queue with current events
    set ::time            0 ;# Time in simulation

    # Model parameters
    set ::process_cap  10 ;# Number of events treated per interval
    set ::process_time  5 ;# Process time
    set ::poisson_mean  3 ;# Mean number of events
 }

 proc report {} {
    puts "Total number of events:-  $::noevents"
    puts "Mean life time of events: [expr {$::duration/$::nohandled}]"
    puts "Mean queue length:-       [expr {$::queue_length/$::nosteps}]"
 }

 proc getEvents {} {
    set noevs [poisson]
    for { set i 0 } { $i < $noevs } { incr i } {
       lappend ::queue [list $::time $::process_time]
    }
    incr ::queue_length [llength $::queue]
    incr ::noevents $noevs
 }

 proc processEvents {} {
    set first_events     [lrange $::queue 0 [expr {$::process_cap-1}]]
    set remaining_events [lrange $::queue $::process_cap end]

    set ::queue {}
    foreach event $first_events {
       foreach {arrival_event process_event} $event {break}
       incr process_event -1
       if { $process_event <= 0 } {
          set span [expr {$::time-$arrival_event+1}]
          incr ::duration $span
          incr ::nohandled
       } else {
          lappend ::queue [list $arrival_event $process_event]
       }
    }
    set ::queue [concat $::queue $remaining_events]
 }

The above procedures can be used to simulate a simple server system, with a variety of process parameters:

 #
 # Vary the processing capacity
 #
 foreach cap {20 15 10 5} {
    puts "Capacity: $cap"
    initialise
    set ::process_cap $cap

    while { $::time < $::nosteps } {
       getEvents
       processEvents
       incr ::time
    }
    report
    puts ""
 }

The result is somewhat astounding:

 Capacity: 20
 Total number of events:   25629
 Mean life time of events: 5
 Mean queue length:        14

 Capacity: 15
 Total number of events:   25505
 Mean life time of events: 8
 Mean queue length:        20

 Capacity: 10
 Total number of events:   25910
 Mean life time of events: 1129
 Mean queue length:        2911

 Capacity: 5
 Total number of events:   25933
 Mean life time of events: 3044
 Mean queue length:        7891

At the transition from a capacity of 15 to 10, the server system as modelled acquires a totally different behaviour, that is, it is no longer capable of dealing with the flood of events!

Was this predictable? Yes, at least that some transition occurs. We can estimate the breakpoint by considering that:

On average 2.5 events per unit of time are generated
Each event takes 5 units of time to be processed
So unless at least 2.5*5 = 12.5 events per unit of time are handled, you get a backlog. Eventually the queue of events will grow unbounded!

More detailed analyses (especially numerical) are possible:

What happens to the average time when you get closer to the breakpoint?
What is the (expected) maximum queue length (if not all events are stored, you will loose some!)
Etc.

Note: Since the presented results are a single realisation of a stochastic process, one should do a fair number of these simulations to get dependable results.

On the Poisson distribution:

The site netlib.org [L1 ] has a wealth of numerical software. It contains among many other libraries several accurate algorithms for generating random numbers acoording to the Poisson distribution (and most other well-known distributions as well).

Category Simulator	Category Concurrency