Error processing request

Parameters

CONTENT_LENGTH0
REQUEST_METHODGET
REQUEST_URI/revision/Performance+and+simplification+of+code+-+implementing+the+basic+statistics+procedure?V=0
QUERY_STRINGV=0
CONTENT_TYPE
DOCUMENT_URI/revision/Performance+and+simplification+of+code+-+implementing+the+basic+statistics+procedure
DOCUMENT_ROOT/var/www/nikit/nikit/nginx/../docroot
SCGI1
SERVER_PROTOCOLHTTP/1.1
HTTPSon
REMOTE_ADDR172.70.175.61
REMOTE_PORT28574
SERVER_PORT4443
SERVER_NAMEwiki.tcl-lang.org
HTTP_HOSTwiki.tcl-lang.org
HTTP_CONNECTIONKeep-Alive
HTTP_ACCEPT_ENCODINGgzip
HTTP_X_FORWARDED_FOR35.168.110.128
HTTP_CF_RAY73a7b79ededb8299-IAD
HTTP_X_FORWARDED_PROTOhttps
HTTP_CF_VISITOR{"scheme":"https"}
HTTP_USER_AGENTCCBot/2.0 (https://commoncrawl.org/faq/)
HTTP_ACCEPTtext/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
HTTP_ACCEPT_LANGUAGEen-US,en;q=0.5
HTTP_CF_CONNECTING_IP35.168.110.128
HTTP_CF_IPCOUNTRYUS
HTTP_CDN_LOOPcloudflare

Body


Error

Unknow state transition: LINE -> END

-code

1

-level

0

-errorstack

INNER {returnImm {Unknow state transition: LINE -> END} {}} CALL {my render_wikit {Performance and simplification of code - implementing the basic statistics procedure} {
**Checking for a valid number**


[Arjen Markus] (20 july 2009) The math::statistics package in [Tcllib] contains a procedure to determine basic
statistical parameters, mean, maximum, standard deviation and so on. This procedure (like the rest of the
procedures) assumes that ''missing values'' are given by empty strings, so it contains constructs like:

======
if { $elem != {} } { 
    ... assume it is a valid number and do the computations ...
}
======

It would be nicer to first ''filter'' the data. That way the code can be much simpler. However,
that means extra work. How does this impact the performance? 

Another aspect of the code is the very check itself: it forces all elements in the list to acquire
a string representation. So, the first step is to investigate the merits of various ways to check
whether a variable holds a number or not.

The script below uses three methods:

   * Check if the value is "" - an empty string
   * Check if we can do ordinary arithmetic 
   * Check if the value can be interpreted as a double

=======
# notanumber.tcl --
#     Try three ways of establishing something is not a number
#
#     One way:
#     A missing value is represented by an empty string
#
#     Second way:
#     A missing value is simply not representable as a number,
#     so arithmetic will fail
#
#     Third way:
#     Can it be interpreted as a double value?

proc noFiltering {size threshold} {

    #
    # Create a list of numbers and empty strings
    #
    set dummy [expr {srand(10000)}]

    set list {}
    for {set i 0} {$i < $size} {incr i} {
        set value [expr {rand()}]
        if { $value < $threshold } {
            lappend list $value
        } else {
            lappend list {}
        }
    }
}

proc emptyString {size threshold} {

    #
    # Create a list of numbers and empty strings
    #
    set dummy [expr {srand(10000)}]

    set list {}
    for {set i 0} {$i < $size} {incr i} {
        set value [expr {rand()}]
        if { $value < $threshold } {
            lappend list $value
        } else {
            lappend list {}
        }
    }

    #
    # Note:
    # Implicit conversion of numbers to strings ...
    #
    set filtered {}
    foreach elem $list {
        if { $elem != {} } {
            lappend filtered $elem
        }
    }
}

proc failingArithmetic {size threshold} {

    #
    # Create a list of numbers and empty strings
    #
    set dummy [expr {srand(10000)}]

    set list {}
    for {set i 0} {$i < $size} {incr i} {
        set value [expr {rand()}]
        if { $value < $threshold } {
            lappend list $value
        } else {
            lappend list {}
        }
    }

    #
    # Note:
    # No conversion, but catch may be expensive
    #
    set filtered {}
    foreach elem $list {
        catch {
            if { $elem+0 == $elem } {
                lappend filtered $elem
            }
        }
    }
}

proc stringIsDouble {size threshold} {

    #
    # Create a list of numbers and empty strings
    #
    set dummy [expr {srand(10000)}]

    set list {}
    for {set i 0} {$i < $size} {incr i} {
        set value [expr {rand()}]
        if { $value < $threshold } {
            lappend list $value
        } else {
            lappend list {}
        }
    }

    #
    # Note:
    # No conversion, but catch may be expensive
    #
    set filtered {}
    foreach elem $list {
        catch {
            if { [string is double -strict $elem } {
                lappend filtered $elem
            }
        }
    }
}

# main --
#     Measure the performance
#

puts "Threshold: 1.1 - no missing elements"
set threshold 1.1
foreach size {100 1000 10000 100000} {
    noFiltering $size $threshold ;# Try to eliminate initial effects
    puts "Size: $size"
    puts "    No filtering: [time {noFiltering $size $threshold} 100]"
    puts "    Empty string: [time {emptyString $size $threshold} 100]"
    puts "    Arithmetic:   [time {failingArithmetic $size $threshold} 100]"
    puts "    IsDouble:     [time {stringIsDouble $size $threshold} 100]"
}

puts "Threshold: 0.9 - 10% missing elements"
set threshold 0.9
foreach size {100 1000 10000 100000} {
    noFiltering $size $threshold
    puts "Size: $size"
    puts "    No filtering: [time {noFiltering $size $threshold} 100]"
    puts "    Empty string: [time {emptyString $size $threshold} 100]"
    puts "    Arithmetic:   [time {failingArithmetic $size $threshold} 100]"
    puts "    IsDouble:     [time {stringIsDouble $size $threshold} 100]"
}
======


***Results***

The results for '''Tcl 8.4''' are:
======
Threshold: 1.1 - no missing elements
Size: 100
    No filtering: 48.55 microseconds per iteration
    Empty string: 270.8 microseconds per iteration
    Arithmetic:   97.16 microseconds per iteration
    IsDouble:     776.25 microseconds per iteration
Size: 1000
    No filtering: 488.72 microseconds per iteration
    Empty string: 2821.12 microseconds per iteration
    Arithmetic:   967.46 microseconds per iteration
    IsDouble:     8123.06 microseconds per iteration
Size: 10000
    No filtering: 5295.6 microseconds per iteration
    Empty string: 28864.72 microseconds per iteration
    Arithmetic:   10243.75 microseconds per iteration
    IsDouble:     78343.68 microseconds per iteration
Size: 100000
    No filtering: 52861.96 microseconds per iteration
    Empty string: 290535.2 microseconds per iteration
    Arithmetic:   103031.38 microseconds per iteration
    IsDouble:     787822.18 microseconds per iteration
Threshold: 0.9 - 10% missing elements
Size: 100
    No filtering: 49.97 microseconds per iteration
    Empty string: 252.07 microseconds per iteration
    Arithmetic:   140.11 microseconds per iteration
    IsDouble:     779.23 microseconds per iteration
Size: 1000
    No filtering: 490.99 microseconds per iteration
    Empty string: 2673.54 microseconds per iteration
    Arithmetic:   1415.15 microseconds per iteration
    IsDouble:     7788.95 microseconds per iteration
Size: 10000
    No filtering: 5181.49 microseconds per iteration
    Empty string: 26969.98 microseconds per iteration
    Arithmetic:   14954.56 microseconds per iteration
    IsDouble:     78017.08 microseconds per iteration
Size: 100000
    No filtering: 52016.61 microseconds per iteration
    Empty string: 268409.3 microseconds per iteration
    Arithmetic:   148799.84 microseconds per iteration
    IsDouble:     781611.65 microseconds per iteration
======

The results for '''Tcl 8.5''' are:
======
Threshold: 1.1 - no missing elements
Size: 100
    No filtering: 71.16 microseconds per iteration
    Empty string: 2027.37 microseconds per iteration
    Arithmetic:   147.0 microseconds per iteration
    IsDouble:     761.81 microseconds per iteration
Size: 1000
    No filtering: 698.12 microseconds per iteration
    Empty string: 20335.29 microseconds per iteration
    Arithmetic:   1474.66 microseconds per iteration
    IsDouble:     7698.57 microseconds per iteration
Size: 10000
    No filtering: 7146.59 microseconds per iteration
    Empty string: 202507.52 microseconds per iteration
    Arithmetic:   14508.37 microseconds per iteration
    IsDouble:     76518.43 microseconds per iteration
Size: 100000
    No filtering: 71542.95 microseconds per iteration
    Empty string: 2026748.83 microseconds per iteration
    Arithmetic:   147370.6 microseconds per iteration
    IsDouble:     776199.29 microseconds per iteration
Threshold: 0.9 - 10% missing elements
Size: 100
    No filtering: 72.4 microseconds per iteration
    Empty string: 1861.33 microseconds per iteration
    Arithmetic:   222.35 microseconds per iteration
    IsDouble:     756.02 microseconds per iteration
Size: 1000
    No filtering: 695.34 microseconds per iteration
    Empty string: 18853.98 microseconds per iteration
    Arithmetic:   2235.62 microseconds per iteration
    IsDouble:     7719.51 microseconds per iteration
Size: 10000
    No filtering: 7012.26 microseconds per iteration
    Empty string: 186777.61 microseconds per iteration
    Arithmetic:   25364.57 microseconds per iteration
    IsDouble:     78288.05 microseconds per iteration
Size: 100000
    No filtering: 74202.32 microseconds per iteration
    Empty string: 1891164.67 microseconds per iteration
    Arithmetic:   230277.45 microseconds per iteration
    IsDouble:     768754.55 microseconds per iteration
======

In both cases the arithmetic method is much faster than the other two, but the empty string method
is very slow in Tcl 8.5. 

[Donal Fellows] was able to boost the [[string is double]] method to give almost the same performance
as the arithmetic method.

----
<<categories>>Category Performance} regexp2} CALL {my render {Performance and simplification of code - implementing the basic statistics procedure} {
**Checking for a valid number**


[Arjen Markus] (20 july 2009) The math::statistics package in [Tcllib] contains a procedure to determine basic
statistical parameters, mean, maximum, standard deviation and so on. This procedure (like the rest of the
procedures) assumes that ''missing values'' are given by empty strings, so it contains constructs like:

======
if { $elem != {} } { 
    ... assume it is a valid number and do the computations ...
}
======

It would be nicer to first ''filter'' the data. That way the code can be much simpler. However,
that means extra work. How does this impact the performance? 

Another aspect of the code is the very check itself: it forces all elements in the list to acquire
a string representation. So, the first step is to investigate the merits of various ways to check
whether a variable holds a number or not.

The script below uses three methods:

   * Check if the value is "" - an empty string
   * Check if we can do ordinary arithmetic 
   * Check if the value can be interpreted as a double

=======
# notanumber.tcl --
#     Try three ways of establishing something is not a number
#
#     One way:
#     A missing value is represented by an empty string
#
#     Second way:
#     A missing value is simply not representable as a number,
#     so arithmetic will fail
#
#     Third way:
#     Can it be interpreted as a double value?

proc noFiltering {size threshold} {

    #
    # Create a list of numbers and empty strings
    #
    set dummy [expr {srand(10000)}]

    set list {}
    for {set i 0} {$i < $size} {incr i} {
        set value [expr {rand()}]
        if { $value < $threshold } {
            lappend list $value
        } else {
            lappend list {}
        }
    }
}

proc emptyString {size threshold} {

    #
    # Create a list of numbers and empty strings
    #
    set dummy [expr {srand(10000)}]

    set list {}
    for {set i 0} {$i < $size} {incr i} {
        set value [expr {rand()}]
        if { $value < $threshold } {
            lappend list $value
        } else {
            lappend list {}
        }
    }

    #
    # Note:
    # Implicit conversion of numbers to strings ...
    #
    set filtered {}
    foreach elem $list {
        if { $elem != {} } {
            lappend filtered $elem
        }
    }
}

proc failingArithmetic {size threshold} {

    #
    # Create a list of numbers and empty strings
    #
    set dummy [expr {srand(10000)}]

    set list {}
    for {set i 0} {$i < $size} {incr i} {
        set value [expr {rand()}]
        if { $value < $threshold } {
            lappend list $value
        } else {
            lappend list {}
        }
    }

    #
    # Note:
    # No conversion, but catch may be expensive
    #
    set filtered {}
    foreach elem $list {
        catch {
            if { $elem+0 == $elem } {
                lappend filtered $elem
            }
        }
    }
}

proc stringIsDouble {size threshold} {

    #
    # Create a list of numbers and empty strings
    #
    set dummy [expr {srand(10000)}]

    set list {}
    for {set i 0} {$i < $size} {incr i} {
        set value [expr {rand()}]
        if { $value < $threshold } {
            lappend list $value
        } else {
            lappend list {}
        }
    }

    #
    # Note:
    # No conversion, but catch may be expensive
    #
    set filtered {}
    foreach elem $list {
        catch {
            if { [string is double -strict $elem } {
                lappend filtered $elem
            }
        }
    }
}

# main --
#     Measure the performance
#

puts "Threshold: 1.1 - no missing elements"
set threshold 1.1
foreach size {100 1000 10000 100000} {
    noFiltering $size $threshold ;# Try to eliminate initial effects
    puts "Size: $size"
    puts "    No filtering: [time {noFiltering $size $threshold} 100]"
    puts "    Empty string: [time {emptyString $size $threshold} 100]"
    puts "    Arithmetic:   [time {failingArithmetic $size $threshold} 100]"
    puts "    IsDouble:     [time {stringIsDouble $size $threshold} 100]"
}

puts "Threshold: 0.9 - 10% missing elements"
set threshold 0.9
foreach size {100 1000 10000 100000} {
    noFiltering $size $threshold
    puts "Size: $size"
    puts "    No filtering: [time {noFiltering $size $threshold} 100]"
    puts "    Empty string: [time {emptyString $size $threshold} 100]"
    puts "    Arithmetic:   [time {failingArithmetic $size $threshold} 100]"
    puts "    IsDouble:     [time {stringIsDouble $size $threshold} 100]"
}
======


***Results***

The results for '''Tcl 8.4''' are:
======
Threshold: 1.1 - no missing elements
Size: 100
    No filtering: 48.55 microseconds per iteration
    Empty string: 270.8 microseconds per iteration
    Arithmetic:   97.16 microseconds per iteration
    IsDouble:     776.25 microseconds per iteration
Size: 1000
    No filtering: 488.72 microseconds per iteration
    Empty string: 2821.12 microseconds per iteration
    Arithmetic:   967.46 microseconds per iteration
    IsDouble:     8123.06 microseconds per iteration
Size: 10000
    No filtering: 5295.6 microseconds per iteration
    Empty string: 28864.72 microseconds per iteration
    Arithmetic:   10243.75 microseconds per iteration
    IsDouble:     78343.68 microseconds per iteration
Size: 100000
    No filtering: 52861.96 microseconds per iteration
    Empty string: 290535.2 microseconds per iteration
    Arithmetic:   103031.38 microseconds per iteration
    IsDouble:     787822.18 microseconds per iteration
Threshold: 0.9 - 10% missing elements
Size: 100
    No filtering: 49.97 microseconds per iteration
    Empty string: 252.07 microseconds per iteration
    Arithmetic:   140.11 microseconds per iteration
    IsDouble:     779.23 microseconds per iteration
Size: 1000
    No filtering: 490.99 microseconds per iteration
    Empty string: 2673.54 microseconds per iteration
    Arithmetic:   1415.15 microseconds per iteration
    IsDouble:     7788.95 microseconds per iteration
Size: 10000
    No filtering: 5181.49 microseconds per iteration
    Empty string: 26969.98 microseconds per iteration
    Arithmetic:   14954.56 microseconds per iteration
    IsDouble:     78017.08 microseconds per iteration
Size: 100000
    No filtering: 52016.61 microseconds per iteration
    Empty string: 268409.3 microseconds per iteration
    Arithmetic:   148799.84 microseconds per iteration
    IsDouble:     781611.65 microseconds per iteration
======

The results for '''Tcl 8.5''' are:
======
Threshold: 1.1 - no missing elements
Size: 100
    No filtering: 71.16 microseconds per iteration
    Empty string: 2027.37 microseconds per iteration
    Arithmetic:   147.0 microseconds per iteration
    IsDouble:     761.81 microseconds per iteration
Size: 1000
    No filtering: 698.12 microseconds per iteration
    Empty string: 20335.29 microseconds per iteration
    Arithmetic:   1474.66 microseconds per iteration
    IsDouble:     7698.57 microseconds per iteration
Size: 10000
    No filtering: 7146.59 microseconds per iteration
    Empty string: 202507.52 microseconds per iteration
    Arithmetic:   14508.37 microseconds per iteration
    IsDouble:     76518.43 microseconds per iteration
Size: 100000
    No filtering: 71542.95 microseconds per iteration
    Empty string: 2026748.83 microseconds per iteration
    Arithmetic:   147370.6 microseconds per iteration
    IsDouble:     776199.29 microseconds per iteration
Threshold: 0.9 - 10% missing elements
Size: 100
    No filtering: 72.4 microseconds per iteration
    Empty string: 1861.33 microseconds per iteration
    Arithmetic:   222.35 microseconds per iteration
    IsDouble:     756.02 microseconds per iteration
Size: 1000
    No filtering: 695.34 microseconds per iteration
    Empty string: 18853.98 microseconds per iteration
    Arithmetic:   2235.62 microseconds per iteration
    IsDouble:     7719.51 microseconds per iteration
Size: 10000
    No filtering: 7012.26 microseconds per iteration
    Empty string: 186777.61 microseconds per iteration
    Arithmetic:   25364.57 microseconds per iteration
    IsDouble:     78288.05 microseconds per iteration
Size: 100000
    No filtering: 74202.32 microseconds per iteration
    Empty string: 1891164.67 microseconds per iteration
    Arithmetic:   230277.45 microseconds per iteration
    IsDouble:     768754.55 microseconds per iteration
======

In both cases the arithmetic method is much faster than the other two, but the empty string method
is very slow in Tcl 8.5. 

[Donal Fellows] was able to boost the [[string is double]] method to give almost the same performance
as the arithmetic method.

----
<<categories>>Category Performance}} CALL {my revision {Performance and simplification of code - implementing the basic statistics procedure}} CALL {::oo::Obj8345261 process revision/Performance+and+simplification+of+code+-+implementing+the+basic+statistics+procedure} CALL {::oo::Obj8345259 process}

-errorcode

NONE

-errorinfo

Unknow state transition: LINE -> END
    while executing
"error $msg"
    (class "::Wiki" method "render_wikit" line 6)
    invoked from within
"my render_$default_markup $N $C $mkup_rendering_engine"
    (class "::Wiki" method "render" line 8)
    invoked from within
"my render $name $C"
    (class "::Wiki" method "revision" line 31)
    invoked from within
"my revision $page"
    (class "::Wiki" method "process" line 51)
    invoked from within
"$server process [string trim $uri /]"

-errorline

4