Kruskal-Wallis test

The Kruskal-Wallis test is a non-parametric one-way analysis of variance by ranks (named after William Kruskal and W. Allen Wallis) for testing equality of population medians among groups (see Wikipedia for more information).
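
For reference, the statistic behind the test can be written as follows. This is the standard textbook form without a correction for ties, which is also what the code on this page computes. The symbols are introduced here for illustration only: k is the number of groups, n_i the size of group i, R_i the sum of the ranks in group i, and N the total number of values.

```latex
H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^{2}}{n_i} - 3(N+1),
\qquad
p \approx P\left(\chi^{2}_{k-1} \ge H\right)
```

The p value relies on the usual approximation that, under the null hypothesis, H follows a chi-square distribution with k - 1 degrees of freedom.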

This test is missing from the tcllib math package, so here is an implementation. The ranking of groups of values can easily be split off into a separate command, since it is also needed on other occasions, independently of the test. My implementation takes the groups as separate arguments, where each group is a list of values. It returns a list with the H value (the test statistic) and the p value (i.e., the probability of an H value this large or larger under the null hypothesis). The latter is computed using ::math::statistics::cdf-chisquare, so the tcllib math::statistics package is needed.
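
The ranking step mentioned above could be split off roughly as in the following sketch. The name rank-groups and its calling convention (one list of groups instead of separate arguments) are assumptions made for this illustration; they are not part of the original code or of tcllib.

```tcl
package require Tcl 8.4

# rank-groups (hypothetical name): replace every value by its rank in the
# pooled, sorted data; tied values share the mean of the ranks they span.
# Within each group the ranks come out in ascending order, not input order.
proc rank-groups {groups} {
    # tag each value with the index of the group it came from:
    set tagged [list]
    set index 0
    foreach group $groups {
        foreach value $group {lappend tagged [list $index $value]}
        # start every group with an empty rank list, so empty groups survive:
        set ranks($index) [list]
        incr index
    }
    set tagged [lsort -real -index 1 $tagged]
    set length [llength $tagged]
    set i 0
    while {$i < $length} {
        set value [lindex $tagged $i 1]
        # extend j to the last element of the run of values equal to this one:
        set j $i
        while {$j + 1 < $length && [lindex $tagged [expr {$j + 1}] 1] == $value} {
            incr j
        }
        # ranks are 1-based, so the run covers ranks i+1 .. j+1:
        set meanRank [expr {($i + $j + 2)/2.0}]
        for {set k $i} {$k <= $j} {incr k} {
            lappend ranks([lindex $tagged $k 0]) $meanRank
        }
        set i [expr {$j + 1}]
    }
    # collect the ranks group by group, in the original group order:
    set result [list]
    for {set g 0} {$g < $index} {incr g} {
        lappend result $ranks($g)
    }
    return $result
}
```

For example, rank-groups {{10 20} {20 30}} returns {1.0 2.5} {2.5 4.0}: the two values of 20 occupy ranks 2 and 3 and therefore both get the mean rank 2.5.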

Here is the code:

package require Tcl 8.4
package require math::statistics

proc kw-h {args} {
        set index 0
        set rankList [list]
        set setCount [llength $args]
        # read lists of values:
        foreach item $args {
                # tag every value with its group index; the rank (0) is filled in later:
                foreach value $item {lappend rankList [list $index $value 0]}
                incr index
        }
        # sort the values:
        set rankList [lsort -real -index 1 $rankList]
        # assign the ranks; tied values share the mean of the ranks they span
        # (this also covers ties at the very end of the sorted list):
        set length [llength $rankList]
        set i 0
        while {$i < $length} {
                set value [lindex $rankList $i 1]
                # extend j to the last element of the run of values equal to this one:
                set j $i
                while {$j + 1 < $length && [lindex $rankList [expr {$j + 1}] 1] == $value} {
                        incr j
                }
                # ranks are 1-based, so the run covers ranks i+1 .. j+1:
                set meanRank [expr {($i + $j + 2)/2.0}]
                for {set k $i} {$k <= $j} {incr k} {
                        lset rankList $k 2 $meanRank
                }
                set i [expr {$j + 1}]
        }
        # re-establish original sets of values, but using the ranks:
        foreach item $rankList {
                lappend rankValues([lindex $item 0]) [lindex $item 2]
        }
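        # now compute H = 12/(N*(N+1)) * sum(R_i^2/n_i) - 3*(N+1),
        # where N is the total number of values, R_i the rank sum and
        # n_i the size of group i (no correction for ties is applied):
        set H 0
        for {set i 0} {$i < $setCount} {incr i} {
                # rank sum of group i:
                set total 0
                foreach rank $rankValues($i) {set total [expr {$total + $rank}]}
                set count [llength $rankValues($i)]
                set H [expr {$H + pow($total,2)/double($count)}]
        }
        set H [expr {$H*(12.0/($length*($length + 1))) - (3*($length + 1))}]
        # the p value uses a chi-square distribution with
        # (number of groups - 1) degrees of freedom:
        incr setCount -1
        set p [expr {1 - [::math::statistics::cdf-chisquare $setCount $H]}]
        return [list $H $p]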
}

Now an example (the same one as on this page: [L1 ]):

% puts [kw-h {6.4 6.8 7.2 8.3 8.4 9.1 9.4 9.7} {2.5 3.7 4.9 5.4 5.9 8.1 8.2} {1.3 4.1 4.9 5.2 5.5 8.2}]
9.83627087199 0.00731275323967

arjen - 2010-09-22 10:07:53

Incorporated in Tcllib: math::statistics, version 0.7.0