Version 2 of Kruskal-Wallis test

Updated 2010-09-21 02:46:31 by AKgnome

The Kruskal-Wallis test is a non-parametric one-way analysis of variance by ranks (named after William Kruskal and W. Allen Wallis) for testing equality of population medians among groups (see [L1 ] for more information).

This test is missing from the tcllib math package (here: ::math::statistics), so here is an implementation. It can easily be taken apart to use the ranking of groups of values as a separate command, since the ranking is also needed independently of the test on other occasions. My implementation takes the groups as separate arguments, where each group is a list of values. It returns a list with the H value (the test statistic) and the p value (i.e. the probability, under the null hypothesis, of an H value this large or larger). The latter is computed using ::math::statistics::cdf-chisquare, so the tcllib math::statistics package is needed.
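For reference, the two returned quantities can be written as a formula. This restates what the implementation on this page computes, without a correction for ties. The symbols are notation introduced here, not taken from the original text: N is the total number of values, k the number of groups, n_i the size of group i, R_i the sum of the ranks in group i, and F the cumulative chi-square distribution with k-1 degrees of freedom.

```latex
H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^{2}}{n_i} - 3(N+1),
\qquad
p = 1 - F_{\chi^{2}_{k-1}}(H)
```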

Here is the code:

package require Tcl 8.4
package require math::statistics

proc kw-h {args} {
        set index 0
        set rankList [list]
        set setCount [llength $args]
        # read lists of values:
        foreach item $args {
                # prepare ranking with rank=0:
                foreach value $item {lappend rankList [list $index $value 0]}
                incr index
        }
        # sort the values:
        set rankList [lsort -real -index 1 $rankList]
        # assign the ranks (disregarding ties):
        set length [llength $rankList]
        for {set i 0} {$i < $length} {incr i} {
                lset rankList $i 2 [expr {$i + 1}]
        }
        # re-assign mean ranks to runs of tied values
        # (a run at the very end of the list is handled too):
        set i 0
        while {$i < $length} {
                # find the last index j of the run of values equal to element i:
                set j $i
                while {($j + 1 < $length) && ([lindex $rankList [expr {$j + 1}] 1] == [lindex $rankList $i 1])} {
                        incr j
                }
                if {$j > $i} {
                        # mean of the ranks i+1 .. j+1:
                        set newRank [expr {($i + $j + 2)/2.0}]
                        for {set k $i} {$k <= $j} {incr k} {lset rankList $k 2 $newRank}
                }
                set i [expr {$j + 1}]
        }
        # re-establish original sets of values, but using the ranks:
        foreach item $rankList {
                lappend rankValues([lindex $item 0]) [lindex $item 2]
        }
        # now compute H:
        set H 0
        for {set i 0} {$i < $setCount} {incr i} {
                set total [expr [join $rankValues($i) +]]
                set count [llength $rankValues($i)]
                set H [expr {$H + pow($total,2)/double($count)}]
        }
        set H [expr {$H*(12.0/($length*($length + 1))) - (3*($length + 1))}]
        incr setCount -1
        set p [expr {1 - [::math::statistics::cdf-chisquare $setCount $H]}]
        return [list $H $p]
}

Now, an example (the same as on this page: [L2 ]):

% puts [kw-h {6.4 6.8 7.2 8.3 8.4 9.1 9.4 9.7} {2.5 3.7 4.9 5.4 5.9 8.1 8.2} {1.3 4.1 4.9 5.2 5.5 8.2}]
9.83627087199 0.00731275323967
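As mentioned above, the ranking step can be useful on its own. The following is a minimal sketch of how it might look as a separate command, not the implementation used by kw-h. The name rank-groups and its return format (one list of ranks per group, in the original order of the values) are assumptions made for illustration. Tied values receive the mean of the ranks they span.

```tcl
package require Tcl 8.4

# rank-groups (hypothetical name): takes the groups as separate arguments
# and returns one list of ranks per group. Ranks are assigned over the
# pooled values of all groups; tied values get their mean rank.
proc rank-groups {args} {
        set pooled [list]
        set result [list]
        set g 0
        foreach group $args {
                set pos 0
                # remember group index and position of every value:
                foreach value $group {
                        lappend pooled [list $g $pos $value]
                        incr pos
                }
                # placeholder, each value is overwritten by its rank below:
                lappend result $group
                incr g
        }
        # sort the pooled values numerically:
        set pooled [lsort -real -index 2 $pooled]
        set n [llength $pooled]
        set i 0
        while {$i < $n} {
                # find the last index j of the run of values equal to element i:
                set j $i
                while {($j + 1 < $n) && ([lindex $pooled [expr {$j + 1}] 2] == [lindex $pooled $i 2])} {
                        incr j
                }
                # mean of the ranks i+1 .. j+1 (this is i+1 if there is no tie):
                set rank [expr {($i + $j + 2)/2.0}]
                for {set k $i} {$k <= $j} {incr k} {
                        lset result [lindex $pooled $k 0] [lindex $pooled $k 1] $rank
                }
                set i [expr {$j + 1}]
        }
        return $result
}

# example: the two values 3 are tied for ranks 3 and 4
puts [rank-groups {3 1} {2 3}]
```

For the two groups {3 1} and {2 3} this should print {3.5 1.0} {2.0 3.5}, since both values 3 share the mean of ranks 3 and 4.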
