The Kruskal-Wallis test is a non-parametric test, which means that it does not assume the data come from a distribution that can be completely described by two parameters, the mean and the standard deviation (the way a normal distribution can).
The Kruskal-Wallis test is the better option only when the assumption of (approximate) normality of the observations cannot be met, or when one is analyzing an ordinal variable.
Like most non-parametric tests, it is performed on ranked data. We convert the measurement observations to their ranks in the overall data set, so that the smallest value gets rank 1, the next smallest gets rank 2, and so on.
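The ranking step can be sketched in a few lines of Python. This is only an illustration, not what SAS runs internally: the function name midranks and the sample heights are made up for the example, and tied values are assumed to share the average of the ranks they occupy (the usual convention for rank-based tests).

```python
def midranks(values):
    """Rank values from smallest (rank 1) to largest; ties share the average rank."""
    # Indices of the observations, ordered by their value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of observations tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Positions i..j are 0-based, ranks are 1-based, hence the + 1.
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


# Hypothetical heights, pooled across all groups before ranking.
heights = [62.0, 70.5, 65.0, 65.0, 59.5]
print(midranks(heights))  # [2.0, 5.0, 3.5, 3.5, 1.0]
```

The two observations of 65.0 would occupy ranks 3 and 4, so each receives 3.5.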
The Kruskal–Wallis test is the most commonly used non-parametric test when the assumptions of one-way ANOVA fail. It is appropriate when we have one nominal variable and one measurement variable.
The null hypothesis of the Kruskal–Wallis test is that the mean ranks of the groups are the same. Sometimes the null hypothesis is instead given as “The samples come from populations with the same distribution.” This is correct in the sense that if the samples come from populations with the same distribution, the test will show no difference among them.
Doing Kruskal-Wallis test using SAS
To do a Kruskal–Wallis test in SAS, use the NPAR1WAY procedure. The WILCOXON option tells the procedure to do only the Kruskal–Wallis test; if we leave it out, we’ll get several other statistical tests as well, tempting us to pick the one whose results we like the best. The nominal variable that gives the group names is given with the CLASS statement, while the measurement or ranked variable is given with the VAR statement.
Here’s an example:
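data height_diff;
input group height; /* DATALINES and the data rows follow here; they are not shown */
proc NPAR1WAY data=height_diff WILCOXON; class group; var height;
run;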
The output contains a table of “Wilcoxon scores”.
The “mean score” is the mean rank in each group, which is what we’re testing the homogeneity of.
“Chi-square” is the H-statistic of the Kruskal–Wallis test, which is approximately chi-square distributed.
The “Pr > Chi-Square” is our P value.
Here the P value is 0.3109, which is greater than the significance level, whether it is 0.05 or 0.01. So there is not enough evidence to reject our null hypothesis that the samples came from populations with the same distribution.
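To see where the mean scores, the chi-square value and the P value come from, the calculation can be sketched in plain Python. This is a rough illustration under stated assumptions, not a reproduction of the SAS output above: the function names kruskal_wallis and chi2_sf and the three small groups are invented for the example, ties are handled with the standard tie correction, and the data are assumed to contain at least two distinct values.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with df degrees of freedom."""
    half = x / 2
    # Closed-form starting point: df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))
    # Recurrence Q(a + 1, x) = Q(a, x) + x^a e^(-x) / Gamma(a + 1) up to a = df / 2.
    while a < df / 2:
        q += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1
    return q


def kruskal_wallis(groups):
    """Return (mean rank per group, H, P value) for a dict {label: list of values}."""
    counts = Counter(v for vals in groups.values() for v in vals)
    n_total = sum(counts.values())
    # Midrank of each distinct value: average of the 1-based positions it occupies.
    rank_of, position = {}, 0
    for value in sorted(counts):
        tied = counts[value]
        rank_of[value] = position + (tied + 1) / 2
        position += tied
    mean_ranks = {g: sum(rank_of[v] for v in vals) / len(vals)
                  for g, vals in groups.items()}
    # H = 12 / (N (N + 1)) * sum over groups of n_i * (mean rank_i - (N + 1) / 2)^2
    h = 12 / (n_total * (n_total + 1)) * sum(
        len(vals) * (mean_ranks[g] - (n_total + 1) / 2) ** 2
        for g, vals in groups.items())
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N); equals 1 with no ties.
    h /= 1 - sum(t ** 3 - t for t in counts.values()) / (n_total ** 3 - n_total)
    # H is compared with a chi-square distribution on (number of groups - 1) df.
    return mean_ranks, h, chi2_sf(h, len(groups) - 1)


# Hypothetical data: three groups of three observations, no ties.
example = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}
mean_ranks, h, p = kruskal_wallis(example)
print(mean_ranks, round(h, 3), round(p, 4))
```

For these made-up groups the mean ranks are 2, 5 and 8, H is about 7.2 and the P value is about 0.027, so in this invented case the null hypothesis would be rejected at the 0.05 level.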