Assumptions
What do various non-parametric tests assume?
Test |
Type of test |
Assumptions |
|
Sign test |
Single sample |
- The underlying data are continuous. - The data are independent. |
|
Wilcoxon signed-rank test |
Single sample |
- The underlying data are symmetric. - The underlying data are continuous. - The data are independent. |
|
Paired sign test |
Two sample |
- The data are in matched pairs. - The differences between matched pairs are continous. - The data are independent. |
|
Wilcoxon matched-pairs signed-rank test |
Two sample |
- The data are in matched pairs. - The differences between matched pairs are symmetrical. - The differences between matched pairs are continous. - The data are independent. |
|
Wilcoxon rank-sum test |
Two sample |
- The two samples are independent. - The underlying data are symmetric. - The underlying data are continuous. |
|
Single-sample sign test
To perform the single-sample sign test,
(i) mark the values that are greater than the stated median with a $+$ sign, and
(ii) mark those that are less than the stated median with a $-$ sign.
If the data are well distributed about the median, we would expect an equal number of $+$ and $-$ signs.
Hence, there should be
(i) a probability of $\frac{1}{2}$ that any data point is above the median;
(ii) a probability of $\frac{1}{2}$ that any data point is below the median.
Given $n$ data points, a single-sample sign test is created using $X \sim \mathrm{Bin}(n, 0.5)$. The test statistic can be the number $+$ signs, that is the number of data points greater than the median.
We can calculate the probability that $X$ is
(i) above this test statistic,
(ii) below this test statistic, or
(iii) either (in the case of a two-tailed test)
which is $\mathbb{P}(X \le T | X \sim \mathrm{Bin}(n,0.5))$, where $T$ is the test statistic.
In a situation where we have $0$ instead of $+$ or $-$, the data point is discounted.
It is possible to approximate the sign test to a normal distribution for large $n$ ($n > 10$ can be considered large).
Let $T = \min(S^+, S^-)$, then $E(T) = np = n/2$, and $\mathrm{Var}(T) = npq = n/4.$
For large $n$,
\(X \sim N\left(\frac{n}{2},\frac{n}{4}\right),\)
we can use the normal approximation of the binomial with $p = 0.5$. We must also make sure that we use a continuity correction.
In general, at a significance level $\alpha$, where $0< \alpha < 1$,
(i) for $H_1: m < m_0$, if $P(X \le S^+) \le \alpha \implies$ reject $H_0$
(ii) for $H_1: m > m_0$, if $P(X \ge S^+) \le \alpha \implies$ reject $H_0$
(iii) for $H_1: m \neq m_0$, if $P(X \le T) \le \alpha/2 \implies$ reject $H_0$