Sometimes we have more than two independent data samples. We could run Student's t-test on every pair of samples to see which ones have different means, but this becomes onerous when all we want to know is whether the samples share the same mean. To answer that question, we can use the analysis of variance test, or ANOVA for short.
ANOVA is a statistical test whose null hypothesis is that the means of two or more groups are equal. If the evidence suggests otherwise, the null hypothesis is rejected and we conclude that at least one group has a different mean.
The test requires that each data sample is drawn from a Gaussian distribution, that the samples are independent, and that all groups have the same standard deviation.
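The Gaussian and equal-spread assumptions can be sanity-checked before running ANOVA. The sketch below is one possible way to do it with SciPy's Shapiro-Wilk and Levene tests; the three samples are hypothetical, generated only for illustration. Independence cannot be checked this way, since it depends on how the data were collected.

```python
import numpy as np
from scipy import stats

# Hypothetical samples, generated only so there is something to test.
rng = np.random.default_rng(0)
groups = [rng.normal(loc=mu, scale=5.0, size=30) for mu in (80, 85, 90)]

# Shapiro-Wilk: H0 is that the sample was drawn from a Gaussian distribution.
normality_p = [stats.shapiro(g)[1] for g in groups]

# Levene: H0 is that all groups have the same variance.
levene_stat, levene_p = stats.levene(*groups)

print("Shapiro-Wilk p-values:", [round(float(p), 3) for p in normality_p])
print("Levene p-value:", round(float(levene_p), 3))
```

A small p-value in either test is a hint that the corresponding assumption may not hold for that data.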
Comparison | H0 | Ha |
---|---|---|
t-Test | μ1 = μ2 | μ1 ≠ μ2 |
ANOVA | μ1 = μ2 = ... = μk | μ1,μ2,μ3...μk not all equal |
One way ANOVA
For one way ANOVA, we have one dependent variable (DV) and one independent variable (IV). To measure the relationship between them, we compare the variation in the DV that the IV explains with the variation it leaves unexplained:
F = (variation in the DV explained by the IV) / (variation in the DV not explained by the IV)
Example:
Suppose we have three teaching methods (A, B and C) and ten math results recorded under each method:
A | B | C |
---|---|---|
77 | 74 | 93 |
88 | 88 | 94 |
77 | 77 | 95 |
85 | 93 | 83 |
81 | 91 | 94 |
72 | 95 | 94 |
80 | 85 | 85 |
80 | 88 | 91 |
76 | 93 | 90 |
84 | 79 | 96 |
To run the test by hand:
- Calculate the average math result of each group (A, B, C).
- Calculate the grand average of all math results.
- Calculate the total sum of squares (SST).
- Calculate the between-groups sum of squares (SSG, also called SS effect).
- Calculate the within-groups sum of squares (SSE, also called SS error), so that SST = SSG + SSE.
- Calculate the mean squares MSG = SSG / (k - 1) and MSE = SSE / (N - k), where k is the number of groups and N is the total number of observations. The test statistic is F = MSG / MSE.
ANOVA table:
Source | Df | Sum sq | Mean sq | F value | Pr(>F) |
---|---|---|---|---|---|
Group | 2 | 663.3 | 331.7 | 10.4 | 0.0004 |
Residuals | 27 | 860.6 | 31.9 | - | - |
Since Pr(>F) is below alpha = 0.05, we reject H0 in favor of Ha: the three group means are not all equal.
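The sums of squares, mean squares and F value in the ANOVA table can be reproduced from the example data with a few lines of NumPy. This is a minimal sketch; the variable names are chosen here for illustration, and the p-value is taken from SciPy's F distribution.

```python
import numpy as np
from scipy import stats

# Math results from the example table, one array per teaching method.
A = np.array([77, 88, 77, 85, 81, 72, 80, 80, 76, 84], dtype=float)
B = np.array([74, 88, 77, 93, 91, 95, 85, 88, 93, 79], dtype=float)
C = np.array([93, 94, 95, 83, 94, 94, 85, 91, 90, 96], dtype=float)
groups = [A, B, C]

k = len(groups)                        # number of groups
n_total = sum(len(g) for g in groups)  # total number of observations
all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Sums of squares: total, between groups, within groups.
sst = ((all_values - grand_mean) ** 2).sum()
ssg = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Mean squares and the F statistic.
msg = ssg / (k - 1)
mse = sse / (n_total - k)
f_value = msg / mse

# Upper-tail probability of the F distribution with (k - 1, N - k) df.
p_value = stats.f.sf(f_value, k - 1, n_total - k)

print(f"SSG = {ssg:.1f}, SSE = {sse:.1f}, SST = {sst:.1f}")
print(f"MSG = {msg:.1f}, MSE = {mse:.1f}, F = {f_value:.2f}, p = {p_value:.5f}")
```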
Multiple comparisons (post-hoc comparisons)
ANOVA only tells us that μ1, μ2 and μ3 are not all equal; it does not tell us which means differ from which. To find out, we need to run multiple comparisons.
With k = 3 groups there are k*(k-1)/2 = 3 pairs, so we need to run three separate t-tests. To keep the overall alpha at 0.05, we compute a new per-test alpha with the Bonferroni correction: a* = a/(k*(k-1)/2) = 0.05/3 ≈ 0.0167
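The pair count and the corrected threshold are easy to compute for any number of groups. A small sketch, with the group labels taken from the example:

```python
from itertools import combinations

groups = ["A", "B", "C"]
alpha = 0.05

# Every unordered pair of groups needs its own t-test.
pairs = list(combinations(groups, 2))

# Bonferroni: divide the overall alpha by the number of tests.
alpha_star = alpha / len(pairs)

print(pairs)                 # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(round(alpha_star, 4))  # 0.0167
```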
Individual t-Test for A and B, B and C, A and C
Because the populations are assumed to have equal variance, we use the pooled-variance (equal-variance) form of the t-test. For the first pair:
H0: μ1 = μ2, Ha: μ1 ≠ μ2 => p = 2 * 0.0096 = 0.0192 > a* = 0.0167 => fail to reject H0
Repeat the same test for the remaining two pairs.
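The three pairwise tests can be sketched as follows. One assumption is made here: the pooled variance is taken to be the MSE from the ANOVA table, with N - k = 27 degrees of freedom, because that choice gives a p-value for A and B close to the one quoted above. Pooling only the two groups being compared (as `scipy.stats.ttest_ind` does, with 18 degrees of freedom) would give somewhat different p-values.

```python
from itertools import combinations

import numpy as np
from scipy import stats

data = {
    "A": np.array([77, 88, 77, 85, 81, 72, 80, 80, 76, 84], dtype=float),
    "B": np.array([74, 88, 77, 93, 91, 95, 85, 88, 93, 79], dtype=float),
    "C": np.array([93, 94, 95, 83, 94, 94, 85, 91, 90, 96], dtype=float),
}
alpha_star = 0.05 / 3  # Bonferroni-corrected threshold for three tests

# Assumption: pool the variance over all groups (the ANOVA MSE).
n_total = sum(len(v) for v in data.values())
df_error = n_total - len(data)
mse = sum(((v - v.mean()) ** 2).sum() for v in data.values()) / df_error

results = {}
for g1, g2 in combinations(data, 2):
    x, y = data[g1], data[g2]
    se = np.sqrt(mse * (1 / len(x) + 1 / len(y)))  # standard error of the difference
    t = (x.mean() - y.mean()) / se
    p = 2 * stats.t.sf(abs(t), df_error)           # two-sided p-value
    results[(g1, g2)] = (t, p)
    verdict = "reject H0" if p < alpha_star else "fail to reject H0"
    print(f"{g1} vs {g2}: t = {t:.3f}, p = {p:.4f} -> {verdict}")
```

Under this assumption only the A and C pair falls below the corrected threshold.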
One way ANOVA Python implementation
We can implement one way ANOVA in Python as follows:
def anova_oneway(data: List[List[float]]):
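Only the function signature appears in the listing above, so the body below is a sketch of one possible implementation rather than the original code. The return value, a tuple of the F statistic and its p-value, is an assumption made for this example.

```python
from typing import List, Tuple

from scipy import stats


def anova_oneway(data: List[List[float]]) -> Tuple[float, float]:
    """Return (F statistic, p-value) for a one way ANOVA over the given groups."""
    k = len(data)                                # number of groups
    n_total = sum(len(group) for group in data)  # total number of observations
    grand_mean = sum(sum(group) for group in data) / n_total

    ssg = 0.0  # between-groups sum of squares
    sse = 0.0  # within-groups sum of squares
    for group in data:
        group_mean = sum(group) / len(group)
        ssg += len(group) * (group_mean - grand_mean) ** 2
        sse += sum((x - group_mean) ** 2 for x in group)

    f_value = (ssg / (k - 1)) / (sse / (n_total - k))
    p_value = float(stats.f.sf(f_value, k - 1, n_total - k))
    return f_value, p_value


scores = [
    [77, 88, 77, 85, 81, 72, 80, 80, 76, 84],  # method A
    [74, 88, 77, 93, 91, 95, 85, 88, 93, 79],  # method B
    [93, 94, 95, 83, 94, 94, 85, 91, 90, 96],  # method C
]
f_value, p_value = anova_oneway(scores)
print(f"F = {f_value:.2f}, p = {p_value:.5f}")
```

The result can be cross-checked against `scipy.stats.f_oneway(*scores)`, which computes the same statistic.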
With only two groups of data, one way ANOVA and the equal-variance two-sample t-test give the same result: F = t² and the p-values are identical. In other words, the t-test is a special case of one way ANOVA.
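The two-group equivalence can be checked numerically with SciPy, here on groups A and B from the example. This sketch relies on `scipy.stats.ttest_ind` assuming equal variances by default.

```python
import numpy as np
from scipy import stats

A = [77, 88, 77, 85, 81, 72, 80, 80, 76, 84]
B = [74, 88, 77, 93, 91, 95, 85, 88, 93, 79]

# Two-sample t-test with pooled variance (equal_var=True is the default).
t_stat, t_p = stats.ttest_ind(A, B)

# One way ANOVA on the same two groups.
f_stat, f_p = stats.f_oneway(A, B)

print(np.isclose(t_stat ** 2, f_stat))  # True: F equals t squared
print(np.isclose(t_p, f_p))             # True: the p-values match
```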