What is it?
In statistics, one-way ANOVA (analysis of variance) is a statistical test of whether two or more samples are drawn from populations with the same mean, where the samples are distinguished by only one variable (a single factor). It's the simplest case of ANOVA, applying it to a single factor.
The main principle behind one-way ANOVA is to compare the variation between groups with the variation within groups. This makes it a hypothesis test: the null hypothesis is that all group means are equal, and the alternative is that at least one group mean is significantly different.
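Written out formally, with $\mu_1, \dots, \mu_k$ introduced here as illustrative notation for the population means of the $k$ groups and $N$ as the total number of observations, the hypotheses and the test statistic look roughly like this (SS and SSE are the sums of squares described in the by-hand section below). Under the null hypothesis, and assuming independent, normally distributed observations with equal variances, $F$ follows an $F(k-1, N-k)$ distribution.

```latex
H_0:\ \mu_1 = \mu_2 = \cdots = \mu_k
\qquad
H_1:\ \mu_j \neq \mu_l \ \text{for at least one pair } (j, l)

F = \frac{SS / (k - 1)}{SSE / (N - k)}
```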
Calculating ANOVA
Using Python
One can easily calculate the ANOVA table for multiple samples sharing a single univariate variable using Python. The statsmodels library fits a linear model with ols and then builds the ANOVA table from it with anova_lm.
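import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# One column per group, one row per observation
data = pd.DataFrame({
    "Group 1": [85, 86, 88, 75, 78, 94, 98, 79, 71, 80],
    "Group 2": [91, 92, 93, 85, 87, 84, 82, 88, 95, 96],
    "Group 3": [79, 78, 88, 94, 92, 85, 83, 85, 82, 81]
})
# Transform to long format: a "group" identifier column and a "value" column
data = data.melt(var_name="group")
# Fit a linear model of the value on the categorical group factor
lm = ols("value ~ group", data=data).fit()
# Build the ANOVA table (Type I sums of squares by default)
table = sm.stats.anova_lm(lm)
print(table)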
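As a cross-check, SciPy's scipy.stats.f_oneway computes the same F-statistic and p-value directly from the raw groups, without building a linear model first. A minimal sketch, reusing the example data:

```python
import pandas as pd
from scipy import stats

# Same three groups as in the statsmodels example
data = pd.DataFrame({
    "Group 1": [85, 86, 88, 75, 78, 94, 98, 79, 71, 80],
    "Group 2": [91, 92, 93, 85, 87, 84, 82, 88, 95, 96],
    "Group 3": [79, 78, 88, 94, 92, 85, 83, 85, 82, 81],
})

# f_oneway takes one array-like per group and returns (F-statistic, p-value)
f_stat, p_value = stats.f_oneway(data["Group 1"], data["Group 2"], data["Group 3"])
print(f"F = {f_stat:.4f}, p = {p_value:.4f}")
```

The F value printed here should agree with the F column of the statsmodels table for the group row.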
statsmodels summary
Another summary, containing the fit metrics, can be obtained by calling
lm.summary()
on the fitted model. This returns the F-statistic, R-squared, and other metrics.
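A sketch of that call, assuming the same data and model as above. Besides the printed report, the fitted results object also exposes the headline metrics as attributes (fvalue, f_pvalue, rsquared), which is handy when they are needed programmatically:

```python
import pandas as pd
from statsmodels.formula.api import ols

# Same data as above, reshaped to long format ("group", "value")
data = pd.DataFrame({
    "Group 1": [85, 86, 88, 75, 78, 94, 98, 79, 71, 80],
    "Group 2": [91, 92, 93, 85, 87, 84, 82, 88, 95, 96],
    "Group 3": [79, 78, 88, 94, 92, 85, 83, 85, 82, 81],
}).melt(var_name="group")

lm = ols("value ~ group", data=data).fit()

# Full regression report: coefficients, R-squared, F-statistic, AIC/BIC, ...
print(lm.summary())

# The same fit metrics, read directly from the fitted results
print(f"F-statistic: {lm.fvalue:.4f}")
print(f"p-value of the F-test: {lm.f_pvalue:.4f}")
print(f"R-squared: {lm.rsquared:.4f}")
```

For a one-way model like this one, the regression F-statistic in the summary is the same F as in the ANOVA table, since both compare the group model against an intercept-only model.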
Calculating by hand
ANOVA is normally not done by hand, since it is calculation-intensive. However, because one-way ANOVA uses only a single variable, one can follow the steps below to understand the underlying algorithm:
-
Calculate each sample mean and the overall mean.
For each sample/group present in the experiment, calculate its mean, then calculate the overall mean using all observations from all samples.
-
Calculate the sum of squares, SS.
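Given the number of samples $k$, the number of observations $n_j$ in sample $j$, the sample mean $\bar{x}_j$, and the overall mean $\bar{x}$, calculate SS, the sum of squares, for each sample and then sum them all:
$$SS = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2$$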
-
Calculate the sum of squared errors, SSE.
Similarly to the Mean Squared Error, calculate the sum of squared errors over every observation. Given $x_{ij}$ as the $i$-th observation in group $j$, $\bar{x}_j$ as the mean of group $j$, and $n_j$ as the number of observations in that sample:
$$SSE = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$$
-
Calculate the total sum of squares, SST.
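The total sum of squares, SST, is the sum of SS and SSE. It equals the sum of squared deviations of every observation from the overall mean:
$$SST = SS + SSE = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2$$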
-
Fill in the ANOVA table.
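Now fill in the ANOVA table with the metrics you just calculated, where $k$ is the number of groups and $N$ is the total number of observations. Each mean square is a sum of squares divided by its degrees of freedom, and the F-statistic is the ratio of the two mean squares.
Source | Sum of Squares (SS) | Degrees of Freedom (DF) | Mean Squares (MS) | F-statistic |
---|---|---|---|---|
Treatment | SS | $k - 1$ | $MS = SS / (k - 1)$ | $F = MS / MSE$ |
Residual | SSE | $N - k$ | $MSE = SSE / (N - k)$ | |
Total | SST | $N - 1$ | | |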
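The five steps above can be sketched in plain Python on the same three groups used in the statsmodels example. This is a minimal illustration, not a library implementation: the variable names are hypothetical, and SciPy is used only to turn the F-statistic into a p-value. SST is also computed directly from the overall mean as a sanity check that SS + SSE adds up.

```python
from scipy import stats

# Same data as the statsmodels example: group name -> observations
groups = {
    "Group 1": [85, 86, 88, 75, 78, 94, 98, 79, 71, 80],
    "Group 2": [91, 92, 93, 85, 87, 84, 82, 88, 95, 96],
    "Group 3": [79, 78, 88, 94, 92, 85, 83, 85, 82, 81],
}

# Step 1: each group mean and the overall mean over all observations
group_means = {name: sum(obs) / len(obs) for name, obs in groups.items()}
all_obs = [x for obs in groups.values() for x in obs]
grand_mean = sum(all_obs) / len(all_obs)

# Step 2: SS = sum over groups of n_j * (mean_j - grand_mean)^2
ss_treatment = sum(
    len(obs) * (group_means[name] - grand_mean) ** 2
    for name, obs in groups.items()
)

# Step 3: SSE = sum over every observation of (x_ij - mean_j)^2
sse = sum(
    (x - group_means[name]) ** 2
    for name, obs in groups.items()
    for x in obs
)

# Step 4: SST = SS + SSE, checked against the direct definition
sst = ss_treatment + sse
sst_direct = sum((x - grand_mean) ** 2 for x in all_obs)

# Step 5: degrees of freedom, mean squares and the F-statistic
k = len(groups)          # number of groups
n_total = len(all_obs)   # total number of observations
df_treatment = k - 1
df_residual = n_total - k
ms_treatment = ss_treatment / df_treatment
ms_residual = sse / df_residual
f_stat = ms_treatment / ms_residual

# p-value: upper-tail probability of F(k - 1, N - k)
p_value = stats.f.sf(f_stat, df_treatment, df_residual)

print(f"{'Source':<10}{'SS':>10}{'DF':>5}{'MS':>10}{'F':>8}")
print(f"{'Treatment':<10}{ss_treatment:>10.2f}{df_treatment:>5}{ms_treatment:>10.2f}{f_stat:>8.3f}")
print(f"{'Residual':<10}{sse:>10.2f}{df_residual:>5}{ms_residual:>10.2f}")
print(f"{'Total':<10}{sst:>10.2f}{n_total - 1:>5}")
print(f"p-value = {p_value:.4f}")
```

The sums of squares, degrees of freedom and F-statistic computed this way should match the statsmodels ANOVA table, since both describe the same single-factor model.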