## ANOVA in statistics.

Analysis of variance (ANOVA) is a statistical method used to test whether the means of two or more groups differ significantly from each other. It assesses the effect of one or more factors by comparing the means of the corresponding samples.

TSS = SSE + RSS = Within_group_variability + Between_group_variability

Total Variation → TSS = Total Sum of Squares

Unexplained Variation → SSE = Sum of Squared Errors

Explained Variation → RSS = Regression Sum of Squares (not to be confused with the residual sum of squares, which some texts also abbreviate as RSS)
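The decomposition above can be checked numerically. The following is a minimal sketch using only the Python standard library; the toy `groups` data and the variable names `tss`, `sse`, and `rss` are illustrative assumptions, not part of the simulation used later.

```python
from statistics import fmean

# Hypothetical toy data: three groups of three observations each.
groups = {"A": [4.0, 5.0, 6.0], "B": [7.0, 8.0, 9.0], "C": [1.0, 2.0, 3.0]}

all_values = [x for values in groups.values() for x in values]
grand_mean = fmean(all_values)

# Total variation: squared distance of every observation from the grand mean.
tss = sum((x - grand_mean) ** 2 for x in all_values)

# Unexplained (within-group) variation: squared distance of each
# observation from its own group mean.
sse = sum((x - fmean(values)) ** 2 for values in groups.values() for x in values)

# Explained (between-group) variation: squared distance of each group mean
# from the grand mean, weighted by the group size.
rss = sum(len(values) * (fmean(values) - grand_mean) ** 2 for values in groups.values())

print(tss, sse, rss)  # 60.0 6.0 54.0
```

Here most of the total variation is between groups, because the three group means (5, 8, and 2) are far apart relative to the spread inside each group.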

The statistic that measures whether the means of different samples differ significantly is called the F-ratio. The lower the F-ratio, the more similar the sample means are, and the weaker the evidence against the null hypothesis that all group means are equal; in that case, we cannot reject the null hypothesis. If no true difference exists between the group means, the ANOVA's F-ratio should be close to 1.

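F = (Between_group_variability / (k - 1)) / (Within_group_variability / (N - k))

Here k is the number of groups and N is the total number of observations. Each sum of squares is divided by its degrees of freedom, so the F-ratio compares mean squares rather than raw sums of squares.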
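As a rough illustration of the F-ratio, the sketch below computes it by hand for a one-way design, dividing each sum of squares by its degrees of freedom (k - 1 between groups, N - k within groups). The helper name `f_ratio` and the toy data are assumptions made for this example. It does not compute a p-value.

```python
from statistics import fmean

def f_ratio(groups):
    """One-way ANOVA F-ratio for a list of groups (each a list of numbers)."""
    k = len(groups)                        # number of groups
    n_total = sum(len(g) for g in groups)  # total number of observations
    grand_mean = fmean([x for g in groups for x in g])

    # Between-group and within-group sums of squares.
    between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    # Ratio of mean squares: each sum of squares over its degrees of freedom.
    return (between / (k - 1)) / (within / (n_total - k))

# Hypothetical data: clearly separated means vs. identical means.
separated = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]
similar = [[4.0, 5.0, 6.0], [5.0, 6.0, 4.0], [6.0, 4.0, 5.0]]

print(f_ratio(separated))  # 27.0
print(f_ratio(similar))    # 0.0
```

The second data set has identical group means, so the between-group term vanishes and the ratio drops to zero. With real, noisy data and no true group effect, the ratio would instead fluctuate around 1.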

A significant one-way ANOVA tells us that at least two groups differ from each other, but not which ones. If the test returns a significant F-statistic, we may need to run a post-hoc test to find out exactly which groups differ in their means.
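To make concrete what a post-hoc test has to examine, the sketch below simply enumerates every pair of groups and the difference between their means. It performs no significance testing, and the helper name `pairwise_mean_differences` and the toy data are assumptions made for illustration.

```python
from itertools import combinations
from statistics import fmean

def pairwise_mean_differences(groups):
    """Return {(a, b): mean(a) - mean(b)} for every unordered pair of groups."""
    means = {name: fmean(values) for name, values in groups.items()}
    return {(a, b): means[a] - means[b] for a, b in combinations(means, 2)}

# Hypothetical data: groups "1" and "2" are close, group "3" stands apart.
groups = {"1": [4.0, 5.0, 6.0], "2": [4.5, 5.5, 6.5], "3": [7.0, 8.0, 9.0]}

diffs = pairwise_mean_differences(groups)
for pair, diff in diffs.items():
    print(pair, round(diff, 2))
```

With three groups there are three pairwise comparisons. A post-hoc procedure such as the Tukey test decides which of these differences are significant while accounting for the fact that several comparisons are made at once.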

## One-way ANOVA - simulating data:

```
import numpy as np
import matplotlib.pyplot as plt
import pingouin as pg
import pandas as pd
## data parameters
# group means
mean1 = 4
mean2 = 3.8
mean3 = 7
# samples per group
N1 = 30
N2 = 35
N3 = 29
# standard deviation (assume common across groups)
stdev = 2
## now to simulate the data
data1 = mean1 + np.random.randn(N1)*stdev
data2 = mean2 + np.random.randn(N2)*stdev
data3 = mean3 + np.random.randn(N3)*stdev
datacolumn = np.hstack((data1,data2,data3))
# group labels
groups = ['1']*N1 + ['2']*N2 + ['3']*N3
# convert to a pandas dataframe
df = pd.DataFrame({'TheData':datacolumn,'Group':groups})
df
```

## Generating the ANOVA table:

```
pg.anova(data=df,dv='TheData',between='Group')
```

## Performing the Tukey test:

```
pg.pairwise_tukey(data=df,dv='TheData',between='Group')
```

## One-way ANOVA - visualizing the means differences:

```
df.boxplot('TheData',by='Group');
```