Parametric statistical tests assume that a data sample was drawn from a specific population distribution. The term usually refers to tests that assume a Gaussian distribution. Because data so often fit this distribution, parametric methods are the more commonly used. When their assumptions are met, they have greater statistical power than non-parametric tests; otherwise, non-parametric tests should be used. Parametric tests should therefore only be applied after carefully checking that their assumptions are sufficiently fulfilled.
Code source here: https://github.com/ymurong/stats_python_playground
This table gives an overview of the most popular parametric tests:
Test | Test for what? |
---|---|
One sample t-Test | mean of a given population |
Two sample t-Test | mean difference of two independent populations |
Paired t-Test | mean difference of two paired populations (not independent) |
Chi-squared test | variance of a given population |
F-Test | variance ratio of two independent populations |
One sample t-Test
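Given a sample drawn from a Gaussian population, the one-sample t-test uses the sample mean and sample standard deviation to decide whether the population mean differs significantly from a known or hypothesized value $\mu_0$. With sample size $n$, sample mean $\bar{x}$ and sample standard deviation $s$, the test statistic is

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

which follows a t distribution with $n - 1$ degrees of freedom under the null hypothesis.
Based on this formula, the one-sample t-test can be implemented in Python as follows.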
    def t_test(data1: List[float], tail: TAIL = 'both',
mean() and std() represent the mean and standard deviation functions.
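As a rough illustration of the one-sample t-test, here is a minimal sketch, not the repository's implementation. The names `one_sample_t_statistic` and `t_p_value`, the `mu` parameter, and the `'left'`/`'right'` tail values are assumptions made for this example; only `'both'` appears in the original signature. The statistic needs only the standard library, while the p-value helper assumes SciPy is installed.

```python
import math
import statistics
from typing import List, Literal, Tuple

# Assumed tail values; the original signature only shows 'both'.
TAIL = Literal["both", "left", "right"]


def one_sample_t_statistic(data: List[float], mu: float = 0.0) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for H0: population mean == mu."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    sample_mean = statistics.mean(data)
    sample_std = statistics.stdev(data)  # sample std, n - 1 in the denominator
    t_stat = (sample_mean - mu) / (sample_std / math.sqrt(n))
    return t_stat, n - 1


def t_p_value(t_stat: float, df: float, tail: TAIL = "both") -> float:
    """p-value under Student's t distribution (requires SciPy)."""
    # Imported here so that the statistic above works without SciPy.
    from scipy import stats

    if tail == "left":
        return float(stats.t.cdf(t_stat, df))
    if tail == "right":
        return float(stats.t.sf(t_stat, df))
    if tail == "both":
        return float(2 * stats.t.sf(abs(t_stat), df))
    raise ValueError(f"unknown tail: {tail!r}")
```

For the sample `[5.1, 4.9, 5.3, 5.0, 5.2]` and `mu=5.0`, the statistic works out to √2 ≈ 1.414 with 4 degrees of freedom; passing both values to `t_p_value` gives the p-value for the chosen tail.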
Two sample t-Test
Given two independent samples drawn from Gaussian populations, the two-sample t-test uses the sample means and sample variances to decide whether the difference between the two population means differs significantly from a known or hypothesized value.
The default null hypothesis is that the means of the two populations are equal, that is, a hypothesized difference of zero. Rejecting this hypothesis indicates that there is sufficient evidence that the population means differ, and in turn that the distributions are not equal.
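If the two population variances are equal, the statistic uses the pooled sample variance $s_p^2$:

$$t = \frac{\bar{x}_1 - \bar{x}_2 - \mu_0}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$

with $n_1 + n_2 - 2$ degrees of freedom, where $\bar{x}_i$, $s_i^2$ and $n_i$ are the mean, variance and size of sample $i$, and $\mu_0$ is the hypothesized mean difference.
Otherwise, the usual choice is Welch's statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2 - \mu_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

with degrees of freedom approximated by the Welch-Satterthwaite formula:

$$\nu \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2 / n_1)^2}{n_1 - 1} + \frac{(s_2^2 / n_2)^2}{n_2 - 1}}$$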
Based on these formulas, the two-sample t-test can be implemented in Python as follows. Note that the equal parameter indicates whether the two population variances are assumed to be equal, which determines which formula is used.
    def t_test(data1: List[float], data2: List[float] = None, tail: TAIL = 'both',
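The two variance cases can be sketched as follows. This is an illustrative example under stated assumptions, not the repository's code: the function name `two_sample_t_statistic` and the `mu` parameter are hypothetical, the `equal` flag follows the parameter described in the text, and the unequal-variance branch assumes Welch's approximation.

```python
import math
import statistics
from typing import List, Tuple


def two_sample_t_statistic(
    data1: List[float],
    data2: List[float],
    equal: bool = True,
    mu: float = 0.0,
) -> Tuple[float, float]:
    """Return (t, degrees of freedom) for H0: mean1 - mean2 == mu."""
    n1, n2 = len(data1), len(data2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    mean_diff = statistics.mean(data1) - statistics.mean(data2)
    var1 = statistics.variance(data1)  # sample variances (n - 1 denominator)
    var2 = statistics.variance(data2)

    if equal:
        # Equal variances: pool both samples into one variance estimate.
        pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
        std_err = math.sqrt(pooled * (1 / n1 + 1 / n2))
        df = float(n1 + n2 - 2)
    else:
        # Unequal variances: Welch's statistic with the
        # Welch-Satterthwaite approximation for the degrees of freedom.
        a, b = var1 / n1, var2 / n2
        std_err = math.sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

    return (mean_diff - mu) / std_err, df
```

With equal sample sizes the two branches give the same statistic and differ only in the degrees of freedom: for `[1.0, 2.0, 3.0, 4.0, 5.0]` against `[2.0, 4.0, 6.0, 8.0, 10.0]`, t ≈ -1.897 in both cases, with 8 degrees of freedom when pooled and about 5.88 under Welch.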
Paired t-Test
We may wish to compare the means between two data samples that are related in some way. For example, the data samples may represent two separate measurements or evaluations of the same object. Such samples are repeated or dependent and are referred to as paired samples or repeated measures. Because the samples are not independent, we cannot use the independent two-sample Student’s t-test. Instead, we must use a modified version of the test that accounts for the dependence, called the paired Student’s t-test.
The test is simpler because the observations are made in pairs, for example before and after a treatment on the same subject or subjects, so it works on the pairwise differences and the variation between subjects drops out. The default assumption, or null hypothesis of the test, is that there is no difference in the means between the two populations. Rejecting the null hypothesis indicates that there is enough evidence that the population means differ.
With differences $d_i = x_{1i} - x_{2i}$, their mean $\bar{d}$ and their sample standard deviation $s_d$ over $n$ pairs, the test statistic is

$$t = \frac{\bar{d} - \mu_0}{s_d / \sqrt{n}}$$

with $n - 1$ degrees of freedom. Since this is a one-sample t-test on the differences, the Python implementation of the one-sample t-test can be reused.
    def t_test_paired(data1: List[float], data2: List[float], tail: TAIL = 'both', mu: float = 0):
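To make the reduction to a one-sample test concrete, here is a small sketch that computes the paired statistic directly from the pairwise differences. The function name `paired_t_statistic` is hypothetical, and the body is an assumption about how such a function might look, since the source shows only the signature.

```python
import math
import statistics
from typing import List, Tuple


def paired_t_statistic(
    data1: List[float], data2: List[float], mu: float = 0.0
) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for H0: mean of (data1 - data2) == mu."""
    if len(data1) != len(data2):
        raise ValueError("paired samples must have the same length")
    # The paired test is a one-sample t-test on the pairwise differences.
    diffs = [a - b for a, b in zip(data1, data2)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return (statistics.mean(diffs) - mu) / std_err, n - 1
```

For example, `paired_t_statistic([10.0, 12.0, 9.0, 11.0], [8.0, 11.0, 7.0, 8.0])` works on the differences `[2.0, 1.0, 2.0, 3.0]` and yields t ≈ 4.899 with 3 degrees of freedom.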
Chi-squared test
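The chi-squared test can be used to test whether the variance of a Gaussian population equals a hypothesized value $\sigma_0^2$. With sample variance $s^2$ computed from $n$ observations, the test statistic is

$$\chi^2 = \frac{(n - 1) s^2}{\sigma_0^2}$$

which follows a chi-squared distribution with $n - 1$ degrees of freedom under the null hypothesis.
Based on this formula, the one-sample chi-squared test can be implemented in Python as follows.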
    def chi2_test(data: List[float], tail: TAIL = "both", sigma2: float = 1):
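A possible body for such a variance test is sketched below. The names `chi2_variance_statistic` and `chi2_p_value` and the `'left'`/`'right'` tail values are assumptions for illustration. Because the chi-squared distribution is not symmetric, the two-sided p-value here doubles the smaller tail, which is one common convention and not necessarily the one used in the repository. The p-value helper assumes SciPy is installed.

```python
import statistics
from typing import List, Literal, Tuple

# Assumed tail values; the original signature only shows "both".
TAIL = Literal["both", "left", "right"]


def chi2_variance_statistic(data: List[float], sigma2: float = 1.0) -> Tuple[float, int]:
    """Return (chi2, degrees of freedom) for H0: population variance == sigma2."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    if sigma2 <= 0:
        raise ValueError("sigma2 must be positive")
    chi2_stat = (n - 1) * statistics.variance(data) / sigma2
    return chi2_stat, n - 1


def chi2_p_value(chi2_stat: float, df: int, tail: TAIL = "both") -> float:
    """p-value under the chi-squared distribution (requires SciPy)."""
    # Imported here so that the statistic above works without SciPy.
    from scipy import stats

    left = float(stats.chi2.cdf(chi2_stat, df))
    right = float(stats.chi2.sf(chi2_stat, df))
    if tail == "left":
        return left
    if tail == "right":
        return right
    if tail == "both":
        # Assumed convention: double the smaller tail, capped at 1.
        return min(1.0, 2 * min(left, right))
    raise ValueError(f"unknown tail: {tail!r}")
```

For `[1.0, 2.0, 3.0, 4.0, 5.0]` with `sigma2=2.0`, the sample variance is 2.5, so the statistic is 4 × 2.5 / 2 = 5.0 with 4 degrees of freedom.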
F-Test
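The F-test can be used to test whether the variance ratio of two independent Gaussian populations equals a hypothesized value $r_0$. With sample variances $s_1^2$ and $s_2^2$ from samples of size $n_1$ and $n_2$, the test statistic is

$$F = \frac{s_1^2 / s_2^2}{r_0}$$

which follows an F distribution with $(n_1 - 1, n_2 - 1)$ degrees of freedom under the null hypothesis.
Based on this formula, the F-test can be implemented in Python as follows.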
    def f_test(data1, data2, tail="both", ratio=1):
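The statistic behind such a function can be sketched in a few lines. The name `f_variance_ratio_statistic` is hypothetical and the body is an assumption, since the source shows only the signature; `ratio` mirrors the parameter of the same name.

```python
import statistics
from typing import List, Tuple


def f_variance_ratio_statistic(
    data1: List[float], data2: List[float], ratio: float = 1.0
) -> Tuple[float, int, int]:
    """Return (F, df1, df2) for H0: var1 / var2 == ratio."""
    if len(data1) < 2 or len(data2) < 2:
        raise ValueError("each sample needs at least two observations")
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    var1 = statistics.variance(data1)  # sample variances (n - 1 denominator)
    var2 = statistics.variance(data2)
    f_stat = (var1 / var2) / ratio
    return f_stat, len(data1) - 1, len(data2) - 1
```

For `[2.0, 4.0, 6.0, 8.0, 10.0]` against `[1.0, 2.0, 3.0, 4.0, 5.0]`, the sample variances are 10 and 2.5, giving F = 4.0 with (4, 4) degrees of freedom. A p-value would then come from the F distribution with those degrees of freedom, for instance through `scipy.stats.f`.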