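Bessel’s correction is the use of $n-1$ instead of $n$ in the denominator of the sample variance, where $n$ is the number of observations in the sample. For independent, identically distributed observations $X_1, \dots, X_n$ with mean $\mu$ and variance $\sigma^2$, the corrected sample variance is

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2.$$

Recall that bias is defined as:

$$\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta,$$

where $\hat{\theta}$ is an estimator of the parameter $\theta$. An estimator is unbiased when its bias is zero.

To show that $E[S^2] = \sigma^2$, first expand the sum of squared deviations:

$$\sum_{i=1}^{n}(X_i - \bar{X})^2 = \sum_{i=1}^{n}X_i^2 - 2\bar{X}\sum_{i=1}^{n}X_i + n\bar{X}^2 = \sum_{i=1}^{n}X_i^2 - n\bar{X}^2,$$

and as a result

$$E\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = \sum_{i=1}^{n}E[X_i^2] - nE[\bar{X}^2].$$

Rearranging the familiar expression for variance, $\mathrm{Var}(X) = E[X^2] - (E[X])^2$, yields

$$E[X_i^2] = \sigma^2 + \mu^2,$$

and similarly, because $E[\bar{X}] = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$,

$$E[\bar{X}^2] = \frac{\sigma^2}{n} + \mu^2.$$

Therefore

$$E\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right) = (n-1)\sigma^2.$$

Thus,

$$E[S^2] = \frac{1}{n-1}\,(n-1)\sigma^2 = \sigma^2,$$

and we conclude that $S^2$ is an unbiased estimator of $\sigma^2$, and since

$$E\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = \frac{n-1}{n}\sigma^2 < \sigma^2,$$

the uncorrected estimator (dividing by $n$) systematically underestimates the population variance.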
Demonstration
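An important property of an unbiased estimator of a population parameter is that if the sample statistic is evaluated for every possible sample and the average computed, the mean over all samples will exactly equal the population parameter. For a given population with mean $\mu$ and variance $\sigma^2$, averaging the sample variance $S^2$ over every possible sample of size $n$, drawn with replacement, yields exactly $\sigma^2$.
We now attempt to verify this property on the following dataset, whose population variance is 7.44: [7, 9, 10, 12, 15]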
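As a quick sanity check on the target value, the population mean and variance of the dataset [7, 9, 10, 12, 15] can be computed directly. This is a minimal sketch; the variable names `mu` and `sigma2` are illustrative, and it relies on `np.var` defaulting to `ddof=0`, which divides by the population size.

```python
import numpy as np

v = [7, 9, 10, 12, 15]

# Population mean: sum of the values divided by the population size.
mu = np.mean(v)

# Population variance: np.var uses ddof=0 by default, i.e. it divides by N.
sigma2 = np.var(v)

# Expected to be approximately 10.6 and 7.44, up to floating-point rounding.
print(f"mu: {mu}")
print(f"sigma2: {sigma2}")
```

The value of `sigma2` is the number that the average of the sample variances is compared against in the rest of this section.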
The Python `itertools` module exposes a collection of efficient iterators that stream values on demand based on various starting and/or stopping conditions. For example, the `permutations` implementation takes as arguments an iterable and the length of the permutation, `r`. It returns all `r`-length permutations of elements from the iterable (`itertools` also exposes a `combinations` function that does the same for all `r`-length combinations). The `product` function generates the Cartesian product of the specified iterables, and takes an optional `repeat` argument. From the documentation:
"To compute the product of an iterable with itself, specify the number of repetitions with the optional repeat keyword argument. For example, product(A, repeat=4) means the same as product(A, A, A, A)."
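The difference between the three functions is easiest to see on a small input. The sketch below uses a hypothetical three-element list `A` and counts the 2-element results produced by each function. Only `product` allows an element to be repeated within a tuple.

```python
import itertools

A = ["a", "b", "c"]

# Ordered arrangements, no element reused within a tuple: 3 * 2 = 6.
perms = list(itertools.permutations(A, 2))

# Unordered selections, no element reused: C(3, 2) = 3.
combs = list(itertools.combinations(A, 2))

# Ordered tuples with elements allowed to repeat: 3 ** 2 = 9.
prods = list(itertools.product(A, repeat=2))

print(len(perms), len(combs), len(prods))  # prints: 6 3 9

# repeat=2 is equivalent to passing the iterable twice.
assert prods == list(itertools.product(A, A))
```

Because `product(A, repeat=n)` includes tuples such as `("a", "a")`, it enumerates every ordered sample of size `n` drawn with replacement.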
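`product` is used to enumerate every ordered 2-, 3-, 4- and 5-element sample drawn with replacement from [7, 9, 10, 12, 15]. Strictly speaking these are not permutations, because an element may appear more than once in a sample. We now compute the average of the sample variance over all such samples of each size: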
""" Demonstrating that the sample variance is an unbiased estimator of the population variance. Generate all possible 2, 3, 4 and 5-element permutations from [7, 9, 10, 12, 15], and determine the sample variance of each sample. The average of the sample variances will exactly equate to the population variance if the sample variance is an unbiased estimator of the population variance. """ import itertools import numpy as np v = [7, 9, 10, 12, 15] # Verify that the average of the sample variance # for all 2-element samples equates to 7.44. s2 = list(itertools.product(v, repeat=2)) result2 = np.mean([np.var(ii, ddof=1) for ii in s2]) # Verify that the average of the sample variance # for all 3-element samples equates to 7.44. s3 = list(itertools.product(v, repeat=3)) result3 = np.mean([np.var(ii, ddof=1) for ii in s3]) # Verify that the average of the sample variance # for all 4-element samples equates to 7.44. s4 = list(itertools.product(v, repeat=4)) result4 = np.mean([np.var(ii, ddof=1) for ii in s4]) # Verify that the average of the sample variance # for all 5-element samples equates to 7.44. s5 = list(itertools.product(v, repeat=5)) result5 = np.mean([np.var(ii, ddof=1) for ii in s5]) print(f"result2: {result2}") print(f"result3: {result3}") print(f"result4: {result4}") print(f"result5: {result5}")
    result2: 7.44
    result3: 7.4399999999999995
    result4: 7.44
    result5: 7.44
Since the sample variance is an unbiased estimator of the population variance, these results should come as no surprise, but it is an interesting demonstration nonetheless.
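For contrast, the same enumeration can be repeated with the uncorrected estimator, which divides by $n$ rather than $n-1$ (`ddof=0` in NumPy). A standard result is that its expectation is $\frac{n-1}{n}\sigma^2$, so the averages should fall short of 7.44 by exactly that factor. The following is a minimal sketch under that assumption; the names `averages` and `expected` are illustrative.

```python
import itertools
import numpy as np

v = [7, 9, 10, 12, 15]
sigma2 = np.var(v)  # population variance (ddof=0)

averages = {}  # sample size -> average of the uncorrected sample variances
expected = {}  # sample size -> (n - 1) / n * sigma2

for n in (2, 3, 4):
    # Every ordered sample of size n drawn with replacement.
    samples = itertools.product(v, repeat=n)

    # ddof=0 divides by n, i.e. no Bessel's correction is applied.
    averages[n] = np.mean([np.var(s, ddof=0) for s in samples])
    expected[n] = (n - 1) / n * sigma2

    print(f"n={n}: average={averages[n]:.4f}, expected={expected[n]:.4f}")
```

The shortfall shrinks as `n` grows, which is why the correction matters most for small samples.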