The law of large numbers in statistics
The law of large numbers (LLN) plays a central role in probability and statistics. It is a theorem describing the result of performing the same experiment a large number of times: the average of the results obtained from many trials should be close to the expected value, and it tends to get closer to the expected value as more trials are performed.
The law of large numbers guarantees stable long-term results for the averages of some random events.
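As a minimal sketch of this idea (a coin-flip simulation, not from the die examples below), we can track the running average of repeated fair coin flips and watch it settle near the expected value of 0.5:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

# simulate fair coin flips: 1 = heads, 0 = tails; expected value is 0.5
n = 100_000
flips = rng.integers(0, 2, n)

# running average after each additional flip
running_avg = np.cumsum(flips) / np.arange(1, n + 1)

# the running average drifts toward 0.5 as the number of flips grows
print(abs(running_avg[99] - 0.5))   # deviation after 100 flips
print(abs(running_avg[-1] - 0.5))   # deviation after 100,000 flips
```

The deviation after all flips is typically far smaller than after the first hundred, which is exactly the stabilization the LLN guarantees.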
The law of large numbers should not be confused with the law of averages, which asserts that the distribution of outcomes in a sample (large or small) reflects the distribution of outcomes in the population.
Example: rolling a die:
The expected value (also known as EV or expectation) is the long-run average value of a random variable. It is the probability-weighted average of all possible values.
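For instance (a fair die here, unlike the weighted die used below), the expected value is the sum of each face times its probability:

```python
import numpy as np

# fair die: each of the six faces has probability 1/6
faces = np.arange(1, 7)
probs = np.full(6, 1/6)

# expected value: probability-weighted average of all possible values
expval = np.sum(faces * probs)
print(expval)  # (1+2+3+4+5+6)/6 = 3.5
```

Note that 3.5 is not a value the die can actually show; the expected value is a long-run average, not a typical single outcome.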
The following die-rolling example visualizes the sample average against the expected value. Note: this step does not demonstrate the LLN yet!
```python
import matplotlib.pyplot as plt
import numpy as np

# die probabilities (weighted: faces 1 and 2 are twice as likely)
f1 = 2/8
f2 = 2/8
f3 = 1/8
f4 = 1/8
f5 = 1/8
f6 = 1/8

# confirm the probabilities sum to 1
print(f1 + f2 + f3 + f4 + f5 + f6)

# expected value
expval = 1*f1 + 2*f2 + 3*f3 + 4*f4 + 5*f5 + 6*f6

# generate "population" by repeatedly doubling the base outcomes
population = [1, 1, 2, 2, 3, 4, 5, 6]
for i in range(20):
    population = np.hstack((population, population))
nPop = len(population)

# draw a sample of 8 rolls
sample = np.random.choice(population, 8)

## experiment: draw larger and larger samples
k = 5000  # maximum number of samples
sampleAve = np.zeros(k)

for i in range(k):
    idx = np.floor(np.random.rand(i+1) * nPop)
    sampleAve[i] = np.mean(population[idx.astype(int)])

plt.plot(sampleAve, 'k')
plt.plot([1, k], [expval, expval], 'r', linewidth=4)
plt.xlabel('Number of samples')
plt.ylabel('Value')
plt.ylim([expval - 1, expval + 1])
plt.legend(('Sample average', 'Expected value'))
plt.show()

# the mean of the sample averages converges to the expected value quickly:
print(np.mean(sampleAve))
print(np.mean(sampleAve[:9]))
```
LLN: computing sample means vs. the expected value. Note: this is the actual LLN demonstration!
```python
import matplotlib.pyplot as plt
import numpy as np

# generate population data with a known mean (zero)
populationN = 1000000
population = np.random.randn(populationN)
population = population - np.mean(population)  # demean

# get the means of repeated samples
samplesize = 30
numberOfExps = 500
samplemeans = np.zeros(numberOfExps)

for expi in range(numberOfExps):
    # draw a sample and compute its mean
    sampleidx = np.random.randint(0, populationN, samplesize)
    samplemeans[expi] = np.mean(population[sampleidx])

# show the results!
fig, ax = plt.subplots(2, 1, figsize=(4, 6))

# top panel: individual sample means fluctuate around the population mean
ax[0].plot(samplemeans, 's-')
ax[0].plot([0, numberOfExps], [np.mean(population), np.mean(population)], 'r', linewidth=3)
ax[0].set_xlabel('Experiment number')
ax[0].set_ylabel('mean value')
ax[0].legend(('Sample means', 'Population mean'))

# bottom panel: the cumulative average of the sample means converges
# to the population mean, which is the LLN in action
ax[1].plot(np.cumsum(samplemeans) / np.arange(1, numberOfExps + 1), 's-')
ax[1].plot([0, numberOfExps], [np.mean(population), np.mean(population)], 'r', linewidth=3)
ax[1].set_xlabel('Experiment number')
ax[1].set_ylabel('mean value')
ax[1].legend(('Cumulative sample means', 'Population mean'))

plt.show()
```