title: "Lecture 16: Buffons Needle and Central Limit Theorem" author: "Junaid Hasan" date: "August 2, 2021"¶

Buffons Needle and Central Limit Theorem¶

  • An important theorem in probability is the Central Limit Theorem. It says (roughly) that if we sample many values from an unknown distribution and take their mean, and do this multiple times, the means will be approximately normally distributed.
  • A normal distribution (also known as Gaussian) has the shape of $y = e^{-x^2}$ with scaling.
  • Features: It is symmetric about its peak, it has exactly two inflection points, and is asymptotic to the x-axis.

contd..¶

  • Suppose we throw the needle in Buffon's needle N = 1000 times.
  • We can view this as sampling from $\{0,1 \}$ bernoulli trials with 1 meaning the needle crosses the line and 0 meaning it does not.
  • Thus throwing 1000 times and calculating empirical probability can be seen as averaging 1000 random samples from $\{0, 1\}$ with the probilities given.
  • If we do this many times, and make a histogram of the results we will notice a normal shape.
In [1]:
import math
import random
import statistics
import matplotlib.pyplot as plt
In [2]:
def buffon_needle(N = 10000):
    hits = 0
    currentestimate = 1
    listofestimates = []
    listofsteps = []
    for i in range(1, N+1):
    
        x1 = random.random()
        y1 = 0
    
        angle = random.random()*2*math.pi
    
        x2 = x1 + 1.0*math.cos(angle)
        y2 = y1 + 1.0*math.sin(angle)
    
        if x2 > 1.0 or x2 < 0 :
            hits = hits +1
            currentestimate = 1.0*hits/i
    
        if i% (N/100) == 0 :
            #print("Current estimate", currentestimate)
            listofestimates.append(currentestimate)
            listofsteps.append(i)
    mean  = listofestimates[-1]
    return mean

Let us run a buffon needle experiment once and see what we get.

In [3]:
buffon_needle(1000)
Out[3]:
0.6295180722891566

The mean I got is 0.6295

Now let us run multiple runs. Say I run each buffon needle experiment $M = 1000$ times.

In [4]:
N = 1000
M = 1000
listofmeans = []
for i in range(1, M+1):
    listofmeans.append(buffon_needle(N))

Question:¶

What is the distribution of the buffon needle throws?

Answer: Difficult.

Question:¶

What is the distribution of the means of the buffon needle throws?

Answer: Normal.

Lets see why.

In [5]:
plt.hist(listofmeans)
Out[5]:
(array([ 11.,  38.,  87., 184., 237., 229., 126.,  66.,  19.,   3.]),
 array([0.59159159, 0.60106971, 0.61054782, 0.62002594, 0.62950405,
        0.63898217, 0.64846028, 0.6579384 , 0.66741651, 0.67689463,
        0.68637275]),
 <BarContainer object of 10 artists>)
No description has been provided for this image
  • We can make the histogram look a little more granular by changing the number of bins
In [6]:
plt.hist(listofmeans, bins=50)
Out[6]:
(array([ 1.,  2.,  2.,  2.,  4.,  3.,  6.,  9.,  9., 11.,  8., 12., 19.,
        26., 22., 22., 37., 45., 34., 46., 39., 56., 56., 55., 31., 45.,
        54., 46., 40., 44., 35., 26., 34., 24.,  7., 24., 17., 14.,  5.,
         6.,  6.,  6.,  3.,  3.,  1.,  2.,  0.,  0.,  0.,  1.]),
 array([0.59159159, 0.59348721, 0.59538284, 0.59727846, 0.59917408,
        0.60106971, 0.60296533, 0.60486095, 0.60675658, 0.6086522 ,
        0.61054782, 0.61244345, 0.61433907, 0.61623469, 0.61813031,
        0.62002594, 0.62192156, 0.62381718, 0.62571281, 0.62760843,
        0.62950405, 0.63139968, 0.6332953 , 0.63519092, 0.63708655,
        0.63898217, 0.64087779, 0.64277341, 0.64466904, 0.64656466,
        0.64846028, 0.65035591, 0.65225153, 0.65414715, 0.65604278,
        0.6579384 , 0.65983402, 0.66172965, 0.66362527, 0.66552089,
        0.66741651, 0.66931214, 0.67120776, 0.67310338, 0.67499901,
        0.67689463, 0.67879025, 0.68068588, 0.6825815 , 0.68447712,
        0.68637275]),
 <BarContainer object of 50 artists>)
No description has been provided for this image

# CLT lessons¶

  • The CLT says what these histograms suggest that the distribution of these values is very nearly normal.
  • Furthermore we can use statistics now to get confidence intervals.

CLT Application¶

  • Let us say $M = 1000$ and the values are $x_1, \ldots, x_{1000}$.
  • When I run, I get the mean is $$\bar{x} = \frac{1}{n} \sum_{i = 1}^{n} x_i = 0.6365$$
  • The standard deviation was $$s = 0.0158.$$

The 68-95-99.7 rule¶

  • https://en.wikipedia.org/wiki/68–95–99.7_rule

Confidence Intervals¶

  • For a true normal 99.73% of the values are within 3 standard deviations of the mean.
  • Hence we can be very confident that the true probability $p$ lies between $\bar{x} -3s$ and $\bar{x} + 3s$. In other words we can be 99.73% confident that $$0.5912 < p < 0.6819.$$
  • Let us see what happened in our case
In [7]:
m = statistics.mean(listofmeans)
s = statistics.stdev(listofmeans)
[m - 3*s, m+3*s]
Out[7]:
[0.5911825659564698, 0.6819173969569878]

Contd..¶

  • We saw that with 99.73% confidence we can claim that $$0.5912 < p < 0.6819.$$
  • What if we want sharper bound?
  • A simple way is to reduce the confidence say 95%. Then the true probability lies in $\bar{x} - 2s$ and $\bar{x} +2s$.
In [8]:
[m - 2*s, m+2*s]
Out[8]:
[0.6063050377898894, 0.6667949251235682]

A better way for sharper estimates¶

  • Either we must reduce the confidence (not desirable)
  • Or we must increase $M$ or $N$.
  • Which one should we increase?
  • Let us increase M from 1000 to 10000
In [9]:
N = 1000
M = 10000
listofmeans = []
for i in range(1, M+1):
    listofmeans.append(buffon_needle(N))
In [10]:
m = statistics.mean(listofmeans)
s = statistics.stdev(listofmeans)
[m - 3*s, m+3*s]
Out[10]:
[0.5918533511032384, 0.6828996926380229]
  • Note that the interval did not change much.

  • Now let us increase $N$ from 1000 to 10000 and keep $M$ at 1000

In [11]:
N = 10000
M = 1000
listofmeans = []
for i in range(1, M+1):
    listofmeans.append(buffon_needle(N))

m = statistics.mean(listofmeans)
s = statistics.stdev(listofmeans)
[m - 3*s, m+3*s]
Out[11]:
[0.6219901358753107, 0.6516960943534215]
  • Note that now the interval is much smaller

Answer¶

  • We must make each run with more steps, increase $N$.
  • Notice this is different from the number of runs M.
  • Number of runs M does not affect standard deviation significiantly, since each sample mean still is obtained the same steps.

Getting confidence interval without standard deviation or mean¶

  • Say we run our simulation 10 times and get 10 estimates.
  • Say on each run we throw 1 million needles.
  • If each estimate is a mean of the same distribution, then by CLT we know that these 10 estimates are normally distributed.
  • Normal means that the data must be symmetric about mean, therefore the probability that the estimates are on one side of the true probability is $$\frac{2}{2^{10}} \approx 0.00195 \approx 0.2 \text{ percent}$$
In [12]:
N = 1000000
M = 10
listofmeans = []
for i in range(1, M+1):
    listofmeans.append(buffon_needle(N))
listofmeans
Out[12]:
[0.637036,
 0.637111,
 0.6365326365326366,
 0.636739,
 0.636612,
 0.636875,
 0.63702,
 0.6371949115847347,
 0.637007,
 0.6370776370776371]
  • Therefore with 99.8% chance the mean is between minimum and maximum [0.6365, 0.6372]
In [13]:
m = statistics.mean(listofmeans)
s = statistics.stdev(listofmeans)
[m - 3*s, m+3*s]
Out[13]:
[0.6362514672547308, 0.6375895697842708]
  • Using standard deviation and mean we know that with tells us that with 99.73% confidence the mean is in [0.6363, 0.6376].

Conclusion¶

  • Therefore we can use the mean+standard deviation method or the symmetry method to get estimates about the true mean.