
Saturday, June 15, 2013

Computer Science 12

How can we know when it is safe to assume that the average result of a finite number of trials is representative of what we would get if we ran the test many more times?

The question we want to ask is: how many samples do we need before we can believe the answer? How many trials do we have to conduct to have confidence in the result?

At the root of it all is the notion of Variance. Variance is a measure of how much spread there is in the possible outcomes; in other words, how much do the outcomes vary?

It is more desirable to run multiple trials than a single, large trial, because multiple trials give us a measure of the Variance. It is through this measure that we get a sense of how trustworthy the result is.

Variance is formalized through the concept of Standard Deviation. Informally, the Standard Deviation measures how far values tend to be from the Mean. If most values are close to the Mean, the SD is small. If most values are far from the Mean, the SD is large. If all values are the same, the SD is 0.

Here is Standard Deviation expressed in computational form:

def standardDeviation(X):
    # population standard deviation: the square root of the
    # average squared distance from the mean
    mean = sum(X) / float(len(X))
    total = 0.0
    for x in X:
        total += (x - mean) ** 2
    return (total / len(X)) ** 0.5
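
For instance (a quick sanity check of my own, not from the original notes), a list with no spread gives an SD of 0, while more spread-out lists give larger values:

print(standardDeviation([5, 5, 5, 5]))    # no spread: 0.0
print(standardDeviation([4, 5, 5, 6]))    # small spread: ~0.7071
print(standardDeviation([0, 5, 5, 10]))   # larger spread: ~3.5355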


Now we can use the SD to look at the relationship between the number of samples and the amount of confidence we have in the results. We can plot the Standard Deviations of our trials to demonstrate that the accuracy of the result increases as the number of flips per trial grows.

Let us look back at the previous coin flipping example. We'll make use of the following functions:

import random

def flipCoin():
    # returns 0 or 1 with equal probability
    return random.randint(0, 1)

def flipCoinList(n):
    # returns a list of n coin flips
    x = []
    for i in range(n):
        x.append(flipCoin())
    return x

def mean(X):
    # arithmetic mean of a list
    return sum(X) / float(len(X))


flipCoin() returns 0 or 1, flipCoinList(n) returns a list of n flips, and mean() computes the mean of a list.

We can observe Variance using this example:

import pylab

def flipCoinTrial(flipsExp, trials):
    # for each flip count 2**0 .. 2**(flipsExp-1), run `trials` trials,
    # record the mean of each trial and the SD across the trial means
    flips = []
    means = []
    stdDev = []
    for i in range(flipsExp):
        currentFlips = 2 ** i
        currentMeans = []
        for j in range(trials):
            currentList = flipCoinList(currentFlips)
            currentMeans.append(mean(currentList))
        flips.append(currentFlips)
        means.append(currentMeans)
        stdDev.append(standardDeviation(currentMeans))
    # plot every trial's mean against the number of flips
    pylab.plot(flips, means)
    pylab.xlabel("Number of Flips")
    pylab.ylabel("Obtained Mean")
    pylab.title("Mean of Flips")
    pylab.semilogx()
    pylab.ylim(0, 1)
    # plot the SD of the trial means against the number of flips
    pylab.figure()
    pylab.plot(flips, stdDev)
    pylab.xlabel(str(trials) + " Trials of Flips")
    pylab.ylabel("Standard Deviation")
    pylab.title("Standard Deviation of Trials")
    pylab.semilogx()
    pylab.ylim(0, 1)
    pylab.show()


This flips coins a number of times, performing each flip count in a set of trials. The SD is measured for each set before a new set of trials begins with double the number of flips. Here's the result of flip counts from 2^0 up to 2^19, each done 10 times:
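
Presumably (my reading of the description above), the call that produced the plots below was:

flipCoinTrial(20, 10)   # 20 flip counts (2**0 .. 2**19), 10 trials each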

[Figure: "Mean of Flips" and "Standard Deviation of Trials" plots]
Beautiful, isn't it? As you can see, as the number of flips increases, the Variance across the sets of trial means decreases.

The size of the Standard Deviation compared to the Mean shows us how reliable our results are: the smaller the Standard Deviation relative to the Mean, the better. The Coefficient of Variation represents exactly this ratio. It is simply:

σ(X) / ((1/n) · Σ_{i=1..n} x_i)

where n is the size of the set.

Or more simply:
Standard Deviation / Mean

(Doesn't mathematical notation just freak you out sometimes?)

The smaller the Coefficient of Variation, the more reliable the result. Generally, a CoV below 1 is considered Low Variance. However, the CoV cannot be used to construct Confidence Intervals.
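
As a minimal sketch, we can compute it by reusing the helpers above (the name coefficientOfVariation is my own):

def coefficientOfVariation(X):
    # ratio of the spread to the mean; smaller means more trustworthy
    return standardDeviation(X) / mean(X)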

Let's look at some of the plotting functions available in pylab. One way to observe a distribution is through a histogram. The following function plots one:

def flipCoinHistogramTrial(flips, trials):
    # run `trials` trials of `flips` coin flips each and
    # plot a histogram of the trial means
    trialsList = []
    for i in range(trials):
        trialsList.append(mean(flipCoinList(flips)))
    pylab.hist(trialsList, bins=25)
    pylab.xlabel("Mean of Flips")
    pylab.ylabel("Occurrence")
    pylab.title("Histogram of " + str(trials) + " Trials of " + str(flips) + " flips")
    pylab.show()
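
For example, a call like this (my choice of numbers) plots the means of 1000 trials of 100 flips each:

flipCoinHistogramTrial(100, 1000)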


Basically, the preceding code flips a coin "flips" times per trial, for "trials" trials, and then plots the resulting means on a histogram. As you can see, the diagram depicts a Normal Distribution.

[Figure: histogram of trial means, forming a bell-shaped curve]
As we talk about the Standard Deviation and the Coefficient of Variation, we are interested not only in the mean but also in the distribution of the results. A perfect Normal Distribution peaks at the mean and falls off symmetrically on both sides. The Normal Distribution is what we commonly call the Bell Curve.

The Normal Distribution is frequently used to construct probabilistic models for two reasons:
1) It has nice mathematical properties and is easy to reason about
2) It has many naturally occurring instances in the world, so many things map onto it pretty well

Mathematically, the Normal Distribution can be characterized by two parameters:
1) Mean
2) Standard Deviation
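
As a quick illustration (my own sketch, not from the original notes, reusing random and pylab from earlier), we can draw samples from a Normal Distribution with a given Mean and Standard Deviation using random.gauss and watch the Bell Curve appear:

def plotNormal(mu, sigma, samples):
    # sample from a Normal Distribution characterized by its
    # Mean (mu) and Standard Deviation (sigma), then plot a histogram
    values = []
    for i in range(samples):
        values.append(random.gauss(mu, sigma))
    pylab.hist(values, bins=50)
    pylab.title("Normal Distribution: Mean=" + str(mu) + ", SD=" + str(sigma))
    pylab.show()

plotNormal(0.5, 0.05, 10000)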

If we have a Normal Distribution, the Mean and Standard Deviation can be used to compute Confidence Intervals.

A Confidence Interval allows us to estimate an unknown parameter by providing a range that is likely to contain the unknown value, together with a Confidence Level stating how likely it is that the unknown value lies within that range.

An example of a Confidence Interval is:
1 ± 0.05

If the Confidence Level is not stated, it is assumed to be 95%. This means that 95% of the time, the result will lie between 0.95 and 1.05.

The Empirical Rule gives us a handy way to estimate the Confidence Interval given the Mean and the Standard Deviation. For a true Normal Distribution:
68% of the data will be within 1 SD of the Mean
95% of the data will be within 2 SD of the Mean
99.7% of the data will be within 3 SD of the Mean
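
We can roughly verify the rule by sampling (again, my own sketch rather than part of the original notes):

def fractionWithin(numSDs, samples=100000):
    # fraction of Normal(0, 1) samples within numSDs SDs of the Mean
    within = 0
    for i in range(samples):
        if abs(random.gauss(0, 1)) <= numSDs:
            within += 1
    return within / float(samples)

print(fractionWithin(1))   # ~0.68
print(fractionWithin(2))   # ~0.95
print(fractionWithin(3))   # ~0.997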

A trick used to estimate the Standard Deviation is the Standard Error. The Standard Error is an estimate of the SD under the assumptions that the errors are normally distributed and that the sample is small relative to the actual population. It works because the Normal Distribution is often an accurate model of reality.

If p = % of the sample that met a certain condition, and n = sample size, we can say:
Standard Error = (p * (100 - p) / n)^(1/2)

Say, for example, that out of 1000 voters, 60% voted for Kelvin. We can say that the Standard Error is:
(60 * (100 - 60) / 1000)^(1/2)
= (60 * 40 / 1000)^(1/2)
= 2.4^(1/2)
= 1.549193338%

This means that with a 95% Confidence Level, the true percentage of votes Kelvin would get is within 2 SE of 60%. In other words, the Confidence Interval would be:
60% ± 3.10%
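
The same arithmetic in Python (my own check of the numbers above):

p = 60.0                           # percentage that voted for Kelvin
n = 1000                           # sample size
se = (p * (100.0 - p) / n) ** 0.5
print(se)                          # ~1.549
print(2 * se)                      # 95% interval half-width: ~3.10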

We can test this result by simulating the poll ourselves:

def poll(voters, percentage):
    # simulate a poll: each voter votes for the candidate
    # with probability percentage/100
    votes = 0.0
    for i in range(voters):
        if random.random() < percentage / 100.0:
            votes += 1.0
    return votes

def testStandardError(n, p, trials):
    # run the poll `trials` times and plot the distribution of votes;
    # the title reports the SD as a percentage of the sample size,
    # which we can compare against the Standard Error formula
    results = []
    for i in range(trials):
        results.append(poll(n, p))
    pylab.hist(results, bins=100)
    pylab.xlabel("Votes Gathered")
    pylab.ylabel("Matching Trials")
    pylab.title("Standard Deviation is " + str(standardDeviation(results) / n * 100))
    pylab.show()

testStandardError(1000,60.0,1000)


As you can see, the actual Standard Deviation gathered from the simulated test is quite close to what we got from the SE formula.

[Figure: histogram of votes gathered across the simulated polls, with the measured Standard Deviation in the title]
This shows how good the Standard Error is as an estimate of the actual Standard Deviation.
