...

Sunday, June 16, 2013

Computer Science 13

How do we construct computational models that will help us understand the real world?

A Gaussian Distribution, also known as the Normal Distribution or the Bell Curve, can be described with just its Mean and Standard Deviation.

Whenever possible, we model distributions as Gaussian because they are extremely easy to work with: tools like the Empirical Rule help us draw conclusions quickly. However, if a distribution is not actually Gaussian and we treat it as one, the results can be extremely misleading.
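(A quick check that is not part of the original notes: drawing samples with random.gauss and counting how many land within one standard deviation of the mean should give a figure close to the 68% predicted by the Empirical Rule. The mean and standard deviation used here are arbitrary choices for illustration.)

import random

def empiricalRuleCheck(n, mu, sigma):
    # Draw n samples from a Gaussian and count how many fall within
    # one standard deviation of the mean.
    withinOne=0
    for i in range(n):
        if abs(random.gauss(mu, sigma)-mu)<sigma:
            withinOne+=1
    return withinOne/float(n)

print (empiricalRuleCheck(100000, 0, 1))   # should be close to 0.68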

Take, for example, the results of rolling a die. The outcome of die rolling is not normally distributed: for a fair die, each face is equally likely to appear, so plotting the results on a histogram would not yield a Bell Curve.
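(Here is a small sketch of my own to make that concrete: tallying a large number of fair-die rolls gives roughly equal counts for each face rather than a bell shape.)

import random
import pylab

def rollDie(n):
    # Simulate n rolls of a fair six-sided die.
    return [random.randint(1,6) for i in range(n)]

pylab.hist(rollDie(100000), bins=6)
pylab.xlabel("Face")
pylab.ylabel("Count")
pylab.title("100,000 Rolls of a Fair Die")
pylab.show()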

In a fair lottery system, every outcome is equally likely. Many people study past results in order to predict what comes up next; however, as the Gambler's Fallacy illustrates, the lottery has no memory, so future draws do not depend on past results.

In the case of the lottery and die rolling, we have a Uniform Distribution. A Uniform Distribution is one in which every outcome is equally likely, and the only thing required to describe it is the Range of possible results.

The Uniform Distribution does not occur much in nature. It is typically observed in games of chance devised by humans, and it is not as useful as the Gaussian Distribution for modelling complex systems.

Another distribution that occurs quite frequently is the Exponential Distribution, the only continuous distribution that is memoryless. The concentration of a drug in the bloodstream, for example, can be modelled with an Exponential Distribution.

Assume that at each time-step, each molecule has a probability p of being cleared by the body. The system is memoryless in the sense that the probability of a molecule being cleared at this step does not depend on what happened in previous steps.

If the probability of being cleared at each step is p, then at time T=1 the probability of a molecule still being in the body is 1-p. Since the process is memoryless, the probability of surviving the next step is again 1-p, so the probability of still being in the body at T=2 is (1-p)^2.

In general, the probability of the molecule still being in the body at time t is (1-p)^t.

Suppose there are 1000 molecules, and at each time-step each molecule has a 0.01 probability of being cleared by the body. We would end up with the following graphs:



Of course, what we are seeing here is a plot of the mathematical expectation. This graph is produced by the following code:

import pylab

def mathematicalMolecules(n, clearProbability, steps):
    # Analytic model: after t time-steps, n*(1-clearProbability)**t molecules remain.
    moleculesRemaining=[n]
    for t in range(1, steps+1):
        moleculesRemaining.append(n*((1-clearProbability)**t))
    return moleculesRemaining

moleculeList = mathematicalMolecules(1000,0.01,1000)
pylab.plot(moleculeList)
pylab.xlabel("Time (T)")
pylab.ylabel("Molecules")
pylab.title("Exponential Decay of Molecules")
pylab.show()


Now, notice that the number of molecules cleared per step keeps shrinking: the fewer molecules remain, the fewer are lost at each step. This is exponential decay, and if we plot the graph with semilogy (a logarithmic y-axis), the curve becomes a straight line, confirming it:
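(For completeness, the semilog plot can be produced as follows; this snippet is my addition and reuses mathematicalMolecules from above. On a log-scaled y-axis the exponential decay shows up as a straight line.)

moleculeList = mathematicalMolecules(1000,0.01,1000)
pylab.plot(moleculeList)
pylab.semilogy()
pylab.xlabel("Time (T)")
pylab.ylabel("Molecules (log scale)")
pylab.title("Exponential Decay of Molecules (semilog)")
pylab.show()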



We can also create a Monte Carlo simulation that mimics the process described, instead of simply evaluating the mathematical formula. The following code implements this:

import random
import pylab

def simulationMolecules(n, clearProbability, steps):
    # Monte Carlo model: at each time-step, every remaining molecule is
    # cleared with probability clearProbability.
    moleculesRemaining=[n]
    for i in range(steps):
        for j in range(n):
            if random.random()<clearProbability:
                n-=1
        moleculesRemaining.append(n)
    return moleculesRemaining

moleculeList = simulationMolecules(1000,0.01,1000)
pylab.plot(moleculeList)
pylab.xlabel("Time (T)")
pylab.ylabel("Molecules")
pylab.title("Exponential Decay of Molecules (Simulation)")
pylab.show()


Notice that the output is not as smooth, but it matches the mathematical plot almost exactly:



If we plot this graph on a logarithmic y-axis, this is what we get:



The biggest difference is that our simulation does not allow for fractions of a molecule, which is truer to the real process we are modelling.

Here's a direct mapping of the two graphs:
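(The overlay can be reproduced with a few extra lines; this is my addition, reusing the two functions defined above.)

analytic = mathematicalMolecules(1000,0.01,1000)
simulated = simulationMolecules(1000,0.01,1000)
pylab.plot(analytic)
pylab.plot(simulated)
pylab.xlabel("Time (T)")
pylab.ylabel("Molecules")
pylab.title("Analytic vs Simulation")
pylab.legend(["Analytic","Simulation"])
pylab.show()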





The Mathematical Model is also known as the Analytic Model. Here, we've illustrated the difference between the Analytic and Simulation models of exponential decay. There is no right or wrong answer here. When evaluating models, we look at their Fidelity and Credibility: Fidelity refers to whether the outcome is accurate, and Credibility to whether the outcome can be believed.

Then there is Utility: which model can be applied to which kinds of questions? The Simulation Model offers additional Utility, because we can easily tweak it in ways that are usually much harder for an Analytic Model.

For example, if we wish to calculate the rate at which the body clears bacteria, but also want to factor in things like the rate at which the bacteria regenerate, it is easier to use a Simulation than an Analytic Model.
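(As a rough illustration of that flexibility, here is one way the simulation above could be extended with a per-step regeneration probability. The parameter regenProbability and the division rule are my own assumptions for the sketch, not something specified in the lecture.)

import random

def simulationBacteria(n, clearProbability, regenProbability, steps):
    # Like simulationMolecules, but each surviving bacterium may also divide
    # with probability regenProbability at every time-step (an assumed rule).
    remaining=[n]
    for i in range(steps):
        current=n
        for j in range(current):
            if random.random()<clearProbability:
                n-=1
            elif random.random()<regenProbability:
                n+=1
        remaining.append(n)
    return remaining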

Let us look at the Monty Hall problem. In the Monty Hall problem, we start with 3 doors. Behind one of the doors, there is a great prize. Behind each of the other doors, there is a booby prize. A person is asked to choose one of the doors. The host then reveals one of the other two doors with the booby prize. There is then a question posed to the person: Do you want to switch?

The question is: Which is better? Does it matter whether we switch?

Let us look at this first from an Analytical perspective:
The chance that the prize lies in your door is 1/3
The chance that the prize lies in one of the other 2 doors is 2/3

After you choose a door, the host eliminates one of the two other doors (one with a booby prize). Now, what you have is:
The chance that the prize lies in your door is still 1/3
The chance that the prize lies in the single remaining door is 2/3

Therefore, making the switch doubles your chance from 1/3 to 2/3.

The following maps the problem out in a computational way:

import random
import pylab

def montyHall(n):
    # Stand strategy: the player wins only if the original guess is the prize door.
    won=0
    for i in range(n):
        correctDoor = random.randint(1,3)
        chosenDoor = random.randint(1,3)
        if correctDoor==chosenDoor:
            won+=1
    return won

def montyHallSwitch(n):
    # Switch strategy: after the host opens a booby-prize door,
    # switching wins whenever the original guess was wrong.
    won=0
    for i in range(n):
        correctDoor = random.randint(1,3)
        chosenDoor = random.randint(1,3)
        if correctDoor!=chosenDoor:
            won+=1
    return won

def plotMontyHall(trialsExp):
    participants=[]
    montyHallList=[]
    for i in range(trialsExp):
        montyHallList.append(montyHall(2**i))
        participants.append(2**i)
    montyHallSwitchList=[]
    for i in range(trialsExp):
        montyHallSwitchList.append(montyHallSwitch(2**i))
    pylab.plot(participants,montyHallList)
    pylab.plot(participants,montyHallSwitchList)
    pylab.xlabel("Participants")
    pylab.ylabel("Winners")
    pylab.title("Monty Hall: Stand vs Switch")
    pylab.legend(["Stand","Switch"])
    pylab.show()

plotMontyHall(15)


This is based upon the following:
1) If guessed door is prize door, stand will give a win
2) If guessed door is NOT prize door, switch will give a win

This is the resulting graph. As we can see, the number of winners for Switch is roughly twice that for Stand: about 1/3 of Stand participants and about 2/3 of Switch participants are winners:
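(To read those fractions off directly, we can divide the win counts by the number of participants; this small check is my addition and reuses montyHall and montyHallSwitch from above.)

trials = 100000
print ("Stand win fraction:", montyHall(trials)/float(trials))      # roughly 1/3
print ("Switch win fraction:", montyHallSwitch(trials)/float(trials))  # roughly 2/3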



We can use randomized algorithms to solve problems in which randomness plays no role. This is surprising, but it is an extremely important tool.

Let us consider Pi (π). People have known about Pi for thousands of years, though the name π only came into common use in the 18th century. The earliest recorded estimate of Pi was 4*(8/9)^2 ≈ 3.16, by the Egyptians in 1650 B.C. In 650 B.C., when Solomon was building his Basin, he used a value of Pi of roughly 3. The best estimate of Pi in ancient times came from Archimedes. He didn't give a single value for Pi; instead, by constructing a polygon of 96 sides, he concluded that Pi lies between 223/71 and 22/7.

Later on, Buffon and Laplace came up with a Stochastic Simulation to solve for Pi. This is the Buffon and Laplace solution to finding Pi:



Suppose that we have a circle of radius 1 inscribed in a square with sides of length 2.

We know that the area of the circle is πr², and the area of the square is (2r)² = 4r².

Buffon and Laplace proposed to randomly drop needles into such a set-up on the ground, then use the number of needles in the circle to estimate the area of the circle and the total number of needles to estimate the area of the square.

We can then use the ratio of the areas to obtain π:
Area of Circle / Area of Square
= πr² / (2r)²
= πr² / 4r²
= π / 4

π = 4 × (Area of Circle / Area of Square)

Therefore:

π = 4 × (Needles in Circle / Total Needles)

Because Buffon and Laplace didn't have a practical way to conduct such an experiment, we can help them out with a simulation:

import math
import random

def estimatePi(n):
    # Drop n pins uniformly at random into the unit square and count how many
    # land inside the circle of radius 0.5 centred at (0.5, 0.5).
    circle=0
    for i in range(n):
        x = random.random()
        y = random.random()
        if math.hypot(x-0.5,y-0.5)<0.5:
            circle+=1
    # The ratio of areas is pi/4, so multiply the fraction of pins by 4.
    return 4*circle/float(n)

print (estimatePi(50000000))


In my example, I used a circle of radius 0.5 centred in the unit square instead; the formula is the same. We drop 50 million pins into the setup, count the pins that land in the circle, and apply the formula to get the answer. The pins are just a means of estimating area.





As you can see, as the number of pins used increased, the estimates moved closer to Pi and their variation decreased. Here is the code that generated these graphs:

import pylab

def standardDeviation(X):
    # Population standard deviation of the values in X.
    mean=sum(X)/float(len(X))
    total=0.0
    for x in X:
        total+=(x-mean)**2
    return (total/len(X))**0.5

def mean(X):
    return sum(X)/float(len(X))

def plotEstimatePi(pinsExp, trials):
    pis=[]
    pins=[]
    stdDev=[]
    for i in range(pinsExp):
        # Each of the 'trials' estimates at this point uses 2**i pins,
        # so trials*2**i pins are dropped in total.
        pins.append(trials*2**i)
        currentPis=[]
        for j in range(trials):
            currentPis.append(estimatePi(2**i))
        pis.append(mean(currentPis))
        stdDev.append(standardDeviation(currentPis))
    pylab.plot(pins,pis,"o")
    pylab.xlabel("Pins Used")
    pylab.ylabel("Estimated Pi")
    pylab.title("Pi Estimation, Final Value = "+str(pis[len(pis)-1]))
    pylab.semilogx()
    pylab.figure()
    pylab.plot(pins,stdDev)
    pylab.xlabel("Pins Used")
    pylab.ylabel("Standard Deviation")
    pylab.title("Std Dev, Final Value = "+str(stdDev[len(stdDev)-1]))
    pylab.semilogx()
    pylab.show()
  
plotEstimatePi(15,100)


Note that we can come up with more accurate estimates of Pi through other means, such as binary search. However, this example illustrates the usefulness of randomness in solving problems that have nothing random about them. The method is especially useful for integration, because the integral of a function is the area under its graph.
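(As a sketch of that idea, and my own example rather than something from the lecture, the same pin-dropping trick can estimate a definite integral. Here we estimate the integral of x**2 from 0 to 1, whose true value is 1/3, by sampling points uniformly in the unit square and counting how many fall under the curve.)

import random

def estimateIntegral(n):
    # Monte Carlo estimate of the area under y = x**2 between x = 0 and x = 1.
    under=0
    for i in range(n):
        x = random.random()
        y = random.random()
        if y < x**2:
            under+=1
    return under/float(n)

print (estimateIntegral(1000000))   # should be close to 1/3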

