
Tuesday, June 18, 2013

Computer Science 14

Before believing the result of any simulation, we must have confidence that our conceptual model is correct, and that we have implemented that model correctly.

Previously, we looked at Statistical Tests. These tests give us statistical conclusions about the simulation, such as how it converges to a certain result. However, we still have to perform Sanity Checks.

One of the ways we can do this is to test the results against reality. For example, in the previous article, the value of Pi obtained from the simulation can be compared against the known value to see whether we got the right answer.

Let us look at the interplay between Physical Reality, Theoretical Models and Computational Models. The Physical Reality is based on the Physical Situation. An example of a Physical Situation can be the price of a Big Mac across countries (which we know as the Big Mac Index), or a stock market, or viruses in a body. We use a Theoretical Model to gain deeper insights into it, and when it gets too complicated or is unable to show us everything, we use a Computational Model.

At times, when we conduct an actual experiment, we may not get results that match the theory exactly. To account for experimental errors, we have to assume that there is some sort of random perturbation applied to the results. Gauss's work suggests that we can model experimental errors as Normally Distributed.
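
As a small illustration of this assumption (a minimal sketch; the "true" value of 0.2 m and the standard deviation of 0.005 m are made up for the example, not taken from any experiment):

import random

# Model a measurement as the true value plus normally distributed noise.
def noisyMeasurement(trueValue, sigma=0.005):
    return trueValue + random.gauss(0, sigma)

# Simulate ten measurements of a "true" 0.2 m stretch.
print([round(noisyMeasurement(0.2), 4) for i in range(10)])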

Hooke's Law, F=kx, was described by Robert Hooke in 1678. In plain English, Hooke's Law states that the Force (F) exerted by a spring is linearly related to the Distance (x) the spring has been compressed or stretched, multiplied by the Stiffness (k) of the spring. A stiff spring has a large k value. This law holds for a wide variety of materials and systems. However, of course, it does not hold for a force that causes the spring to exceed its elastic limit.

We can find the k value of a spring by suspending the spring, hanging known masses on it, and measuring the distance it stretches for each mass. Suppose that a 1 kg mass stretches the spring by 0.2 m.
Using the rules:
F = kx
F = ma (here a is the gravitational acceleration, g = 9.81 m/s²)
ma = kx
k = ma/x

We derive:
k
= 1 kg * 9.81 m/s² / 0.2 m
= 49.05 N/m

We can do a sanity check to make sure this is correct (using 0.4 m as the distance, the mass should work out to 2 kg):
49.05 N/m = m * 9.81 m/s² / 0.4 m
m = 49.05 N/m * 0.4 m / 9.81 m/s² = 2 kg
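
The same sanity check takes only a couple of lines of Python (a minimal sketch using the numbers above):

# k from the single 1 kg / 0.2 m measurement
k = 1 * 9.81 / 0.2          # = 49.05 N/m

# with the same k, a 0.4 m stretch should correspond to a 2 kg mass
m = k * 0.4 / 9.81
print(k, m)                 # 49.05 2.0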

What we are assuming here is that 49.05 N/m is the correct k value, based on just one measurement. However, remember that experimental errors are normally distributed. If we actually put a 2 kg weight there, we may or may not get a stretch of exactly 0.4 m. People typically repeat the same experiment many times with different weights in order to get enough data to compute an accurate value of k.

Suppose that we have the following data recorded by a fellow scientist:

0.0 0.0
0.05 0.00968
0.1 0.02087
0.15 0.02871
0.2 0.03849
0.25 0.05371
0.3 0.05917
0.35 0.07066
0.4 0.08041
0.45 0.08863
0.5 0.10219
0.55 0.10251
0.6 0.09903
0.65 0.10078
0.7 0.10321
0.75 0.09605
0.8 0.09730
0.85 0.09986
0.9 0.09876
0.95 0.09825
1.0 0.09986


The data comes in a file called spring_data.txt listing masses (in kg) and the corresponding stretch (in m). In order to manipulate this, we can either manually copy the data into our program, or import and parse it from within Python. In this example, we are going to import and parse the data:

The following code imports the data, and then returns two lists:

def importData(fileName):
    # read the data file and return two parallel lists:
    # masses (kg) and distances (m)
    masses=[]
    distances=[]
    with open(fileName,'r') as f:
        for line in f:
            m,d=line.split()
            masses.append(float(m))
            distances.append(float(d))
    return masses,distances

masses,distances=importData("spring_data.txt")
print(masses)
print(distances)


Now that we have the two lists, we can then plot the data using:

import pylab

pylab.plot(masses,distances,"o")
pylab.xlabel("Mass (kg)")
pylab.ylabel("Distance (m)")
pylab.title("Spring Experiment")
pylab.show()


As you can see, the distances are roughly linear at first (with some normally distributed error), then level off toward the end:



In NumPy, there is a type called the "array". It has little to do with the arrays we know from Java; it behaves more like a mathematical vector or matrix. Converting data to an array allows:
1) Mathematical operations to be performed on every element at once (e.g. array*3 multiplies every element by 3)
2) Element-wise products between two arrays (array1*array2), as shown in the small example below
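
For instance (a tiny illustrative example, separate from the spring data):

import pylab

a = pylab.array([1.0, 2.0, 3.0])
b = pylab.array([4.0, 5.0, 6.0])

print(a * 3)    # every element multiplied by 3 -> [3. 6. 9.]
print(a * b)    # element-wise product -> [4. 10. 18.]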

Let's look at an application of arrays to our spring data:

import pylab

forces = pylab.array(masses)*9.81

pylab.plot(forces,distances,"o")
pylab.xlabel("Force (N)")
pylab.ylabel("Distance (m)")
pylab.title("Spring Experiment")
pylab.show()


In this example, the masses list is converted into an array and then multiplied by 9.81. What we get is an array of Forces (N):



From here, we can calculate k. But before that, we must check whether our data is sensible. In our Theoretical Model, the data should fall on the line F=kx. If we can draw a good line through the points, we can find k very easily.

If we only have two points, we can easily join them to create a line. If we have a bunch of points scattered around, we must find the line that is closest to all the points. This is known as Fitting. In order to find that line, we need a measure of how good a fit is. We need an Objective Function to compare fits. The standard Objective Function is the Least Squares Fit.

Suppose we have two lists, a list of predictions according to the line, and a list of observations. The Least Squares Fit is described as below:

def leastSquaresFit(observed,predicted):
    # sum of the squared differences between predicted and observed values
    total=0.0
    for i in range(len(observed)):
        total+=(predicted[i]-observed[i])**2
    return total


When comparing two Lines of Best Fit (or Polynomials of Best Fit), the lower the number, the better the fit.
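
For example (made-up observation and prediction lists, purely to illustrate the call):

observed = [1.0, 2.0, 3.0]
goodFit  = [1.1, 1.9, 3.0]
badFit   = [0.5, 2.5, 4.0]

print(leastSquaresFit(observed, goodFit))   # about 0.02 -- small, so a good fit
print(leastSquaresFit(observed, badFit))    # 1.5 -- larger, so a worse fit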

A good model is one that, for each value of the independent variable Mass (kg), predicts the dependent variable Displacement (m) well.

To create the Line of Best Fit for a graph, we use Linear Regression. Pylab does Linear Regression with the function polyfit(observedX, observedY, degreeOfPolynomial). Recall that a line is y=mx+c. For a degree-1 fit, polyfit returns the values of m and c:

import pylab

forcesArray = pylab.array(masses)*9.81
distancesArray = pylab.array(distances)

m,c = pylab.polyfit(forcesArray,distancesArray,1)
distancesFitArray = m*forcesArray+c

pylab.plot(forcesArray,distancesArray,"o")
pylab.plot(forcesArray,distancesFitArray)
pylab.xlabel("Force (N)")
pylab.ylabel("Distance (m)")
pylab.title("Spring Experiment, k="+str(1/m))
pylab.show()


We have the following output:



We can compute k through this reasoning:
y = mx + c
Distance = m * Force + c
Force = (Distance - c) / m
Comparing this with Hooke's Law (Distance = Force/k), the slope m should equal 1/k and the intercept c should be close to 0, so:
k = 1/m

So is the k value shown correct? Looking at the plot, we realize that the line is not a very good fit. Would a cubic fit model our data better? Let's look at the example below:

import pylab

forcesArray = pylab.array(masses)*9.81
distancesArray = pylab.array(distances)

# linear (degree-1) fit: Distance = m*Force + c
m,c = pylab.polyfit(forcesArray,distancesArray,1)
distancesFitArray = m*forcesArray+c
# cubic (degree-3) fit: Distance = a*Force^3 + b*Force^2 + cCubic*Force + d
a,b,cCubic,d = pylab.polyfit(forcesArray,distancesArray,3)
distancesFitCubicArray = a*forcesArray**3 + b*forcesArray**2 + cCubic*forcesArray + d

pylab.plot(forcesArray,distancesArray,"o")
pylab.plot(forcesArray,distancesFitArray)
pylab.plot(forcesArray,distancesFitCubicArray)
pylab.legend(["Observed","Linear Fit","Cubic Fit"])
pylab.xlabel("Force (N)")
pylab.ylabel("Distance (m)")
pylab.title("Spring Experiment, k="+str(1/m))
pylab.show()


At first glance, it seems as though the Cubic Fit describes the spring much better than the Linear Fit.



The useful thing about a model is its ability to predict data. Let's look at how the Cubic Fit would predict our data. We want to see, for example, what would happen if a force of approximately 20 N (a 2 kg mass) is placed on the spring. We do this by adding another point to the Forces array:

import pylab

forcesArray = pylab.array(masses)*9.81
# add a 2 kg mass (about 20 N) to see what the fits predict beyond the data
forcesArrayExtended = pylab.array(masses+[2])*9.81
distancesArray = pylab.array(distances)

m,c = pylab.polyfit(forcesArray,distancesArray,1)
distancesFitArray = m*forcesArray+c
a,b,cCubic,d = pylab.polyfit(forcesArray,distancesArray,3)
distancesFitCubicArray = a*forcesArrayExtended**3 + b*forcesArrayExtended**2 + cCubic*forcesArrayExtended + d

pylab.plot(forcesArray,distancesArray,"o")
pylab.plot(forcesArray,distancesFitArray)
pylab.plot(forcesArrayExtended,distancesFitCubicArray)
pylab.legend(["Observed","Linear Fit","Cubic Fit"])
pylab.xlabel("Force (N)")
pylab.ylabel("Distance (m)")
pylab.title("Spring Experiment, k="+str(1/m))
pylab.show()


Now observe the output:



The Cubic Fit suggests that if we place 20 N of force on the spring, the spring would retract back to above its original position. Perhaps that happens in another universe with different physics, but it doesn't happen on Earth.

Remember that Hooke's Law is a first-order linear approximation that fails as the spring approaches its limit of elasticity. Just because the data does not fall on a straight line does not mean we should fit an arbitrary curve to it, because a polynomial of high enough degree can be made to fit almost any data. What we want, instead, is to look at the portion where Hooke's Law remains valid - that is, where the spring hasn't hit its limit of elasticity.
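
To see why, here is a small sketch (illustrative random data, not the spring measurements): with six noisy points, a degree-5 polynomial threads through every single one of them, yet it tells us nothing about the underlying relationship.

import pylab

xs = pylab.arange(6.0)
ys = xs + pylab.randn(6)           # a noisy straight line

coeffs = pylab.polyfit(xs, ys, 5)  # degree 5: one coefficient per point
fitted = pylab.polyval(coeffs, xs)

print(max(abs(fitted - ys)))       # essentially zero: a "perfect" fit to noise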

Let us repeat the experiment, discarding the last 10 results:

import pylab

forcesArray = pylab.array(masses)*9.81
# keep only the first 11 measurements, where the spring stays within its elastic limit
forcesArrayLimited = pylab.array(masses[:-10])*9.81
distancesArray = pylab.array(distances)
distancesArrayLimited = pylab.array(distances[:-10])

# fit the line to the limited data only, but draw it across the full force range
m,c = pylab.polyfit(forcesArrayLimited,distancesArrayLimited,1)
distancesFitArray = m*forcesArray+c

pylab.plot(forcesArray,distancesArray,"o")
pylab.plot(forcesArray,distancesFitArray)
pylab.legend(["Observed","Linear Fit"])
pylab.xlabel("Force (N)")
pylab.ylabel("Distance (m)")
pylab.title("Spring Experiment, k="+str(1/m))
pylab.show()


Now notice that we've obtained a much more believable fit.



Notice also that we've obtained a better approximation for the k value of the spring, which the manufacturer originally rated at 49.05 N/m.

To know how well a fit matches a particular set of results, we use the Coefficient of Determination (r²). It is defined as:
r² = 1 - (Estimated Error / Variance of the Measured Data)

Computationally, it is expressed as such:

def rSquare(predicted,measured):
    # sum of squared errors between the fit and the measurements
    estimatedError = ((predicted-measured)**2).sum()
    measuredMean = measured.sum()/float(len(measured))
    # total variation of the measurements about their mean
    measuredVariance = ((measured-measuredMean)**2).sum()
    return 1-estimatedError/measuredVariance


Thanks to NumPy's arrays, each of these operations takes just one line. Do you see how useful arrays are!?

We can compare the goodness of each fit against the measured data. The closer r² is to 1, the better; an r² of 1 means that the fit maps exactly onto the measured data.
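
For example, assuming the arrays from the last code block are still defined, we can evaluate the linear fit against the full data set and against just the region where Hooke's Law holds (a sketch, not output from this article):

# linear fit (trained on the limited data) evaluated against all measurements
print(rSquare(distancesFitArray, distancesArray))

# the same fit evaluated only on the region where Hooke's Law holds
print(rSquare(m*forcesArrayLimited + c, distancesArrayLimited))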
