tag:blogger.com,1999:blog-68908263546545307312017-06-24T16:03:32.769+08:00Basic DraftKelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.comBlogger294125tag:blogger.com,1999:blog-6890826354654530731.post-16784851785303441782015-05-15T19:30:00.001+08:002015-05-16T11:31:07.774+08:00Microeconomics 05From the previous discussion, we can easily see that if unconstrained, consumers will simply buy an infinite amount of goods (by choosing the highest indifference curve).<br /><br />However, due to a limited budget, we will have to choose the best combination that will maximize our budget and utility. This is called Constraint Maximization.<br /><br /><b>Budget Constraints</b> <br /><br />Typically, we can represent a constraint as:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-HwlfpN2JaM0/VVXW3XQyXKI/AAAAAAAAAQk/n9W6iRCpqjk/s1600/Budget.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="75" src="http://4.bp.blogspot.com/-HwlfpN2JaM0/VVXW3XQyXKI/AAAAAAAAAQk/n9W6iRCpqjk/s400/Budget.png" width="400" /></a></div><br />Where P is the number of pizzas, and Pp is the price of each pizza.<br /><br />Suppose that a pizza costs $20, and a movie costs $10, and we have $100 to spend, then we will have a graph as such:<br /><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-ztk8KOL_QoI/VVXXJvYBn5I/AAAAAAAAAQ0/5gvWfx5MERY/s1600/Budget%2BLine.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="301" src="http://2.bp.blogspot.com/-ztk8KOL_QoI/VVXXJvYBn5I/AAAAAAAAAQ0/5gvWfx5MERY/s400/Budget%2BLine.png" width="400" /></a></div><br />We will be drawing $100 = $20P + $10M. The slope of this line is -Pm/Pp.<br /><br />The price ratio is -0.5, which is the Marginal Rate of Transformation (MRT). It is the rate at which we can transform pizzas into movies.<br /><br />The <b>Opportunity Cost</b> of something is the value of foregone alternative. For example, if we forgo a pizza, then we are forgoing 2 movies. Or from the other perspective, a movie costs half a pizza. This is the Opportunity Cost of the situation. Opportunity costs only make sense if we have a limited budget. <br /><br />Suppose that the price of pizza rose to $30. We will have the following graph now:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-McnYP9XHApc/VVXXYSslwBI/AAAAAAAAAQ8/mxolLZnX298/s1600/Budget%2BLine%2B2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="311" src="http://4.bp.blogspot.com/-McnYP9XHApc/VVXXYSslwBI/AAAAAAAAAQ8/mxolLZnX298/s400/Budget%2BLine%2B2.png" width="400" /></a></div><br />With the $100, we have things to choose from now. In other words, our <b>Opportunity Set</b> has become more restricted.<br /><br />We can also have a restricted Opportunity Set if our budget drops. Keeping the price constant as the first example, we will have the following graph if we drop our budget to $80. In this case, the slope remains the same.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-oCOQpTEMTCo/VVXXiO9AO2I/AAAAAAAAARE/gPRU_ocPSSQ/s1600/Budget%2BLine%2B3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="293" src="http://1.bp.blogspot.com/-oCOQpTEMTCo/VVXXiO9AO2I/AAAAAAAAARE/gPRU_ocPSSQ/s400/Budget%2BLine%2B3.png" width="400" /></a></div><br />We want to find out what is the furthest indifference curve given a budget constraint. An example when we combine the indifference curves and the budget constraint looks like this.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-G1YzFW1Wzro/VVXXs6oIAwI/AAAAAAAAARM/-L9s-yJW0SA/s1600/Combined%2BIndifference%2Band%2BBudget.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="352" src="http://2.bp.blogspot.com/-G1YzFW1Wzro/VVXXs6oIAwI/AAAAAAAAARM/-L9s-yJW0SA/s640/Combined%2BIndifference%2Band%2BBudget.png" width="640" /></a> (Note that the x-axis represents movies, while y-axis represents pizza).</div><br />We can afford anything under our budget line. Notice that there are three points on our budget line itself. Points A and C are equivalent, but because we want the farthest indifference line (due to Non-Satiation), point D is actually the best we can get.<br /><br />The best indifference curve we can get is one where the budget constraint line is tangent to the curve. In order to perform utility maximization, we need to get these things to be tangent by ensuring that the MRS is equal to the MRT. By setting MRS equal to MRT, we are setting marginal benefits equal to marginal costs.<br /><br /><b>Deriving Marginal Utilities, MRS and MRT</b><br /><br />The marginal at each point is the partial derivative of the functional utility with respect to the item we are interested in. For example, if we are interested in the marginal utility function of pizza then we differentiate sqrt(PM) with respect to P.<br /><br />Points A and C are both undesirable. It is clear that the MRT for the entire budget line is -0.5. Let's work out the MRS for all the points:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-4w5uJEa9zJo/VVa3r-FpAiI/AAAAAAAAARg/9g9o-sigg6M/s1600/MRS%2BCalculation.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="258" src="http://2.bp.blogspot.com/-4w5uJEa9zJo/VVa3r-FpAiI/AAAAAAAAARg/9g9o-sigg6M/s400/MRS%2BCalculation.png" width="400" /></a></div><br />An MRS of -2 means that we are willing to trade 2 pizzas for a movie. However, the market MRT of -0.5 only requires us to trade half a pizza for a movie.<br /><br />There are times where we will end up with negative quantities when we solve for the optimum point. It may be that we are looking at a problem with a corner solution.<br /><br />Remember that MRS is the rate the consumer is willing to trade the y-axis for the x-axis. On the other hand, MRT is the actual market value ratio between the y-axis and the x-axis.<br /><br />Note that B, D and E both satisfies that MRT equals MRS, but we want the farthest indifference point that is still within our budget. That is the basic goal of utility maximization. Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com0tag:blogger.com,1999:blog-6890826354654530731.post-10750375272413544532015-05-15T16:14:00.002+08:002015-05-15T16:14:55.504+08:00Microeconomics 04<b>Consumer Theory</b> <br /><br />In the previous lectures, we simply made use of given supply and demand curves. Now we are going to start looking at how the curves are made. We shall start by focusing on the demand curves in the study of consumer theory.<br /><br />All consumer-related behavior in economics revolves around utility maximization. In order to do these, we need to take into account consumer preferences, budget constraints, and perform constraint optimization. In the most basic sense, we will look at the tradeoff between two goods and how the consumer will choose between them.<br /><br />In summary, we will need to:<br />1) Make assumptions about preferences<br />2) Translate these assumptions into utility functions<br />3) Add budget constraints and perform maximization<br /><br />In order to model preferences across goods, we will need to impose the following assumptions to make our modeling simpler:<br />1) <i>Completeness</i> - When we compare two bundles of goods, we would prefer one or the other but never equally. In other words, you can always make a choice between them.<br />2) <i>Transitivity</i> - If we prefer x to y, and y to z, then we will prefer x to z.<br />3) <i>Non-Satiation </i>- More is always better and we will never be satisfied. We will never turn down having more.<br /><br /><b>Indifference Curves</b><br /><br />These curves are somewhat like preference maps. These are the graphical representation of people's preferences.<br /><br />Suppose that your parents gave you some money and you want to decide between pizza or movies. We can have the following choices.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-fM68Y-NKlnE/VVWpXRBQrSI/AAAAAAAAAPs/JMZfDOA7GrA/s1600/Pizzas%2Band%2BMovies%2B1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="282" src="http://1.bp.blogspot.com/-fM68Y-NKlnE/VVWpXRBQrSI/AAAAAAAAAPs/JMZfDOA7GrA/s400/Pizzas%2Band%2BMovies%2B1.png" width="400" /></a></div><br />Assume that we are indifferent between two pizzas and one movie, and one pizza and two movies. Clearly we would prefer two pizzas and two movies better than either of them. An indifference curve is a curve which shows all combinations of assumptions that the consumer is indifferent. Example curves based on the above statements can be shown as such:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-GiGaFOtG74s/VVWplbX9UbI/AAAAAAAAAP0/dE5xc9L3iR8/s1600/Pizzas%2Band%2BMovies%2Bwith%2BCurves.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="302" src="http://3.bp.blogspot.com/-GiGaFOtG74s/VVWplbX9UbI/AAAAAAAAAP0/dE5xc9L3iR8/s400/Pizzas%2Band%2BMovies%2Bwith%2BCurves.png" width="400" /></a></div><br />From the assumptions of consumers above, we have the following properties of indifference curves:<br />1) Due to the non-satiation assumption, consumers will prefer higher indifference curves.<br />2) Due to non-satiation, we can also say that indifference curves are always downward sloping. If it is upward sloping, then there can be a case where we are indifferent between 2 pizza and 2 movies and 1 pizza and 2 movies.<br />3) Indifference curves cannot cross. If the curves cross, there would be a case where we are indifferent between two choices even though one of it has more, which violates non-satiation.<br />4) Completeness also implies that we cannot have more than one indifference curves through a point.<br /><br /><b>Utility</b><br /><br />Utility functions are just a mathematical representation of the preference maps. We simply need to maximize the utility function to see what the user would choose.<br /><br />An example utility function can be U = sqrt(M*P). This is something that is empirical that we come up with and this is not the only utility function we can come up with. However, this is consistent with the indifference maps. <br /><br />Marginal utility refers to the amount utility changes with each change of the unit. In other words, it is the (partial) derivative of the utility function.<br /><br />An example of a diminishing marginal utility is shown below:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-NH93CXkK6AU/VVWqF3vkv_I/AAAAAAAAAP8/m4G6JJAssmg/s1600/Marginal%2BUtility.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="295" src="http://4.bp.blogspot.com/-NH93CXkK6AU/VVWqF3vkv_I/AAAAAAAAAP8/m4G6JJAssmg/s400/Marginal%2BUtility.png" width="400" /></a></div><br />We can see that each additional movie improves the utility, but it increases at a diminishing rate. Marginal utility would usually be diminishing, but at different rates for different goods. However, marginal utility will always be positive due to non-satiation.<br /><br />The Delta Marginal Utility graph can show us the actual contributions of each pizza.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-8i2O5gWL8sU/VVWqd9rzqcI/AAAAAAAAAQE/2VAf_yAgauY/s1600/Diminishing%2BDelta%2BUtility.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="318" src="http://2.bp.blogspot.com/-8i2O5gWL8sU/VVWqd9rzqcI/AAAAAAAAAQE/2VAf_yAgauY/s400/Diminishing%2BDelta%2BUtility.png" width="400" /></a></div><br /><b>Marginal Rate of Substitution</b> <br /><br />The MRS links the utility to the preference map. The MRS is the slope of the indifference curve, which is dP/dM. It is the rate at which we are willing to trade off the y-axis for the x-axis. For example, it is how many pizzas we are willing to trade off to get another movie. It purely comes from the preferences.<br /><br />We will refer to the following diagram and attempt to compute the MRS for each segment.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-kr_VlqyP9Bg/VVWqom9OZbI/AAAAAAAAAQM/LivXjgCzWE0/s1600/MRS%2B1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="316" src="http://2.bp.blogspot.com/-kr_VlqyP9Bg/VVWqom9OZbI/AAAAAAAAAQM/LivXjgCzWE0/s400/MRS%2B1.png" width="400" /></a></div><br />From the first segment to the second, we have an MRS of -2, because we are willing to give up 2 movies for 1 pizzas. However, the MRS of the second to the third segment is -0.5. This is because when we have 4 pizzas, the last pizza has a very low marginal utility. However its, marginal utility is increasing the lower number of pizzas we have.<br /><br />In general, we have:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-S1wTBciONi8/VVWq8dcov0I/AAAAAAAAAQU/p8pCUMIRqcw/s1600/MRS%2B2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="208" src="http://2.bp.blogspot.com/-S1wTBciONi8/VVWq8dcov0I/AAAAAAAAAQU/p8pCUMIRqcw/s400/MRS%2B2.png" width="400" /></a></div><br />Marginal utility is a negative function of quantity. The lower quantity we have, the higher the marginal utility, which is why we flip the X and Y axis.Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com0tag:blogger.com,1999:blog-6890826354654530731.post-39135485269272328172015-05-14T23:21:00.000+08:002015-05-14T23:21:14.716+08:00Microeconomics 03<b>Elasticity</b> <br /><br />The magnitude of change we can expect from a supply or demand shock is determined by shape of the two curves.<br /><br />The elasticity of the curves can determine how much the quantity or price changes upon a shock.<br /><br />A perfectly inelastic demand curve is shown below.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-GXgXBixRaPo/VVS8Cw4kMcI/AAAAAAAAAO4/zvkOlmeBSGQ/s1600/Inelastic.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="388" src="http://2.bp.blogspot.com/-GXgXBixRaPo/VVS8Cw4kMcI/AAAAAAAAAO4/zvkOlmeBSGQ/s640/Inelastic.png" width="640" /></a></div><br />When there are no substitutes for a particular product, for example body parts for a transplant, then the demand curve for that product will be inelastic. When there is a supply shock, the quantity doesn't change - only the price changes.<br /><br />On the other hand, we have a perfectly elastic demand curve below.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-ujpSWBiXND4/VVS8XSGlRMI/AAAAAAAAAPA/gX1u2pk_68s/s1600/Elastic.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="430" src="http://3.bp.blogspot.com/-ujpSWBiXND4/VVS8XSGlRMI/AAAAAAAAAPA/gX1u2pk_68s/s640/Elastic.png" width="640" /></a></div><br />A very substitutable product will be elastic. A possible example of an elastic good is toilet paper, where we'll simply switch to another brand if a particular brand becomes expensive. Once the price increases, the quantity sold will drop drastically.<br /><br />Elasticity can be expressed as:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-Gn6VLMas_vI/VVS8i7M81WI/AAAAAAAAAPI/VKvstYbymOg/s1600/Elasticity%2BFormula.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="128" src="http://3.bp.blogspot.com/-Gn6VLMas_vI/VVS8i7M81WI/AAAAAAAAAPI/VKvstYbymOg/s320/Elasticity%2BFormula.png" width="320" /></a></div><br />If the quantity falls by 2% for every 1% increase in price, then we have E = -2. Therefore, an inelastic good will have E = 0, and a perfectly elastic good will have E = -infinity. E will typically be between 0 and -infinity.<br /><br />Suppose now that a producer sells Q goods at price P, they will make revenue R = PQ. The change in revenue with respect to price, would be:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-jJJt9YDiUj0/VVS8tuIQ2UI/AAAAAAAAAPQ/TytMl5rz8IA/s1600/Revenue%2BFormula.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="242" src="http://1.bp.blogspot.com/-jJJt9YDiUj0/VVS8tuIQ2UI/AAAAAAAAAPQ/TytMl5rz8IA/s400/Revenue%2BFormula.png" width="400" /></a></div><br />From this formula, if we are a producer, we should only raise prices when E is between 0 and -1. <br /><br /><b>Estimating Elasticities</b><br /><br />Theoretical economics can show us the direction of change, but empirical economics is what we use to show us the absolute values of change. However, the most difficult part of empirical economics is causation vs correlation.<br /><br />Take the example in the previous article, we saw that the price of pork rose when the price of beef rose.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-eycbLnlLE7I/VVIiCLnztlI/AAAAAAAAANo/ldlD-8Oa_2A/s1600/Supply%2Band%2BDemand%2BCurve%2BHigher%2BDemand.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="320" src="http://3.bp.blogspot.com/-eycbLnlLE7I/VVIiCLnztlI/AAAAAAAAANo/ldlD-8Oa_2A/s320/Supply%2Band%2BDemand%2BCurve%2BHigher%2BDemand.png" width="320" /></a></div><br /><br />If we wanted to calculate E here, we might try to plug in the numbers in the formulae. Since both the price and quantity increased in this case, we may say that we have a positive elasticity - higher price led to higher quantity. However, this is one of the most dangerous mistakes we can make. In this case, the increase in demand is actually what's causing the price to increase, and not the other round. We can only use this formula to find the amount of change in quantity per percentage change in price. Since quantity is driving the price in this case, and not the other round, we cannot apply the elasticity formula.<br /><br />What we are measuring here is actually the elasticity of the supply. <br /><br />However, when we change the supply as in the following diagram, we can actually get the correct answer.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-vDvvJPy8kN0/VVIjwdZv9lI/AAAAAAAAAN0/KbOSUZdReMQ/s1600/Supply%2Band%2BDemand%2BCurve%2BLower%2BSupply.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="http://2.bp.blogspot.com/-vDvvJPy8kN0/VVIjwdZv9lI/AAAAAAAAAN0/KbOSUZdReMQ/s320/Supply%2Band%2BDemand%2BCurve%2BLower%2BSupply.png" width="320" /></a></div><br />What we want is to be able to measure the slope of the demand curve. By shifting the supply curve, keeping the demand curve unchanged, we can measure the elasticity. <br /><br /><b>Taxation Example</b><br /><br />Suppose that we have vendors selling 100 million units of a particular type of product for $10 each. The government comes along and imposes a tax on the vendors, asking for $1 per unit. This increase in price effectively shifts the supply curve up. In order to make the same amount, the producers would have to charge $11 for each unit. However, this causes the equilibrium to drop to 97 million units, at $10.5 per unit.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-9HZ_juOD6OA/VVS9U8Z0bHI/AAAAAAAAAPY/6mv6mOiZQyQ/s1600/Taxation%2BExample.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="341" src="http://1.bp.blogspot.com/-9HZ_juOD6OA/VVS9U8Z0bHI/AAAAAAAAAPY/6mv6mOiZQyQ/s640/Taxation%2BExample.png" width="640" /></a></div><br /><br />In this case, we can clearly see that the equilibrium points have traveled on the demand curve, so we can calculate the elasticity using the formula.<br /><br />Since we see that the elasticity is -0.6, this means that we can continue raising prices further to increase profits.<br /><br />Of course, we are assuming elasticity is constant. However, since elasticity is a curve for straight demand lines, we are actually just measuring the LOCAL elasticity around that price change.<br /><br />It can be seen that the amount of money the government earns is the final equilibrium quantity multiplied by the amount of tax. This can be represented by the shaded region. We can clearly see that the amount of money the government makes depends on the elasticity of demand.<br /><br />If demand is perfectly inelastic, then the amount the government makes is simply Q * tax per unit. If the product demand is elastic, then the government will make a lot less money as the price needs to be fixed.Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com0tag:blogger.com,1999:blog-6890826354654530731.post-22921001292756967092015-05-13T00:59:00.001+08:002015-05-13T01:08:14.320+08:00Microeconomics 02<b>Introduction to Supply and Demand</b><br /><br />The twin engines of economics are Supply and Demand. Demand refers to how much people wants something, and Supply refers to how much something is available. Adam Smith first came up with the Supply and Demand model, and when describing his model, he came up with the Water and Diamond paradox.<br /><br /><b>Water and Diamond Paradox</b> <br /><br />We can clearly see that water is much more essential to life than diamond, yet diamonds are worth so much more than water. The problem here is that even though there is great demand for water, the supply is much higher than the demand, hence its low price. On the other hand, even though there is much less demand for diamond than water, its supply is much lower than the demand, hence the high price.<br /><br /><b>Supply and Demand Equilibrium</b><br /><br />We can visualize supply and demand on a graph. An example of a supply and demand curve is shown below.<br /><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-lIMV035tTco/VVIf95Eu70I/AAAAAAAAANY/5pyzNu0LqTw/s1600/Supply%2Band%2BDemand%2BCurve%2BBasic.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="http://3.bp.blogspot.com/-lIMV035tTco/VVIf95Eu70I/AAAAAAAAANY/5pyzNu0LqTw/s320/Supply%2Band%2BDemand%2BCurve%2BBasic.png" width="320" /></a></div><div class="separator" style="clear: both; text-align: center;">Generic Supply and Demand Graph</div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: left;">The blue line represents the demand curve. The demand curve represents the consumers' willingness to pay for a certain good. As the price increases, the quantity the consumer obtains is likely to decrease.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">On the other hand, the red line represents the supply curve. The supply curve represents the willingness of a producer to supply a good. As the price rises, the producers would be more willing to produce more.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">When the two lines meet, we have a Supply and Demand Equilibrium. This is where both the suppliers and consumers will be happy.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;"><b>Equilibrium Shift</b></div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">Suppose that the graph represents the supply of chicken. Suppose that the supply of pork goes down due to a disease, leading to an increase in pork prices. Since pork and chicken are substitutes for each other, we can expect the demand for chicken to increase.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-eycbLnlLE7I/VVIiCLnztlI/AAAAAAAAANk/wMj6KaTYAqQ/s1600/Supply%2Band%2BDemand%2BCurve%2BHigher%2BDemand.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="http://2.bp.blogspot.com/-eycbLnlLE7I/VVIiCLnztlI/AAAAAAAAANk/wMj6KaTYAqQ/s320/Supply%2Band%2BDemand%2BCurve%2BHigher%2BDemand.png" width="320" /></a></div><div class="separator" style="clear: both; text-align: center;">Increase in Demand</div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: left;">As the demand increases, the demand curve shifts upwards (or outwards). With the supply being fixed, this would mean that suppliers can start charging more as consumers are now more willing to pay for it.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">On the other hand, if the supply for pork decreases with the demand being fixed, we can also end up with a higher price as shown in the following graph.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-vDvvJPy8kN0/VVIjwdZv9lI/AAAAAAAAANw/jBNp06AzFjE/s1600/Supply%2Band%2BDemand%2BCurve%2BLower%2BSupply.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="http://2.bp.blogspot.com/-vDvvJPy8kN0/VVIjwdZv9lI/AAAAAAAAANw/jBNp06AzFjE/s320/Supply%2Band%2BDemand%2BCurve%2BLower%2BSupply.png" width="320" /></a></div><div class="separator" style="clear: both; text-align: center;">Decrease in Supply</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">Although prices increased in both cases, the quantity sold differs between them. In order to determine if it's a supply or demand shift, we need to be told the price, as well as the quantity.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;"><b>Constraints</b></div><div class="separator" style="clear: both; text-align: left;"><b><br /></b></div><div class="separator" style="clear: both; text-align: left;">In certain countries, there is the concept of minimum wage. We can analyze employment using the Supply and Demand model as well.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-WzMMaoqyo2M/VVIlEiLmZmI/AAAAAAAAAN8/NKN4yxf-J-M/s1600/Supply%2Band%2BDemand%2BConstraint.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="http://3.bp.blogspot.com/-WzMMaoqyo2M/VVIlEiLmZmI/AAAAAAAAAN8/NKN4yxf-J-M/s320/Supply%2Band%2BDemand%2BConstraint.png" width="320" /></a></div><div class="separator" style="clear: both; text-align: center;">Constraint with Excess Supply</div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: left;">In this case, the suppliers would the citizens (as they provide man-hours), while the consumers are the firms. In a country without minimum wage, we would expect the equilibrium to follow the supply and demand curves.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">However, suppose that we add the constraint of minimum wage. When there is minimum wage, we would be in a state of disequilibrium. Due to the minimum wage, workers would be more willing to work. However, due to the higher costs, firms will be less willing to hire. This would lead to an excess in supply i.e. unemployment.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">Notice that the new equilibrium e2 is actually on the demand curve instead of the supply curve. This is because even though there are more workers willing to work, the firms are the ones deciding how many they want to hire.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">If, however, suppose that the minimum wage is below the equilibrium wage. In this case, the wages will actually stabilize at the equilibrium e1 instead of at the minimum wage.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">In the above example, we talked about excess supply. There can also be a case where we have excess demand. Suppose that we try to model the Supply and Demand curves for oil.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-XQo28eo1IUc/VVIqALf_p8I/AAAAAAAAAOM/IB4EzPTDZ0U/s1600/Supply%2Band%2BDemand%2BExcess%2BDemand.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="http://1.bp.blogspot.com/-XQo28eo1IUc/VVIqALf_p8I/AAAAAAAAAOM/IB4EzPTDZ0U/s320/Supply%2Band%2BDemand%2BExcess%2BDemand.png" width="320" /></a></div><div class="separator" style="clear: both; text-align: center;">Excess Demand</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">Initially, we are at the equilibrium point e1. Suppose that there is a worldwide shortage of oil. If the supply for oil decreases, prices are expected to increase. Shifting the supply curve upwards (to indicate the decrease of supply), we will end up at point e2. However, if the government decides to put a cap on the maximum price for gas (limiting it to the original price), the suppliers will now be unwilling to supply as much as before. We now end up with the equilibrium on the supply curve at e3.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;"><b>Market Efficiency</b></div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">In general, equilibrium points always results in the highest efficiency in the market. Constraints, therefore, lead to inefficiencies. The efficiency loss in the wage scenario is that even though workers are willing to work at a lower wage, they are now unemployed because of the minimum wage. Another example is where suppliers would be willing to supply more gas (and consumers will be willing to purchase them) but the gas price cap results in the shortage of gas.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">A perfectly competitive market refers to a market where producers offer their goods to a wide range of consumers who bid up the price until the highest bidder gets it. An example of a perfectly competitive market is eBay auctions.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">Without constraints, i.e. in a perfectly competitive market, the mechanism used for allocation is price. When prices are allowed to swing unconstrained, we will end up at the natural equilibrium. Looking at the gas example, price is no longer the allocation mechanism due to the constraint which leads to gas shortage. People will have to queue up for gas no matter how much money they are willing to pay. When price is not used as the allocation mechanism, there will be more inefficient mechanisms like the queue mechanism.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">There is always a trade-off between efficiency and equity. Even though the market is most efficient at the equilibrium, but it may not be completely fair. For example, without a minimum wage, there are people who could be exploited. When equity comes into play, things become very complicated. For example, in order to shift the supply curve downwards, the government can provide subsidies to the oil companies. However, this would mean that people will be taxed more heavily on the other end.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;"><b>Summary</b></div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">In summary, the market is always going to attempt to reach equilibrium whenever they can. Whenever a constraint causes a disequilibrium, the component that is lacking (either supply or demand) will determine where the new equilibrium will be. Equilibrium points are points of highest market efficiency, and constraints, in general, lead to inefficiencies. However, inefficiency may not necessarily be bad as it leads to equity.</div>Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com0tag:blogger.com,1999:blog-6890826354654530731.post-45273017962110481862015-05-12T23:34:00.000+08:002015-05-13T01:07:31.280+08:00Microeconomics 01The study of microeconomics involves constrained optimization in face of scarcity. We build models on how consumers and producers behave in order to study their relationships. However, unlike engineering models, these models are never precise, and can only capture the main insights and never the minute details.<br /><br />The main constraint faced by consumers is their limited budget, which we will optimize through utility maximization. We want to maximize their utility subject to a budget constraint. Firms on the other hand focus on maximizing profits subject to both consumer demands and input costs.<br /><br />In Microeconomics, prices play three fundamental roles:<br />1) What goods and services should be produced?<br />2) How do we produce these goods and services?<br />3) Who will make use of these goods and services?<br /><br />The above three question can be summarized into the term "allocation". As consumers and firms interact in the marketplace, these questions will lead to a set of prices which best satisfies both the consumers and producers.<br /><br />If we want to create a product, we need to analyze the consumer's reception towards it. Next, we will need to look at all the input costs to see if we will be able to set a price that allows us to profit, while making it affordable to the target market.<br /><br />When dealing with economics, we have to face both the theoretical and empirical sides. Theoretical economics involves building models to explain the world, while empirical economics involves testing those models to judge its accuracy. That would mean that, of course, models built in theoretical economics should be testable.<br /><br />Economics can also be positive or normative. Positive describes what the world are, and normative describes what the world should be.<br /><br />We can almost model any kind of decision. An example model for a decision-making process for buying a new vs used product can be:<br />1) What are your personal preferences?<br />2) Are you risk-averse or risk-loving?<br />3) Is your budget limited or unlimited?<br /><br />By comparing the price of the new and used product, we weigh that difference in price with the above questions to result in an optimization problem. <br /><br />Even though we do not really go through these thought processes, but we behave AS IF we have solved these optimization problems. This is known as Milton Friedman's "As If" principle.Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com0tag:blogger.com,1999:blog-6890826354654530731.post-84600368802513596792013-07-24T22:53:00.001+08:002013-07-24T22:53:57.340+08:00Physics 01<u><b>Introduction to Dimensions</b></u><br /><br />Physics is concerned about the very small, and the very large. In order to represent our observations, we need to know about <b>units</b>. The <b>SI unit</b> for length is the metre (m), time - seconds (s) and mass - kilograms (kg). The <b>L</b> (length), <b>T</b> (time) and <b>M</b> (mass) are fundamental units which all other quantities in physics can be derived.<br /><br />[ <b>Speed </b>] = [ L ] / [ T]<br /><br />The dimension of <b>Speed</b>, is the dimension of <b>Length</b>, divided by the dimension of <b>Time</b>.<br /><br />[ <b>Volume </b>] = [ L ]<sup>3</sup><br /><br />The dimension of <b>Volume</b>, is the dimension of <b>Length </b>to the power of 3.<br /><br />[ <b>Density </b>] = [ M ] / [ Volume ] = [ M ] / [ L ]<sup>3</sup><br /><br />[ <b>Acceleration </b>] = [ L ] / [ T ]<sup>2</sup><br /><br />All other quantities can be derived from these 3 fundamentals.<br /><br /><u><b>Uncertainty</b></u><br /><br />Any measurement, without any knowledge of its <b>uncertainty</b>, is completely meaningless. The uncertainty is the known error margin. Take for example, two measurements taken are:<br /><br />25.0cm<br />24.5cm<br /><br />Whether it is true that the subject in the first measurement is longer than the second one, or is there a possibility that they might be equal, depends on the error margin or the uncertainty of a measurement. If the uncertainty is +-0.1, we can almost guarantee that the first subject is longer than the second. However, if the uncertainty is +-1cm, we may have to take more measurements.<br /><br />We can find the uncertainty of our measurements through control tests.<br /><br /><u><b>Galileo Proportions Problem</b></u><br /><br />When it comes to measurements in systems, we also need to look at how certain measurements scale with each other. Let's look at an example posed by Galileo Galilei.<br /><br />Suppose that we have an animal of size S. It has legs with a femur bone of length L. The femur bone has a thickness of D.<br /><br />It is safe to say that an animal has a femur bone that is <b>proportionate (</b><b>∝) </b>to its size:<br />S ∝ L<br /><br />In this case, it is also safe to say that the mass of an animal is proportional to its size to the power of 3:<br />M ∝ S<sup>3</sup><br /><br />Therefore:<br />M ∝ L<sup>3</sup><br /><br />Now notice that in order to support its weight, according to Galileo, its femur bone's cross section must be proportionate to the mass:<br />M ∝ D<sup>2</sup><br /><br />We can then derive this proportionality:<br />D ∝ S<sup>3/2</sup><br /><br />Then he raised a comparison between an elephant and a mouse. An elephant is about 100 times the size of the mouse, so its femur bone would be 100 times longer, which is true. This proportionality suggests that an elephant's femur cross section would also be 1000 times thicker than the mouse, which turned out to be false. The elephant's femur, is only 100 times thicker, which scientists concluded to be nature's protection against buckling.<br /><br /><u><b>Derivation of Dimensions</b></u><br /><br />Let us look at how we can derive the dimensions of units. Recall the following formula:<br /><br />F = ma<br /><br />[ <b>Force </b>] = [M] * [ Acceleration ]<br /><br />[ Acceleration ] = [ Velocity ] / [ T ]<br /><br />[Velocity ] = [ L ] / [ T ]<br /><br />Therefore:<br /><br />[ Acceleration ] = [ L ] / [ T ]<sup>2</sup><br /><br />[ Force ] = [ M ] [ L ] / [ T ]<sup>2</sup><br /><br />Now let's look at <b>Momentum</b>. Momentum is described as:<br /><br />p = mv<br /><br />[ Momentum ] = [ M ] * [ Velocity ]<br /><br />Therefore:<br /><br />[ Momentum ] = [ M ] * [ L ] / [ T ]<br /><br />How about <b>Pressure</b>? Pressure is described as Force per unit Area:<br /><br />[ Pressure ] = [ Force ] / [ Area ]<br /><br />[ Force ] = [ M ] [ L ] / [ T ]<sup>2</sup><br /><br />[ Area ] = [ L ]<sup>2</sup><br /><br />Therefore:<br /><br />[ Pressure ] = [ M ] / [ L ] [ T ]<sup>2</sup><br /><br />Finally, <b>Kinetic Energy</b> is defined as:<br /><br />1/2*mv<sup>2</sup><br /><br />This gives us:<br /><br />[ Kinetic Energy ] = 1/2 [ M ] [ L ]<sup>2</sup> / [ T ]<sup>2</sup><br /><br />From here, we can derive Work, which is the application of Force for a certain Distance:<br /><br />[ Work ] = [ M ] [ L ]<sup>2</sup> / [ T ]<sup>2</sup><br /><br />We can also derive Power, which is the supply of Energy (which is equal to Work) over Time:<br /><br />[ Power ] = [ M ] [ L ]<sup>2</sup> / [ T ]<sup>3</sup><br /><br /><u><b>Period of Pendulum Problem</b></u><br /><br />Let us now create a function that can tell us the period of a pendulum (time it takes for a full oscillation).<br /><br />Suppose that we have the following information:<br />l = Length of Pendulum (L)<br />m = Mass of Bob (M)<br />g = Gravitational Acceleration (L/T<sup>2</sup>)<br />θ = Angular Amplitude (L)<br />C = Constant<br /><br />Our final solution would be in the form:<br />[T] = C [L]<sup>p</sup> [M]<sup>q</sup> [L]<sup>r</sup> [LT<sup>-2</sup>]<sup>s</sup><br /><br />From here, we can conclude that q is zero because it doesn't appear on the left at all. We can also conclude that s must be -0.5 because T on the left is to the power of 1 only. Since L doesn't appear on the left, then L should be eliminated as well.<br /><br />The total power of L in this equation is currently p+r+s. In order for L to have the power of 0, we must have:<br />p+r+s = 0<br />p+r = -s<br />p+r = 0.5<br /><br />Now, in order to find out exactly what p and r is, we'll need to go through experimentation to find out if Angular Amplitude plays a part in the equation. If we mess around with a pendulum, you'll soon discover that the Angular Amplitude does not affect the oscillation at all. Therefore, p = 0.5, r = 0.<br /><br />Our final expression would be:<br />[T] = C [L]<sup>0.5</sup> [LT<sup>-2</sup>]<sup>-0.5</sup><br />T = C(l/g)<sup>0.5</sup> <br /><br />If we scale the length by x, the function would react as:<br />T(scaled) = C(xl/g)<sup>0.5</sup> <br />T(scaled) = (x)<sup>0.5</sup>C(xl/g)<sup>0.5</sup> <br />T(scaled) = (x)<sup>0.5</sup>T<br /><br />This is the furthest we can go without experimentation. Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com0tag:blogger.com,1999:blog-6890826354654530731.post-21177891922884356362013-06-26T18:11:00.001+08:002013-06-26T19:24:00.376+08:00Computer Science 17<b>Noise</b>, in photography, is the inability of your camera's sensor to accurately sample and reproduce a pixel from a given exposure. However, the fact that your picture is still discernible is due to the high <b>Signal-to-Noise Ratio</b>.<br /><br />Signal-to-Noise Ratio is a measurement of how much of a given piece of information is correct, and how much of it is simply noise. For a typical picture, the SNR is actually extremely high, which means that most of the information is correct.<br /><br />Noise, like most physical phenomena, is <b>Normally Distributed</b>. In case of a picture, for each pixel, the accuracy of the color representation is normally distributed between 3 axes, the Red, Green and Blue, with the Mean of the distribution being the most accurate color representation of the pixel that a camera can produce. If the color inaccuracy exceeds a certain threshold, we will call it "Noise".<br /><br />If we measure the Noise of a single image, we measure the number of correctly produced pixels, versus the pixels that are incorrectly produced. We must, of course, define what Noise means in an image. In our example, we'll say that pixels of colors that are below 80% accuracy would be considered Noise. If for every 100 pixels there exists 1 pixel of Noise, we would have the SNR of 100:1. This also means that 1% of total pixels are actually Noise.<br /><br />In the context of an image, I'll refer to noise as <b>Image Noise</b>. Image Noise of an image is relatively stable. The percentage of pixels that are incorrectly produced for a given exposure is roughly constant and does not fluctuate greatly. When considering Image Noise, an exposure can be 100:1 SNR, the next exposure can be 105:1, and the following can be 95:1.<br /><br />However, if we look at the noise of each pixel, we measure how far is the color away from the actual. For a 100:1 SNR image, 99% of the pixels are actually sufficiently correctly reproduced, while 1% strayed too far from the Mean.<br /><br />In the context of individual pixels, I'll refer to the noise as <b>Pixel Noise</b>. Pixel Noise can fluctuate greatly. A pixel can have RGB SNR ranging from 100:1 , or 5:1, or even 1:1 accuracy. It is this fluctuation that causes Pixels to contribute to Image Noise.<br /><br />In order to fix this, we use <b>Image Averaging</b>. If we average every single pixel across 100 exposures of the same angle, we would have a pixel that is 99% accurate to its actual color. By averaging, no pixels would be 100% accurate this way, but pixels that are below 80% accuracy would become extremely rare. The overall Noise is unchanged, but by evening and spreading out the Pixel Noise across all pixels, we effectively reduced the Image Noise of the image.<br /><br />Let's go through an example of Image Averaging. You can do this with any series of images taken from the same angle. For our purpose, we'll use frames from a video. Here's the set of frames we have, downsized from 720p:<br /><br /><img src="http://img7.imageshack.us/img7/6964/5kr.gif" /><br /><br />This is the full image of a single frame. Click to open it in a separate page to further inspect it:<br /><br /><a href="http://imageshack.us/a/img841/7344/he10.jpg"><img src="http://imageshack.us/a/img841/7344/he10.jpg" /></a><br /><br />After averaging, this is what we got:<br /><br /><a href="http://imageshack.us/a/img838/6888/hjsj.jpg"><img src="http://imageshack.us/a/img838/6888/hjsj.jpg" /></a><br /><br />Not bad for a zoomed video at 60x huh?<br /><br />Let's look at another example, at 30x zoom with some contrasting colors:<br /><br /><img src="http://img69.imageshack.us/img69/7414/s1q.gif" /><br /><br />Notice that you can also have images where the lighting change slightly. It would all even out perfectly.<br /><br />(There's been some error with the image hosting. I'll host the images once the server is online)<br /><br />Here's a single shot in full resolution:<br /><br /><a href="http://imageshack.us/a/img40/2273/3ip7.jpg"><img src="http://imageshack.us/a/img40/2273/3ip7.jpg" /></a><br /><br />And here's the shot after averaging the pixels across 10 shots:<br /><br /><a href="http://imageshack.us/a/img843/181/dc39.jpg"><img src="http://imageshack.us/a/img843/181/dc39.jpg" /></a><br /><br />As you can see, the focal blur is better represented and so on.<br /><br />This concludes this article. I may write another article on how we can do the averaging, especially across hundreds of images.Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com0tag:blogger.com,1999:blog-6890826354654530731.post-84005187298488037592013-06-20T19:59:00.001+08:002013-06-20T20:34:04.427+08:00Computer Science 16<b>Greedy Algorithms</b> makes up the answer through <b>Locally Optimal</b> solutions, and therefore is quicker at arriving at an answer. However, it does not guarantee that the final answer is <b>Globally Optimal</b>. The problem, of course, with finding the Globally Optimal is that it is prohibitively expensive to compute. Finding the Global Optimum is <b>Inherently Exponential</b>.<br /><br />Let us now look at another class of Optimization Problems. One of the most important aspect of Computer Science is the idea of <b>Machine Learning</b>. Superficially, we can define Machine Learning as building programs that learns. However, it suffices to say that almost all programs learn in some form or another.<br /><br />Machine Learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data. We can have a better idea of Machine Learning by looking at the <a href="http://en.wikipedia.org/wiki/Machine_learning">Machine Learning Wikipedia page</a>.<br /><br />A major focus of Machine Learning is to allow a computer to learn and recognize complex patterns in order to make decisions based on empirical data. This whole process is known as <b>Inductive Inference</b>. The basic idea is for a computer to observe and record examples from a subset of a statistical phenomena, and generate a model that summarizes some statistical properties of the data in order to predict the future.<br /><br />There are two distinctive approaches to Machine Learning:<br />1) <b>Supervised Learning</b><br />2) <b>Unsupervised Learning</b><br /><br />In <b>Supervised Learning</b>, we associate a <b>Label</b> with each example in a <b>Training Set</b>. If the Label is <b>Discrete</b>, we call it a <b>Classification Problem</b>. For example, we can classify whether a certain transaction on a credit card is done by the owner or not. If the Label is <b>Real</b>, we call it a <b>Regression Problem</b>. When we were doing the Curve Fitting in the previous articles, we were doing Machine Learning, and handling a Regression Problem.<br /><br />Based on the examples in the Training Set, the goal is create a program in such a way that it is able to predict the answer for other cases before they are explicitly observed. In order to do this, we need to <b>Generalize</b> the statistical properties of the Training Set to be able to make predictions about things we haven't seen.<br /><br />Let us look at the following Training Set. Suppose that we've obtained some data plotted into axes X and Y, and the Colors (Red and Green) and Shapes (Grass and Leaf) are the Labels of the data.<br /><br /><img src="http://img43.imageshack.us/img43/1664/uhq.png" /><br /><br />Upon obtaining a set of data, we must ask ourselves a few questions:<br />1) Are the Labels accurate?<br />Labels are subject to machine and human errors. <br /><br />2) Is the past representative of the future?<br />If the past results are generally not representative of the future, then it would be extremely difficult to generate a sufficiently accurate model for use.<br /><br />3) Do we have enough data to generalize?<br />If the Training Set is relatively small, we should take care not have too much confidence on the predictions.<br /><br />4) What features are to be extracted?<br />In order to find out something about the data, we must have the relevant features of the data. For example, in order to find out the composition of haze particles, it is useful to know the types of trees that are being burnt down.<br /><br />5) How tight should the fit be?<br />How do we classify the data? Should we fit the data onto the curve, or fit the curve to the data?<br /><br />It is easy to tell from the Training Data that a certain line y=c separates the labelled Green data from the labelled Red data within the given boundaries.<br /><br /><img src="http://img585.imageshack.us/img585/4817/k7x.png" /><br /><br />However, the more difficult part is thinking how should we go about classifying the Shape labels? Here's two of the many possible ways we can classify it:<br /><br />Do we classify it by grouping the leaves together like this? <br /><br /><img src="http://imageshack.us/a/img4/4336/gdk.png" /><br /><br />This is great because it minimizes the Training Error in the data. However, is this what we really want? How do we know if future data would really fit like this? <br /><br />Or do we classify it like that, and assume that the leaf with the grass is a labelling error or an outlier? This is a good generalization but there is one Training Error.<br /><br /><img src="http://imageshack.us/a/img801/5039/eo3j.png" /><br /><br />That's where we need more data plotted in the vicinity of the stray leaf to find out if it's a Training Error or an Outlier.<br /><br />Now let's talk about <b>Unsupervised Learning</b>. The big difference here is that we have Training Data, but we don't have Labels. The program gets a bunch of points, but doesn't know what shape or color is it. What can the program learn?<br /><br />Typically, what the program learns in Unsupervised Learning is the <b>Regularities of Data</b>. From the initial data set, Unsupervised Learning can discover the structures in the data.<br /><br /><img src="http://img560.imageshack.us/img560/6978/2lx.png" /><br /><br />The dominant form of Unsupervised Learning is <b>Clustering</b>. What we just did above is to find the Clusters in the data. Clustering is to organize the data into groups whose members are similar to each other. However, how do we describe "similar"? Suppose that the data plots the heights and weights of people, and the x Axis represents weight, while the y Axis represents height. We now have distinct clusters of people.<br /><br />Walmart in fact used Clustering to find out how to optimally structure their shelves. They found out through Clustering that there is a strong correlation between people who bought beer and people who bought diapers. Therefore they decided to place their beer and diapers departments side by side, which worked surprisingly well.<br /><br />Similarly, Amazon used Clustering to find people who like similar books based on their profile and buying habits. Netflix uses the same to recommend movies. Biologists use clustering a lot to classify plants and animals. They also use it a lot in genetics to find genes that look and function alike.<br /><br />What properties do a good Clustering have? It should have:<br />1) <b>Low Intra-Cluster Dissimilarities</b><br />All the points in the same cluster should have as little dissimilarities as possible.<br /><br />2) <b>High Inter-Cluster Dissimilarities</b><br />All points between different clusters should be highly dissimilar. <br /><br />How do we measure the Intra- and Inter-Cluster Dissimilarities? Well, of course, if you thought of Variance, you are absolutely right!<br /><br /><b>Intra-Cluster Variance</b> looks at the Variance of the Intra-Cluster Points with respect to the Cluster's Mean.<br /><br />variance(C) = Sum of, from i=0 to i=len(C), (mean(C)-c<sub>i</sub>)<sup>2</sup> <br /><br /><b>Inter-Cluster Variance</b> looks at the Variance between the Means of the different Clusters.<br /><br />Clustering can be formulated as an Optimization Problem. The Optimization Problem of Clustering is: Find the set of clusters, C, such that Intra-Cluster Variance is minimized, and the Inter-Cluster Variance is maximized.<br /><br />However, such a definition is not enough, because if that's the case, then we can just put each point in its own cluster. The variance would be 0 and Intra-Cluster Variance would be maximized.<br /><br />What we stated is just the Objective Function. In order to have a proper Optimization Problem, we need to spell out the Constraints. The Constraints can be things like:<br />1) We can only have a maximum of x clusters<br />2) There should be a minimum distance between two clusters.<br /><br />Finding the absolute best solution to this problem is Computationally Prohibitive (it is minimum in the order of n<sup>2</sup>). Therefore people typically resort to some form of Greedy Algorithm. The two most common Greedy Algorithm used to solve Clustering is:<br />1) Hierarchical Clustering<br />2) k-means<br /><br />Let's go through an example in Hierarchical Clustering. Suppose that we have N items, and we have an accompanying N*M matrix, where M=N, which we can look-up to find out the distance between any two points.<br /><br />The algorithm is as follows:<br />1) Assign each item to its own cluster. Therefore, if we have N items, we have N clusters.<br />2) Find the most similar pair of clusters and merge them.<br />3) Continue the process until all items are in x number of clusters.<br /><br />This type of Clustering is also known as Agglomerative Clustering. Step 2 of the algorithm is the most difficult. However, what do we compare if there are two items in each cluster now? What we need to look at is the Linkage Criteria.<br /><br />There are various Linkage Criteria which we can look at. These are:<br />1) Single-Linkage<br />The shortest distance between any two pairs between clusters.<br /><br />2) Complete-Linkage<br />The shortest distance between two furthest pairs between clusters. This takes care of the worst case.<br /><br />3) Average-Linkage<br />The distance between the means of each cluster.<br /><br />Of course, like any Greedy Algorithm, there is no Linkage Criteria that's the best. This algorithm is minimum an Order of n<sup>2</sup> and there is no guarantee that the Cluster is Globally Optimal.<br /><br />The most important issue in Machine Learning is Feature Selection in order to know what to compare. To Cluster countries together, we need to have something known as a Feature Vector containing all the features that are to be compared. An example of a Feature Vector of a country is a list containing the Population, Coordinates, Size, and so on.<br /><br />Let's do an example of Hierarchical Clustering. Suppose that we have these coordinates of MRTs:<br /><br /><code>Dhoby Ghaut,1.298593,103.845909<br />Buona Vista,1.307412,103.789696<br />Paya Lebar,1.318089,103.892982<br />Mountbatten,1.306306,103.882531<br />Jurong East,1.334308,103.741958<br />Tampines,1.353092,103.945229<br />Tanah Merah,1.327257,103.946579<br />Bishan,1.350772,103.848183<br />Yio Chu Kang,1.381905,103.844818<br />Choa Chu Kang,1.385482,103.74424</code><br /><br />If we convert the Longitudinal and Latitudinal data into actual distances, we would end up with the following table:<br /><br /><table border="1"><tbody><tr><td>Location</td><td>Dhoby Ghaut</td><td>Paya Lebar</td><td>Tampines</td><td>Tanah Merah</td><td>Buona Vista</td><td>Yio Chu Kang</td><td>Bishan</td><td>Jurong East</td><td>Mountbatten</td><td>Choa Chu Kang</td></tr><tr><td>Dhoby Ghaut</td><td>0</td><td>5662</td><td>12590</td><td>11632</td><td>6324</td><td>9260</td><td>5804</td><td>12215</td><td>4159</td><td>14863</td></tr><tr><td>Paya Lebar</td><td>5662</td><td>0</td><td>6989</td><td>6043</td><td>11540</td><td>8885</td><td>6163</td><td>16880</td><td>1750</td><td>18148</td></tr><tr><td>Tampines</td><td>12590</td><td>6989</td><td>0</td><td>2875</td><td>18015</td><td>11609</td><td>10788</td><td>22686</td><td>8694</td><td>22625</td></tr><tr><td>Tanah Merah</td><td>11632</td><td>6043</td><td>2875</td><td>0</td><td>17574</td><td>12837</td><td>11243</td><td>22754</td><td>7489</td><td>23399</td></tr><tr><td>Buona Vista</td><td>6324</td><td>11540</td><td>18015</td><td>17574</td><td>0</td><td>10299</td><td>8091</td><td>6089</td><td>10318</td><td>10040</td></tr><tr><td>Yio Chu Kang</td><td>9260</td><td>8885</td><td>11609</td><td>12837</td><td>10299</td><td>0</td><td>3480</td><td>12596</td><td>9389</td><td>11185</td></tr><tr><td>Bishan</td><td>5804</td><td>6163</td><td>10788</td><td>11243</td><td>8091</td><td>3480</td><td>0</td><td>11946</td><td>6244</td><td>12179</td></tr><tr><td>Jurong East</td><td>12215</td><td>16880</td><td>22686</td><td>22754</td><td>6089</td><td>12596</td><td>11946</td><td>0</td><td>15929</td><td>5693</td></tr><tr><td>Mountbatten</td><td>4159</td><td>1750</td><td>8694</td><td>7489</td><td>10318</td><td>9389</td><td>6244</td><td>15929</td><td>0</td><td>17709</td></tr><tr><td>Choa Chu Kang</td><td>14863</td><td>18148</td><td>22625</td><td>23399</td><td>10040</td><td>11185</td><td>12179</td><td>5693</td><td>17709</td><td>0</td></tr></tbody></table><br />Let's go through this semantically, using the Single-Linkage example:<br /><br /><code>10 Clusters<br />Dhoby Ghaut, Paya Lebar, Tampines, Tanah Merah, Buona Vista, Yio Chu Kang, Bishan, Jurong East, Mountbatten, Choa Chu Kang<br /><br />9 Clusters<br />Dhoby Ghaut, [Paya Lebar, Mountbatten], Tampines, Tanah Merah, Buona Vista, Yio Chu Kang, Bishan, Jurong East, Choa Chu Kang<br /><br />8 Clusters<br />Dhoby Ghaut, [Paya Lebar, Mountbatten], [Tampines, Tanah Merah], Buona Vista, Yio Chu Kang, Bishan, Jurong East, Choa Chu Kang<br /><br />7 Clusters<br />Dhoby Ghaut, [Paya Lebar, Mountbatten], [Tampines, Tanah Merah], Buona Vista, [Yio Chu Kang, Bishan], Jurong East, Choa Chu Kang<br /><br />6 Clusters<br />[Dhoby Ghaut, Paya Lebar, Mountbatten], [Tampines, Tanah Merah], Buona Vista, [Yio Chu Kang, Bishan], Jurong East, Choa Chu Kang<br /><br />5 Clusters<br />[Dhoby Ghaut, Paya Lebar, Mountbatten], [Tampines, Tanah Merah], Buona Vista, [Yio Chu Kang, Bishan], [Jurong East, Choa Chu Kang]<br /><br />4 Clusters<br />[Dhoby Ghaut, Paya Lebar, Mountbatten], [Tampines, Tanah Merah], [Yio Chu Kang, Bishan], [Jurong East, Choa Chu Kang, Buona Vista]<br /><br />3 Clusters<br />[Dhoby Ghaut, Paya Lebar, Mountbatten, Yio Chu Kang, Bishan], [Tampines, Tanah Merah], [Jurong East, Choa Chu Kang, Buona Vista]<br /><br />2 Clusters<br />[Dhoby Ghaut, Paya Lebar, Mountbatten, Yio Chu Kang, Bishan, Jurong East, Choa Chu Kang, Buona Vista], [Tampines, Tanah Merah]<br /><br />1 Cluster<br />[Dhoby Ghaut, Paya Lebar, Mountbatten, Yio Chu Kang, Bishan, Jurong East, Choa Chu Kang, Buona Vista, Tampines, Tanah Merah]</code> <br /><br />As you can tell, at the point where there's 4 Clusters left, it's already looking pretty good. In the next article we will go through the actual implementation of this algorithm!Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com0tag:blogger.com,1999:blog-6890826354654530731.post-86605783995064410802013-06-19T23:52:00.002+08:002013-06-20T00:11:51.833+08:00Computer Science 15There are many ways we can create models to solve problems. The generalized process is as shown:<br />1) Start with an experiment in which we gather data.<br />2) Use computation to find and evaluate a model.<br />3) Use theory, analysis and computation to derive a consequence of the model.<br /><br /><b>Optimization Problems</b> involve writing programs to optimize real-life scenarios. Each Optimization Problem consists of an <b>Objective Function</b> and a set of <b>Constraints</b> that has to be satisfied. The Objective Function is used to compare and find the optimal results, while Constraints are rules that must be obeyed (for example, in the case of navigation, Constraints can be in the form of limiting to only pedestrian routes).<br /><br />Once we come up with these things, we can use computation to attack the problem. A way to solve seemingly new problems is to attempt to map these problems to classic problems that we already know how to solve. This is known as <b>Problem Reduction</b>.<br /><br />Optimization Problems typically take a long time to solve. Often, there is no computationally efficient way to solve them. Therefore, it is common that we see "Best Effort" results.<br /><br />A classic Optimization Problem is the <b>Knapsack Problem</b>. The problem is typically discussed in the context of a burglar. One of the problems a burglar must deal with is deciding what to steal. There is usually far more to steal than you can carry away. The Objective Function must optimize the value of what we steal, with the Constraint of the maximum weight or volume we can carry in the knapsack.<br /><br />Suppose that the burglar has done his research on the house and finds out that these are the objects of interest:<br /><br /><table border="1"><tbody><tr><th>Item</th><th>Value</th><th>Weight</th></tr><tr><td>Desktop</td><td>$1500</td><td>10Kg</td></tr><tr><td>Laptop</td><td>$1200</td><td>1.5Kg</td></tr><tr><td>LED TV</td><td>$1000</td><td>10Kg</td></tr><tr><td>Smartphone</td><td>$750</td><td>0.25Kg</td></tr><tr><td>Ring</td><td>$1000</td><td>0.01Kg</td></tr><tr><td>Piano</td><td>$6000</td><td>180Kg</td></tr><tr><td>HiFi Set</td><td>$500</td><td>5Kg</td></tr><tr><td>PSP</td><td>$300</td><td>0.5Kg</td></tr></tbody></table><br />(Here I'm saying Weight in a general sense, don't tell me it should be Mass instead. I know :p)<br /><br />One of the easiest solution is to implement a <b>Greedy Algorithm</b>. A Greedy Algorithm is iterative, and at each step, the locally optimal solution is picked. What we need to know is to find out what is considered "locally optimal". It can be the best price/weight ratio, or the highest raw value that can still fit into the bag, or the item with the least weight. There is no guarantee that each variable is the best. Therefore, the biggest problem with the Greedy Algorithm is that we have no single best "locally optimal" that would work most of the time for the <b>0/1 Knapsack Problem</b>.<br /><br />The term 0/1 refers to the fact that we can't take half an item. It's either the full item, or we don't take it. Greedy Algorithm, however, is the best for <b>Continuous Knapsack Problem</b>. Take for example, if we break into a smith shop and find barrels of gold, silver and other precious metals. We can then work by filling up with gold first, then silver, then the other metals, and when we finally run out of space, we can take a partial barrel. However, most problems in life are 0/1 Knapsack problems.<br /><br />Let's create a quick object that allows us to define what an item is:<br /><br /><code>class Item(Object):<br /> def __init__(self, name, value, weight):<br /> self.name = name;<br /> self.value = float(value);<br /> self.weight = float(weight);<br /><br /> def getName(self):<br /> return self.name<br /><br /> def getValue(self):<br /> return self.value<br /><br /> def getWeight(self):<br /> return self.weight<br /><br /> def __str__(self):<br /> return str(self.name)+", $"+str(self.value)+", "+str(self.weight)+"Kg"</code><br /><br />We then instantiate the items based on the above table:<br /><br /><code>def instantiateItems():<br /> names=["Desktop","Laptop","LED TV","Smartphone","Ring","Piano","HiFi Set","PSP"]<br /> values=[1500,1200,1000,750,1000,6000,500,300]<br /> weights=[10,1.5,10,0.25,0.01,180,5,0.5]<br /> items=[]<br /> for i in range(len(names)):<br /> items.append(Item(names[i],values[i],weights[i]))<br /> return items</code><br /><br />At this point we should have a list of items ready for us to work with. We then come up with the greedy algorithm that does all these. The Greedy Algorithm should accept a list of items, the maximum weight that a person can carry, and the function used for comparison.<br /><br />Here is the Greedy Algorithm:<br /><br /><code>def greedy(items, maxWeight, keyFunction):<br /> itemsSorted = sorted(items, key=keyFunction, reverse=True)<br /> knapsack=[]<br /> totalValue=0.0<br /> totalWeight=0.0<br /> for item in itemsSorted:<br /> if totalWeight+item.getWeight()<maxWeight:<br /> knapsack.append(item)<br /> totalValue+=item.getValue()<br /> totalWeight+=item.getWeight()<br /> return knapsack, totalValue, totalWeight<maxweight: br=""><br /><br />def value(item):<br /> return item.getValue()<br /><br />def weight(item):<br /> return item.getWeight()<br /><br />def vwRatio(item):<br /> return item.getValue()/item.getWeight()</maxweight:></code><br /><br />A little explanation here about the formal parameter keyFunction. The Key Function is a function of 1 formal parameter that the sorted() function would use to compare the items in the list. value(), weight() and vwRatio() are examples of Key Functions. Using value() as the keyFunction, for example, would compare whatever is returned by the item.getValue() of all the items.<br /><br />Let's test the Greedy Algorithm now:<br /><br /><code>def printResults(knapsack,totalValue,totalWeight):<br /> for item in knapsack:<br /> print(item)<br /> print("Total Value: $"+str(round(totalValue,2)))<br /> print("Total Weight: "+str(round(totalWeight,2))+"Kg")<br /><br />def testGreedy():<br /> items=instantiateItems()<br /> knapsack,totalValue,totalWeight=greedy(items,22.5,value)<br /> print("Using value as the key function:")<br /> printResults(knapsack,totalValue,totalWeight)<br /> knapsack,totalValue,totalWeight=greedy(items,22.5,weight)<br /> print()<br /> print("Using weight as the key function:")<br /> printResults(knapsack,totalValue,totalWeight)<br /> knapsack,totalValue,totalWeight=greedy(items,22.5,vwRatio)<br /> print()<br /> print("Using vwRatio as the key function:")<br /> printResults(knapsack,totalValue,totalWeight)</code><br /><br />Assuming our burglar has a knapsack that can carry only up to 22.5Kg of things, here's the results:<br /><br /><code>Using value as the key function:<br />Desktop, $1500.0, 10.0Kg<br />Laptop, $1200.0, 1.5Kg<br />LED TV, $1000.0, 10.0Kg<br />Ring, $1000.0, 0.01Kg<br />Smartphone, $750.0, 0.25Kg<br />PSP, $300.0, 0.5Kg<br />Total Value: $5750.0<br />Total Weight: 22.26Kg<br /><br />Using weight as the key function:<br />Ring, $1000.0, 0.01Kg<br />Smartphone, $750.0, 0.25Kg<br />PSP, $300.0, 0.5Kg<br />Laptop, $1200.0, 1.5Kg<br />HiFi Set, $500.0, 5.0Kg<br />Desktop, $1500.0, 10.0Kg<br />Total Value: $5250.0<br />Total Weight: 17.26Kg<br /><br />Using vwRatio as the key function:<br />Ring, $1000.0, 0.01Kg<br />Smartphone, $750.0, 0.25Kg<br />Laptop, $1200.0, 1.5Kg<br />PSP, $300.0, 0.5Kg<br />Desktop, $1500.0, 10.0Kg<br />LED TV, $1000.0, 10.0Kg<br />Total Value: $5750.0<br />Total Weight: 22.26Kg</code><br /><br />As you can see, there is no single best Key Function that we can use. In the end it's all up to the situation and we must look at every single one to know what is the best. Since it's called Greedy Algorithm, there is definitely some sort of Order of Complexity associated with it.<br /><br />The implemented algorithm is divided into two parts: Sorting, then Comparing.<br /><br />We can assume that the sorting is some variant of the Merge Sort, of which we have:<br />O(len(items)*log(len(items))) or otherwise O(n log n)<br /><br />The comparison part is just a for loop that runs a maximum of len(items) times. Therefore we have:<br />O(len(items)) or otherwise O(n)<br /><br />Taking the worst case, the Order of Complexity of the algorithm is O(n log n).<br /><br />However, it is important to note that what we have may not be the absolute optimal we can have.<br /><br />If we formulate a problem well enough, it would definitely be apparent that it can be solved in a bruteforce way. The biggest trouble is whether it is feasible or not. Fortunately, bruteforce is feasible for small 0/1 Knapsack Problems.<br /><br />Suppose that we have these 8 items. The maximum combination of items that we can take is:<br />2<sup>8</sup> = 256<br /><br />We can then run a bruteforce algorithm to go through every single possible combination to find out what is the maximum [Sum of Value] where [Sum of Weight] is below maximum weight. In other words:<br />Objective Function - Maximum [Sum of Value]<br />Constraint - [Sum of Weight] Below or Equal to Maximum Weight<br /><br />The Order of Complexity for such an algorithm is:<br />O(len(items)**2) or O(n<sup>2</sup>)<br /><br />Of course, if we're doing just 8 items, we have 256 different combinations. However, what if we have a list of 50 items. That would be 1125899906842624 item combinations. Even if the computer could generate a combination in a nanosecond, it would take 36 years to complete.<br /><br />Just for kicks, here's the bruteforce algorithm to solve this:<br /><br /><code>def brange(integer):<br /> binary=bin(integer)[2:]<br /> brange=[]<br /> for i in range(len(binary)):<br /> if binary[len(binary)-i-1]=='1':<br /> brange.append(i)<br /> return brange<br /> <br />def bruteForceKnapsack(items,maxWeight):<br /> combinations=[]<br /> values=[]<br /> weights=[]<br /> maxValue=0.0<br /> for combination in range(2**len(items)):<br /> currentWeight=0.0<br /> currentValue=0.0<br /> for i in brange(combination):<br /> currentWeight+=items[i].getWeight()<br /> currentValue+=items[i].getValue()<br /> if currentWeight<=maxWeight:<br /> if maxValue<currentValue:<br /> combinations=[]<br /> values=[]<br /> weights=[]<br /> maxValue=currentValue <br /> combinations.append(combination)<br /> values.append(currentValue)<br /> weights.append(currentWeight)<br /> elif maxValue==currentValue:<br /> combinations.append(combination)<br /> values.append(currentValue)<br /> weights.append(currentWeight)<br /> return combinations,values,weights<br /><br />def testBruteForceKnapsack():<br /> items=instantiateItems()<br /> combinations,values,weights=bruteForceKnapsack(items,22.5)<br /> print("For a maxweight of 22.5:")<br /> for i in range(len(combinations)):<br /> for j in brange(combinations[i]):<br /> print(items[j])<br /> print("Total Value: $"+str(round(values[i],2)))<br /> print("Total Weight: "+str(round(weights[i],2))+"Kg")<br /> print()<currentvalue: br=""><currentvalue: br=""></currentvalue:></currentvalue:></code><br /><br />The brange() function returns a list of 1's in an integer converted to binary. For example, 10 when converted to binary is 1010. It would return 1 and 3 which are the location of the 1's. This algorithm goes through all possible combinations. If there's a better combination than one it currently knows, it would replace all previous combinations. If there's a combination that gives the same value as the one it currently has, it would append it to the list. Here's some output:<br /><br /><code>For a maxweight of 22.5:<br />Desktop, $1500.0, 10.0Kg<br />Laptop, $1200.0, 1.5Kg<br />LED TV, $1000.0, 10.0Kg<br />Smartphone, $750.0, 0.25Kg<br />Ring, $1000.0, 0.01Kg<br />PSP, $300.0, 0.5Kg<br />Total Value: $5750.0<br />Total Weight: 22.26Kg</code><br /><br />Here we see that the Greedy Algorithm got lucky. Its Best Effort results were correct! If we work with a different set of items the Greedy Algorithm results may not be as optimal, but the Brute Force it will definitely give you the best, or a set of best results.<br /><br />Brute Force, of course, is unscalable. In the next article we will look at other ways we can find such results.<br /><br />Here's graphically how much we can steal from the house:<br /><br /><img src="http://img577.imageshack.us/img577/9185/jsh.png" /><br /><br />As you can see, optimally 28Kg is minimum to steal mostly everything. You'll have to add more than 150Kg to your maximum load in order to start steal more.Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com0tag:blogger.com,1999:blog-6890826354654530731.post-45937812838486340922013-06-18T16:14:00.000+08:002013-06-18T23:12:06.165+08:00Computer Science 14Before believing the result of any simulation, we have to have confidence that our conceptual model is correct, and that we have correctly implemented the model.<br /><br />Before, we looked at Statistical Tests. These tests give us Statistical conclusions which tell us things about the simulation, about how it converges to a certain result. However, we still have to perform <b>Sanity Checks</b>.<br /><br />One of the ways we can do this is to test the results against reality. For example, in the previous article, the value of Pi obtained from the simulation can be tested with the formula to know if we got the right answer or not.<br /><br />Let us look at an interplay between <b>Physical Reality</b>, <b>Theoretical Models</b> and <b>Computational Models</b>. The Physical Reality is based on the Physical Situation. An example of a Physical Situation can be the price of Big Mac across countries (which we know as the <a href="http://en.wikipedia.org/wiki/Big_Mac_Index">Big Mac Index</a>), or a stock market, or viruses in a body. We use a Theoretical Model to have some deeper insights into it, and when it gets too complicated or is unable to show us everything, we use a Computational Model.<br /><br />At times when we conduct an actual experiment, we may not get the results that match exactly with the theory. To work with experimental errors, we have to assume that there is some sort of random perturbation applied to the results. Gaussian's work suggests that we can model experimental errors as Normally Distributed.<br /><br />The <a href="https://en.wikipedia.org/wiki/Hooke%27s_law">Hooke's Law</a>, <b>F=kx</b>, was described by Robert Hooke in 1678. In plain English, the Hooke's Law states that the Force (F) stored in a spring is linearly related to the Distance (x) the spring has either been compressed or stretched multiplied by the Stiffness (k) of the spring. A stiff spring has a large k value. This law holds for a wide variety of materials and systems. However, of course, it does not hold for a force that causes the spring to exceed its elastic limit.<br /><br />We can find the k value of a spring by suspending the spring and hanging known masses on it. Measuring the distance for each mass. Suppose that a 1K<br />Using the rules: <br />F = kx<br />F = ma<br />ma = kx<br />k = ma/x<br /><br />We derive:<br />k<br />= 1Kg * 9.81m/s<sup>2</sup> / 0.2m <br />= 49.05N/m<br /><br />We can do a sanity check to make sure this is correct (using 0.4m as the distance, we should evaluate to 2Kg):<br />49.05 = m * 9.81m/s<sup>2</sup> / 0.4m<br />49.05 / (9.81m/s<sup>2</sup>/0.4) = 2Kg<br /><br />What we are assuming here is that 49.05N/m is the correct k value with just one experiment. However, remember that experimental errors are normally distributed. If we actually put a 2Kg weight there, we may or may not get a stretching of exactly 0.4m. People would typically repeat the same experiment many times with different weights in order to get enough data to compute an accurate value of k.<br /><br />Suppose that we have the following data recorded by a fellow scientist:<br /><br /><code>0.0 0.0<br />0.05 0.00968<br />0.1 0.02087<br />0.15 0.02871<br />0.2 0.03849<br />0.25 0.05371<br />0.3 0.05917<br />0.35 0.07066<br />0.4 0.08041<br />0.45 0.08863<br />0.5 0.10219<br />0.55 0.10251<br />0.6 0.09903<br />0.65 0.10078<br />0.7 0.10321<br />0.75 0.09605<br />0.8 0.09730<br />0.85 0.09986<br />0.9 0.09876<br />0.95 0.09825<br />1.0 0.09986</code><br /><br />The data comes in a file called spring_data.txt describing weights (in kg) and stretch (in m). In order to manipulate this, we can either manually copy the data into our program, or import and parse it from within Python. In this example, we are going to import and parse the data:<br /><br />The following code imports the data, and then returns two lists:<br /><br /><code>def importData(fileName):<br /> masses=[]<br /> distances=[]<br /> file=open(fileName,'r')<br /> for line in file:<br /> m,d=line.split()<br /> masses.append(float(m))<br /> distances.append(float(d))<br /> file.close()<br /> return masses,distances<br /><br />masses,distances=importData("spring_data.txt")<br />print(masses)<br />print(distances)</code><br /><br />Now that we have the two lists, we can then plot the data using:<br /><br /><code>import pylab<br /><br />pylab.plot(masses,distances,"o")<br />pylab.xlabel("Mass (kg)")<br />pylab.ylabel("Distance (m)")<br />pylab.title("Spring Experiment")<br />pylab.show()</code><br /><br />As you can see, the distances are somewhat linear at first (with some level of normally distributed error), then it evens off at the end:<br /><br /><img src="http://img29.imageshack.us/img29/2574/t7a.png" /><br /><br />In Numpy, there is a type called the "<b>array</b>". This array has nothing to do with the array we have in Java. The array here can be considered as more of a matrix. Changing data to an array allows:<br />1) Mathematical operations to be performed on it (i.e. array*3 will multiply every element by 3)<br />2) Cross-products between two arrays (array1*array2)<br /><br />Lets look at some application of arrays:<br /><br /><code>import pylab<br /><br />forces = pylab.array(masses)*9.81<br /><br />pylab.plot(forces,distances,"o")<br />pylab.xlabel("Force (N)")<br />pylab.ylabel("Distance (m)")<br />pylab.title("Spring Experiment")<br />pylab.show()</code><br /><br />In this example, the masses list is changed into an array and then multiplied by 9.81. What we get is a list of Force (N):<br /><br /><img src="http://img845.imageshack.us/img845/6227/obh.png" /><br /><br />From here, we can calculate k. But before that,we must know whether our data is sensible. In our Theoretical Model, the data should fall on a line F=kx. If we can draw a good line here, we can find k very easily.<br /><br />If we only have two points, we can easily join them to create a line. If we have a bunch of points scattered around, we must find the line that is the closest to all the points. This is known as <b>Fitting</b>. In order to find the line, we need to have a measure to how good is the fit. We need an <b>Objective Function</b> to compare fits. The standard Objective Function is the <b>Least Squares Fit</b>.<br /><br />Suppose we have two lists, a list of predictions according to the line, and a list of observations. The Least Squares Fit is described as below:<br /><br /><code>def leastSquaresFit(observed,predicted):<br /> sum=0.0<br /> for i in range(len(observed)):<br /> sum+=(predicted[i]-observed[i])**2<br /> return sum</code><br /><br />For comparing two Line of Best Fits (or Polynomial of Best Fits), the lower the number, the better.<br /><br />A good model is created when, for each independent variable Mass (kg), we can predict a good dependent variable Displacement (m).<br /><br />For creation of the Line of Best fit of any graph, we use <b>Linear Regression</b>. Pylab does Linear Regression using the function polyfit(observedX,observedY,degreeOfPolynomial). Recall that a line is y=mx+c. Polyfit would return the values of m and c for 1 degree fitting:<br /><br /><code>import pylab<br /><br />forcesArray = pylab.array(masses)*9.81<br />distancesArray = pylab.array(distances)<br /><br />m,c = pylab.polyfit(forcesArray,distancesArray,1)<br />distancesFitArray = m*forcesArray+c<br /><br />pylab.plot(forcesArray,distancesArray,"o")<br />pylab.plot(forcesArray,distancesFitArray)<br />pylab.xlabel("Force (N)")<br />pylab.ylabel("Distance (m)")<br />pylab.title("Spring Experiment, k="+str(1/m))<br />pylab.show()</code><br /><br />We have the following output:<br /><br /><img src="http://img607.imageshack.us/img607/1750/bw99.png" /><br /><br />We can compute for k through this reasoning:<br />y=mx+c<br />Distance = m * Force + c<br />Force = 1/m * Distance + c<br />k = 1/m<br /><br />So is the k value correctly shown? If we look at the picture, we'll realize that the line is not a very good fit. Is it better to use a cubic fit to model our data? Let's look at the example below:<br /><br /><code>import pylab<br /><br />forcesArray = pylab.array(masses)*9.81<br />distancesArray = pylab.array(distances)<br /><br />m,c = pylab.polyfit(forcesArray,distancesArray,1)<br />distancesFitArray = m*forcesArray+c<br />a,b,c,d = pylab.polyfit(forcesArray,distancesArray,3)<br />distancesFitCubicArray = a*forcesArray**3 + b*forcesArray**2 + c*forcesArray**1 + d<br /><br />pylab.plot(forcesArray,distancesArray,"o")<br />pylab.plot(forcesArray,distancesFitArray)<br />pylab.plot(forcesArray,distancesFitCubicArray)<br />pylab.legend(["Observed","Linear Fit","Cubic Fit"])<br />pylab.xlabel("Force (N)")<br />pylab.ylabel("Distance (m)")<br />pylab.title("Spring Experiment, k="+str(1/m))<br />pylab.show()</code><br /><br />It seems as though the Cubic Fit is much better than the Linear Fit in describing the spring?<br /><br /><img src="http://img542.imageshack.us/img542/6293/cad2.png" /> <br /><br />The useful thing about a model is its ability to predict data. Let's look at how the Cubic Fit would predict our data. We want to see, for example, what would happen if an approximately 20N force is placed on the spring. We do this by adding another point in the Forces array:<br /><br /><code>import pylab<br /><br />forcesArray = pylab.array(masses)*9.81<br />forcesArrayExtended = pylab.array(masses+[2])*9.81<br />distancesArray = pylab.array(distances)<br /><br />m,c = pylab.polyfit(forcesArray,distancesArray,1)<br />distancesFitArray = m*forcesArray+c<br />a,b,c,d = pylab.polyfit(forcesArray,distancesArray,3)<br />distancesFitCubicArray = a*forcesArrayExtended**3 + b*forcesArrayExtended**2 + c*forcesArrayExtended**1 + d<br /><br />pylab.plot(forcesArray,distancesArray,"o")<br />pylab.plot(forcesArray,distancesFitArray)<br />pylab.plot(forcesArrayExtended,distancesFitCubicArray)<br />pylab.legend(["Observed","Linear Fit","Cubic Fit"])<br />pylab.xlabel("Force (N)")<br />pylab.ylabel("Distance (m)")<br />pylab.title("Spring Experiment, k="+str(1/m))<br />pylab.show()</code><br /><br />Now observe the output:<br /><br /><img src="http://img822.imageshack.us/img822/4562/ckq2.png" /><br /><br />The Cubic Fit seems to suggest that if we place 20N of force on the spring, the spring would retract back higher than it originally was. That happens maybe in another universe with different physics but that doesn't happen on Earth.<br /><br />Remember that Hooke's Law is a first order linear approximation that will fail as the spring approaches its limit of elasticity. Just because the data is not plotted into a straight line does not mean that we should attempt to fit an arbitrary curve onto it, because any polynomial with a high enough degree can fit into any data. What we want, instead, is to look at the portions where Hooke's Law remains valid - that is, when the spring hasn't hit its limit of elasticity.<br /><br />Let us repeat the experiment, discarding the last 10 results:<br /><br /><code>import pylab<br /><br />forcesArray = pylab.array(masses)*9.81<br />forcesArrayLimited = pylab.array(masses[:-10])*9.81<br />#forcesArrayExtended = pylab.array(masses+[2])*9.81<br />distancesArray = pylab.array(distances)<br />distancesArrayLimited = pylab.array(distances[:-10])<br /><br />m,c = pylab.polyfit(forcesArrayLimited,distancesArrayLimited,1)<br />distancesFitArray = m*forcesArray+c<br />#a,b,c,d = pylab.polyfit(forcesArray,distancesArray,3)<br />#distancesFitCubicArray = a*forcesArrayExtended**3 + b*forcesArrayExtended**2 + c*forcesArrayExtended**1 + d<br /><br />pylab.plot(forcesArray,distancesArray,"o")<br />pylab.plot(forcesArray,distancesFitArray)<br />#pylab.plot(forcesArray,distancesFitCubicArray)<br />pylab.legend(["Observed","Linear Fit","Cubic Fit"])<br />pylab.xlabel("Force (N)")<br />pylab.ylabel("Distance (m)")<br />pylab.title("Spring Experiment, k="+str(1/m))<br />pylab.show()</code><br /><br />Now notice that we've obtained a much more believable fit.<br /><br /><img src="http://img33.imageshack.us/img33/6682/4z.png" /><br /><br />Now notice that we've obtained a better approximation for the k value of the spring, which is originally rated at 49.05 by the manufacturer.<br /><br />To know how good a fit is to a particular set of results, we use the <b>Coefficient of Determination</b>. The Coefficient of Determination is:<br />r<sup>2</sup>=1-(Estimated Error / Variance of Measured Data).<br /><br />Computationally, it is expressed as such:<br /><br /><code>def rSquare(predicted,measured):<br /> estimatedError = ((predicted-measured)**2).sum()<br /> measuredMean = measured.sum()/float(len(measured))<br /> measuredVariance = (measuredMean-measured**2).sum()<br /> return 1-estimatedError/measuredVariance</code><br /><br />Thanks to Numpy's array, we performed array operations in one line each. Do you see how useful arrays are!?<br /><br />We can compare the goodness of each fit by comparing it to the measured data. The closer it is to 1, the better. A CoD of 1 means that the fit maps directly onto the measured data.Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com0tag:blogger.com,1999:blog-6890826354654530731.post-20614119741455401202013-06-16T00:22:00.002+08:002013-06-16T03:57:20.651+08:00Computer Science 13How do we construct computational models that will help us understand the real world?<br /><br />A <b>Gaussian Distribution</b>, or commonly known as Normal Distribution or the Bell Curve, can be described with just its Mean and Standard Deviation. <br /><br />Whenever possible, we would model distributions as Gaussian because they are extremely easy to deal with. We have things like the <b>Empirical Rule</b> to help us draw conclusions. However, if a certain distribution is not Gaussian and we treat it that way, the results could be extremely misleading.<br /><br />Take for example the results of rolling a die. The outcome of die-rolling is not normally distributed. In the case of a fair-die, each face is equally likely to appear. Plotting the results on a histogram would not yield a Bell Curve.<br /><br />In a fair lottery system, the probability of anything coming up is the same. Most people would study past results in order to predict what comes up next. However, illustrating the <b>Gambler's Fallacy</b> again, lottery has no memory and is thus not based on past results.<br /><br />In case of the Lottery and Die Rolling, we have a <b>Uniform Distribution</b>. Uniform Distributions describe distributions where each result is equally plausible. The only thing required to describe a Uniform Distribution is the Range of the results.<br /><br />The Uniform Distribution does not occur much in nature. It is typically observed in fair chance games deviced by humans. Uniform Distribution is not as useful in modelling complex systems as the Gaussian Distribution.<br /><br />Another type of distribution that occurs quite frequently is the <b>Exponential Distribution</b>. Exponential Distribution is the only Continuous Distribution that is memory-less. The concentration of drug in the bloodstream can be modelled with Exponential Distribution.<br /><br />Assume that each time-step, each molecule has a probability p of being cleared by the body. The system is memory-less, in the sense that the probability of each molecule being cleared is not dependent on what happened to the previous molecules in the previous steps.<br /><br />If the probability of being cleared at each step is p, then at time T=1, the probability of the molecule still being in the body is 1-p. Since it is memory-less, the probability of it being cleared is still 1-p. Therefore, the probability of it still being in the body is (1-p)<sup>2</sup>.<br /><br />We can see that the probability of the molecule still being in the body at the particular time is (1-p)<sup>t</sup><br /><br />Suppose there are 1000 molecules, and at each time-step, each molecule has a chance of 0.01 to be cleared by the body. We would end up with the following graphs:<br /><br /><img src="http://img41.imageshack.us/img41/8752/aho.png" /><br /><br />Of course, what we are seeing is the mathematical graphing of what we would expect. This graph is achieved through the following code:<br /><br /><code>import pylab<br /><br />def mathematicalMolecules(n, clearProbability, steps):<br /> moleculesRemaining=[n]<br /> for i in range(steps):<br /> moleculesRemaining.append(n*((1-clearProbability)**i))<br /> return moleculesRemaining<br /><br />moleculeList = mathematicalMolecules(1000,0.01,1000)<br />pylab.plot(moleculeList)<br />pylab.xlabel("Time (T)")<br />pylab.ylabel("Molecules")<br />pylab.title("Exponential Decay of Molecules")<br />pylab.show()</code><br /><br />Now, notice that the rate of clearing of molecules decrease in a logarithmic way. The less molecules there are, the less molecules are going to be lost. If we plot the graph with semilogy, we can see that it is indeed logarithmic:<br /><br /><img src="http://img850.imageshack.us/img850/3094/19y.png" /><br /><br />We can actually create a Monty Carlo simulation to mimick the process described, instead of simply using mathematical formulas to do it. We make use of the following code to implement this:<br /><br /><code>import random<br /><br />def simulationMolecules(n, clearProbability, steps):<br /> moleculesRemaining=[n]<br /> for i in range(steps):<br /> for j in range(n):<br /> if random.random()<clearProbability:<br /> n-=1<br /> moleculesRemaining.append(n)<br /> return moleculesRemaining<br /><br />moleculeList = simulationMolecules(1000,0.01,1000)<br />pylab.plot(moleculeList)<br />pylab.xlabel("Time (T)")<br />pylab.ylabel("Molecules")<br />pylab.title("Exponential Decay of Molecules (Simulation)")<br />pylab.show()</code><br /><br />Notice that the output is not as smooth as we want it to be, but it matches the mathematical diagram almost exactly:<br /><br /><img src="http://img203.imageshack.us/img203/9933/p3sb.png" /><br /><br />If we look at this graph in a logarithmic way, this is what we'll get:<br /><br /><img src="http://img845.imageshack.us/img845/2971/3h6.png" /><br /><br />The biggest difference is that our simulation did not allow for fractions of a molecule, which is more true to what we are looking for.<br /><br />Here's a direct mapping of the two graphs:<br /><br /><img src="http://img543.imageshack.us/img543/8819/z2j.png" /><br /><br /><img src="http://imageshack.us/a/img29/204/5d47.png" /><br /><br />The Mathematical Model is also known as the <b>Analytic Model</b>. Here, we've illustrated the difference between Analytical and Simulation for Exponential Decay. There is no right or wrong answer over here. When looking at the models, we look at its <b>Fidelity</b> and <b>Credibility</b>. Fidelity refers to whether the outcome is accurate. Credibility is whether the outcome can be believed.<br /><br />Then there is Utility. Which model can be applied to what kind of questions? There is an additional Utility offered by the Simulation Model, because we can easily change the model to be slightly different in ways that are usually harder for an Analytic Model.<br /><br />For example, if we wish to calculate the rate of which the body clears bacteria, but also factor in things like the rate at which the bacteria can regenerate themselves, it is easier to model with a Simulation rather than an Analytic Model.<br /><br />Let us look at the <b>Monty Hall</b> problem. In the Monty Hall problem, we start with 3 doors. Behind one of the doors, there is a great prize. Behind each of the other doors, there is a booby prize. A person is asked to choose one of the doors. The host then reveals one of the other two doors with the booby prize. There is then a question posed to the person: Do you want to switch?<br /><br />The question is: Which is better? Does it matter whether we switch?<br /><br />Let us look at this first in an Analytical perspective:<br />The chance that the prize lies in your door is 1/3<br />The chance that the prize lies in one of the other 2 doors is 2/3<br /><br />After choosing the first door, one of the two other doors is eliminated. Now, what you have is:<br />The chance that the prize lies in your door is 1/3<br />The chance that the prize lies in the other (2 - 1) door is 2/3<br /><br />Therefore, making the switch doubles your chance from 1/3 to 2/3.<br /><br />The following maps the problem out in a computational way:<br /><br /><code>import random<br />import pylab<br /><br />def montyHall(n):<br /> won=0<br /> for i in range(n):<br /> correctDoor = random.randint(1,3)<br /> chosenDoor = random.randint(1,3)<br /> if correctDoor==chosenDoor:<br /> won+=1<br /> return won<br /><br />def montyHallSwitch(n):<br /> won=0<br /> for i in range(n):<br /> correctDoor = random.randint(1,3)<br /> chosenDoor = random.randint(1,3)<br /> if correctDoor!=chosenDoor:<br /> won+=1<br /> return won<br /><br />def plotMontyHall(trialsExp):<br /> participants=[]<br /> montyHallList=[]<br /> for i in range(trialsExp):<br /> montyHallList.append(montyHall(2**i))<br /> participants.append(2**i)<br /> montyHallSwitchList=[]<br /> for i in range(trialsExp):<br /> montyHallSwitchList.append(montyHallSwitch(2**i))<br /> pylab.plot(participants,montyHallList)<br /> pylab.plot(participants,montyHallSwitchList)<br /> pylab.xlabel("Participants")<br /> pylab.ylabel("Winners")<br /> pylab.title("Monty Hall: Stand vs Switch")<br /> pylab.legend(["Stand","Switch"])<br /> pylab.show()<br /><br />plotMontyHall(15)</code><br /><br />This is based upon the following:<br />1) If guessed door is prize door, stand will give a win<br />2) If guessed door is NOT prize door, switch will give a win<br /><br />This is the resulting graph. As we can see, the winning fraction is exactly twice for Switch. Also, 1/3 of Stand participants and 2/3 of Switch participants are winners.:<br /><br /><img src="http://img11.imageshack.us/img11/5265/w9gz.png" /><br /><br />We can use randomized algorithms to solve problems in which randomness plays no roles. This is surprising, but it is an extremely important tool.<br /><br />Let us consider Pi (<span class="texhtml">π)</span>. Pi is a constant that people have known since the 18th century. The earliest estimate of Pi was 4*(8/9)<sup>2</sup>=3.16 by the Egyptians in 1650 B.C. In 650 B.C., when Solomon was creating his Basin, he made use of Pi, which was roughly 3. The best estimate of Pi in ancient time was by Achimedes. He didn't give the value of Pi. Instead, he built a polygon of 96 sides, in which he concluded Pi to be between 223/71 to 22/7.<br /><br />Later on, Buffon and Laplace came up with a <b>Stochastic Simulation</b> to solve for Pi. This is the Buffon and Laplace solution to finding Pi:<br /><br /><img src="http://img46.imageshack.us/img46/7422/3q6.png" /><br /><br />Suppose that we have a circle of radius 1 inside a square of sides 2.<br /><br />We know for sure that the area of the circle is <span class="texhtml">πr<sup>2</sup>, and the area of the square is r</span><span class="texhtml"><sup>2</sup></span><br /><br /><span class="texhtml">Buffon and Laplace proposed to randomly drop needles into such a set-up on the ground, then use the number of needles in the circle to estimate the area of the circle and the total number of needles to estimate the area of the square.</span><br /><span class="texhtml"><br /></span><span class="texhtml">We can then use the following to obtain </span><span class="texhtml">π:</span><br /><span class="texhtml">Area of Circle / Area of Square</span><br /><span class="texhtml">= </span><span class="texhtml">πr<sup>2</sup>/(2r)<sup>2</sup></span><br /><span class="texhtml">= </span><span class="texhtml">πr<sup>2</sup>/4r<sup>2</sup> </span><span class="texhtml"></span><br /><span class="texhtml"><span class="texhtml">= </span></span><span class="texhtml"><span class="texhtml"><span class="texhtml">π/4</span></span></span><br /><br /><span class="texhtml"><span class="texhtml"><span class="texhtml"><span class="texhtml">π = 4 (Area of Circle / Area of Square)</span> </span></span></span><br /><br /><span class="texhtml"><span class="texhtml"><span class="texhtml">Therefore:</span></span></span><br /><br /><span class="texhtml">π = 4 (Needles in Circle / Needles in Square)</span><br /><span class="texhtml"><br /></span><span class="texhtml">Because Buffon and Laplace didn't have the proper physical setup to conduct such a test, we can help him through the use of a simulation:</span><br /><span class="texhtml"><br /></span><span class="texhtml"><code>import math<br />import random<br /><br />def estimatePi(n):<br /> circle=0<br /> for i in range(n):<br /> x = random.random()<br /> y = random.random()<br /> if math.hypot(x-0.5,y-0.5)<0.5:<br /> circle+=1<br /> return 4*circle/float(n)<br /><br />print (estimatePi(50000000))</code></span><br /><span class="texhtml"><br /></span><span class="texhtml">In my example, I used a circle of radius 0.5 instead. The formula is the same. We drop 50 million pins into the setup, counted the pins in the circle, and used the formula to give us the answer. The pins were just a means of estimating area.</span><br /><br /><span class="texhtml"><img src="http://imageshack.us/a/img600/1218/1j8s.png" /></span><br /><br /><span class="texhtml"><img src="http://imageshack.us/a/img819/3738/wn9.png" /></span><br /><br /><span class="texhtml">As you can see, as the amount of pins used increased, the results were closer to Pi and its variation decreased. Here is the code that generated these graphs:</span><br /><span class="texhtml"><br /></span><span class="texhtml"><code>import pylab<br /><br />def standardDeviation(X):<br /> mean=sum(X)/float(len(X))<br /> total=0.0<br /> for x in X:<br /> total+=(x-mean)**2<br /> return (total/len(X))**0.5<br /><br />def mean(X):<br /> return sum(X)/float(len(X))<br /><br />def plotEstimatePi(pinsExp, trials):<br /> pis=[]<br /> pins=[]<br /> stdDev=[]<br /> for i in range(pinsExp):<br /> pins.append(trials*2**i)<br /> currentPis=[]<br /> for j in range(trials):<br /> currentPis.append(estimatePi(2**i))<br /> pis.append(mean(currentPis))<br /> stdDev.append(standardDeviation(currentPis))<br /> pylab.plot(pins,pis,"o")<br /> pylab.xlabel("Pins Used")<br /> pylab.ylabel("Estimated Pi")<br /> pylab.title("Pi Estimation, Final Value = "+str(pis[len(pis)-1]))<br /> pylab.semilogx()<br /> pylab.figure()<br /> pylab.plot(pins,stdDev)<br /> pylab.xlabel("Pins Used")<br /> pylab.ylabel("Standard Deviation")<br /> pylab.title("Std Dev, Final Value = "+str(stdDev[len(stdDev)-1]))<br /> pylab.semilogx()<br /> pylab.show()<br /> <br />plotEstimatePi(15,100)</code></span><br /><br />Note that we can actually come up with more accurate representations of Pi through other means, like binary search. However, this example illustrates the usefulness of randomness in solving non-random problems. This method is extremely useful when solving for integration, because we know that the integration of a function is the area under the graph.Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com3tag:blogger.com,1999:blog-6890826354654530731.post-75241879994202679192013-06-15T01:02:00.000+08:002013-06-15T11:36:45.517+08:00Computer Science 12How can we know when it's safe to assume that the average results of a finite number of trials is representative of what we would get if we did the test many more times.<br /><br />The question we want to ask is: How many samples do we need, to finally believe the answer? How many trials do we have to conduct to have confidence in the result?<br /><br />At the root of all of it, is the notion of <b>Variance</b>. Variance is a measure of how much spread is there in the possible outcomes, or rather, how much do the outcomes vary?<br /><br />It is more desirable to run multiple trials compared to a single, large trial. It is because running multiple trials allow us to have a measure of Variance. It is through this measure that we can have a sense of how trustworthy is the result.<br /><br />The Variance is formalized through the concept of <b>Standard Deviation</b>. Informally, the Standard Deviation measures the fraction of values which are close to the Mean. If many values are close to the Mean, the SD is small. If many values are far from the Mean, the SD is large. If all values are the same, then SD is 0.<br /><br />Here is Standard Deviation expressed in computational form:<br /><br /><code>def standardDeviation(X):<br /> mean=sum(X)/float(len(X))<br /> total=0.0<br /> for x in X:<br /> total+=(x-mean)**2<br /> return (total/len(X))**0.5</code><br /><br />Now, we can use the SD to look at the relationship between the number of samples and the amount of confidence we have in the results. We can plot the Standard Deviations of our trials to demonstrate / prove that the accuracy of the result is increasing with the increase of the number of trials.<br /><br />Let us look back again at the previous coin flipping example. We'll make use of the following two functions:<br /><br /><code>def flipCoin():<br /> return random.randint(0,1)<br /><br />def flipCoinList(n):<br /> x=[]<br /> for i in range(n):<br /> x.append(flipCoin())<br /> return x<br /><br />def mean(X):<br /> return sum(X)/float(len(X))</code><br /><br />flipCoin() returns 0 or 1, and flipCoinAverage(n) returns the list of n flips, and mean() tells us the mean of a list.<br /><br />We can observe Variance using this example:<br /><br /><code>def flipCoinTrial(flipsExp,trials):<br /> flips=[]<br /> means=[]<br /> stdDev=[]<br /> for i in range(flipsExp):<br /> currentFlips=2**i<br /> currentMeans=[]<br /> for j in range(trials):<br /> currentList=flipCoinList(currentFlips)<br /> currentMeans.append(mean(currentList))<br /> flips.append(currentFlips)<br /> means.append(currentMeans)<br /> stdDev.append(standardDeviation(currentMeans))<br /> pylab.plot(flips,means)<br /> pylab.xlabel("Number of Flips")<br /> pylab.ylabel("Obtained Mean")<br /> pylab.title("Mean of Flips")<br /> pylab.semilogx()<br /> pylab.ylim(0,1)<br /> pylab.figure()<br /> pylab.plot(flips,stdDev)<br /> pylab.xlabel(str(trials)+" Trials of Flips")<br /> pylab.ylabel("Standard Deviation")<br /> pylab.title("Standard Deviation of Trials")<br /> pylab.semilogx()<br /> pylab.ylim(0,1)<br /> pylab.show()</code><br /><br />This flips coins a number of times, and performs it in sets of trials. The SD is then measured for each set, before a new set of trials begin with more flips of the coins. Here's the result of 2<sub>0 to 20</sub> flips, each done 10 times:<br /><br /><img src="http://imageshack.us/a/img854/2687/90h.png" /><br /><img src="http://imageshack.us/a/img838/1141/7cf.png" /><br /><br />Beautiful, isn't it? As you can see, as the number of flips increases, the Variance in the sets of output decreases.<br /><br />The size of the Standard Deviation compared to the Mean shows us how reliable our results are. The smaller the Standard Deviation, the better. The <b>Coefficient of Variation</b> allows us to represent this ratio. It is simply:<br /><span class="st">σ(x) / (1/n</span><span class="st"><span class="st"> . ∑<sub>(i=1,n)</sub>x<sub>i</sub> </span>)</span><br /><br /><span class="st">Where n is the length of the set.</span><br /><br /><span class="st">Or more simply:</span><br /><span class="st">Standard Deviation / Mean</span><br /><br /><span class="st">(Doesn't Mathematical notations just freak you out sometimes?)</span><br /><br /><span class="st">The smaller the Coefficient of Variation, the more accurate it is. Generally, CoV below 1 is considered Low Variance. However, the CoV cannot be used for <b>Confidence Intervals</b>.</span><br /><span class="st"><br /></span><span class="st">Let's look at some of the plotting functions available in pylab. One way to observe distribution is through the use of a histogram. The following function allows plotting of histograms:</span><br /><br /><span class="st"><code>def flipCoinHistogramTrial(flips,trials):<br /> trialsList=[]<br /> for i in range(trials):<br /> trialsList.append(mean(flipCoinList(flips)))<br /> pylab.hist(trialsList,bins=25)<br /> pylab.xlabel("Mean of Flips")<br /> pylab.ylabel("Occurrence")<br /> pylab.title("Histogram of "+str(trials)+" Trials of "+str(flips)+" flips")<br /> pylab.show()</code></span><br /><span class="st"></span><br /><span class="st">Basically the following code would flip coins "flips" number of times for "trials" times. It then plots the results on a histogram. As you can see, the diagram depicts a <b>Normal Distribution</b>.</span><br /><br /><img src="http://img6.imageshack.us/img6/4706/5xq1.png" /><br /><br />As we talk about the Standard Deviation and Coefficient of Variation, we are not only interested about the mean but also about the distribution of the results. A perfect Normal Distribution peaks at the mean and falls off symmetrically on both side. The Normal Distribution is what we always call the <b>Bell Curve</b>.<br /><br />The Normal Distribution is frequently used to construct probabilistic models for two reasons:<br />1) It has good mathematical properties and it is easy to reason about<br />2) It has many naturally occurring instances in the world so most things map on it pretty well<br /><br />Mathematically, the Normal Distribution can be characterized by two parameters:<br />1) Mean<br />2) Standard Deviation<br /><br />If we have a Normal Distribution, the Mean and Standard Deviation can be used to compute Confidence Intervals.<br /><br />A Confidence Interval allows us to estimate an unknown parameter by providing a range that is likely to contain the unknown value and a Confidence Level that the unknown value lies within that range.<br /><br />An example of a Confidence Interval is:<br />1 <span class="st">± 0.05</span><br /><br /><span class="st">If the <b>Confidence Level</b> is not stated, it is assumed to be 95%. This means that 95% of the time, the results will lie between 0.95 to 1.05.</span><br /><span class="st"><br /></span><span class="st">An <b>Empirical Rule</b> gives us a handy way to estimate the Confidence Interval given the Mean and the Standard Deviation. It is as such for a true Normal Distribution:</span><br /><span class="st">68% of the data will be within 1 SD of the Mean</span><br /><span class="st">95% of the data will be within 2 SD of the Mean</span><br /><span class="st">99.7% of the data will be within 3SD of the Mean</span><br /><span class="st"><br /></span><span class="st">A trick used to estimate the Standard Deviation is the <b>Standard Error</b>. The Standard Error is an estimate of the SD with the assumption that the errors are normally distributed and that the Sampled Population is small relative to the Actual Population. This is usable because the Normal Distribution is often an accurate model of reality.</span><br /><br /><span class="st">If p = % of the sample that met a certain condition, and n = sample size, we can say:</span><br /><span class="st">Standard Error = (p*(100-p)/n)<sup>1/2</sup></span><br /><br />Say for example, out of 1000 voters, 60% of the voters voted for Kelvin. We can say that the Standard Error is:<br />(60*(100-60)/1000)<sup>1/2</sup><br />=(60*40/1000)<sup>1/2</sup><br />=2.4<sup>1/2</sup><br />=1.549193338%<br /><br />This means that with 95% confidence level, the true percentage of votes Kelvin would get is within 2 SE of 60%. In order words, the Confidence Interval would be:<br />60% <span class="st">± 3.10%</span><br /><span class="st"><br /></span><span class="st">We can test this result by creating a poll ourselves:</span><br /><br /><span class="st"><code>def poll(voters,percentage):<br /> votes=0.0<br /> for i in range(voters):<br /> if random.random()<percentage br=""> votes+=1.0<br /> return votes<br /><br />def testStandardError(n,p,trials):<br /> results=[]<br /> for i in range(trials):<br /> results.append(poll(n,p))<br /> pylab.hist(results,bins=100)<br /> pylab.xlabel("Votes Gathered")<br /> pylab.ylabel("Matching Trials")<br /> pylab.title("Standard Deviation is "+str(standardDeviation(results)/n*100))<br /> pylab.show()<br /><br />testStandardError(1000,60.0,1000)</percentage></code></span><br /><span class="st"><br /></span><span class="st">As you can see, the actual Standard Deviation gathered from a real test is actually quite close to what we might have gotten from the SE formula.</span><br /><span class="st"><br /></span><span class="st"><img src="http://imageshack.us/a/img59/5076/22r9.png" /></span><br /><span class="st"><br /></span><span class="st">This shows how good the Standard Error is as an estimation to the actual Standard Deviation.</span>Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com0tag:blogger.com,1999:blog-6890826354654530731.post-35691655501684417442013-06-10T16:46:00.001+08:002013-06-10T16:52:07.131+08:00Computer Science 11We'll start today's article with an installation of <b>matplotlib </b>from here:<br /><a href="http://matplotlib.org/downloads.html">http://matplotlib.org/downloads.html</a><br /><br />Let's go through an example in graph plotting. The below example shows plotting of a graph that compounds annual interest of principal $100,000 for 50 years:<br /><br /><code>import pylab<br /><br />principal = 100000<br />interestAnnual = 0.05<br />years = 50<br />values = []<br />for i in range(years+1):<br /> values.append(principal)<br /> principal+=principal*interestAnnual<br />pylab.plot(values)<br />pylab.show()</code><br /><br />You should see a graph that bends upwards. Notice that even though we know exactly what we're looking at, because it's a simple graph and we know what the x and y axes represent, it may not be useful for other people we are showing this to.<br /><br />Therefore it is important that we include labelled axes. To label the axes, we use the following code before we use title(), xlabel() and ylabel() functions as shown:<br /><br /><code>import pylab<br /><br />principal = 100000.0<br />interestAnnual = 0.05<br />years = 50<br />pylab.title(str(interestAnnual)+" Growth of $"+str(principal)+" Compounded Annually over "+str(years)+" years")<br />pylab.xlabel("Years of Compounding")<br />pylab.ylabel("Value of Principal (SGD)")<br />values = []<br />for i in range(years+1):<br /> values.append(principal)<br /> principal+=principal*interestAnnual<br />pylab.plot(values)<br />pylab.show()</code><br /><br /><b>Randomness </b>is an integral part of problem solving, because much problems in the real world are based on random (or seemingly random) variables.<br /><br />Let's look at one of the simplest forms of probability: The Die. If a die is thrown 10 times, what is the probability of it not getting a 1 in ALL the throws?<br /><br />On the first throw, the probability of rolling a 1 is 1/6, and NOT rolling a 1 on the first throw is 5/6.<br /><br />The probability of not getting 1's on two rolls is (5/6)<sup>2</sup>.<br /><br />Therefore, the probability of not getting 1's at all in 10 rolls is (5/6)<sup>10</sup>.<br /><br />We can use the 1 minus rule to find the probability of getting AT LEAST ONE 1. All we have to do is to use 1-(5/6)<sup>2</sup>.<br /><br />Blaise Pascal, the same person that made important contributions to the study of fluids, vacuum and pressure, also founded the probability theory. His friend asked him a question: Rolling a pair of dice 24 times, how likely is it that we'll get a pair of 6's.<br /><br />Let's first look at the probability of rolling a pair of 6's in a single throw. Since the probability of getting a 6 on the first die is 1/6 and the probability of getting it on the second die is 1/6, there should be a 1/36 chance of rolling a pair of 6's on a single throw.<br /><br />Most people would go ahead and calculate a (1/36)<sup>24</sup>. However, what we're actually getting over here is the probability of getting ALL rolls to be pairs of 6's.<br /><br />To calculate the probability of getting AT LEAST ONE pair of 6's, we use the method found in the previous example. We first look at the probability of NOT rolling any pairs of 6's.<br /><br />On a single throw, this would be defined as 1-(1/36) which is 35/36.<br /><br />In 24 throws, we would have a chance of (35/36)<sup>24</sup> NOT encounter a pair of 6's, which equates to ~0.5086.<br /><br />If we inverse it, we would have a chance of 1-(35/36)<sup>24</sup> to roll AT LEAST ONE pair of 6's. We have a ~0.4914 chance to roll at least a pair of 6's.<br /><br />Let's write a quick simulation to calculate this probability:<br /><br /><code>import random<br /><br />def rollDie():<br /> return random.randint(1,6)<br /><br />def rollGame():<br /> for i in range(24):<br /> if rollDie()+rollDie()==12:<br /> return True<br /> return False<br /><br />def simulateGame(n):<br /> wins=0.0<br /> for i in range(n):<br /> if rollGame():<br /> wins+=1.0 <br /> return wins/n<br /><br />print(simulateGame(100000))</code><br /><br />This code essentially has a rollDie() function which simulates the rolling of 1 die.<br /><br />The rollGame() function rolls the die for 24 times and evaluates to True when a pair of 6's occurs and False when the pair does not occur.<br /><br />Finally, simulateGame() runs n number of games and finds out how many of those games did we actually get at least ONE pair of 6's.<br /><br />In this case, it seems counter-intuitive to have to write a simulation in order to solve this since it is pretty easy to find the probability of this through simple multiplication. However, simulation is important in solving really complex problems. It is also good for us to have at least 2 methods to prove our calculations.<br /><br />The above simulation we've used is also known as the <b>Monte Carlo Simulation</b> which is named after a casino. This method is known as an <b>Inferential Statistics</b> method. It is based upon the principle that a random sample tends to exhibit the same properties as the population from which it was drawn. In simpler terms, if I draw a sample of people from across the country, the average opinion of this sample would be roughly the same as the average of every single person in the country.<br /><br />In the Monte Carlo Simulation, the more trials you do, the more stable the value gets. In a sense, there is less jittery, and it gets closer to the actual value of the calculated probability. This is the <b>Bernoulli's Law</b>, or the <b>Law of Large Numbers</b>. This law applies when the tests are independent and does not depend on the results of the previous tests. As the number of trials tends to infinity, the output would converge to the value of the actual probability.<br /><br />There is a term known as the <b>Gambler's Fallacy</b> where in a game of coin flipping people would tend to bet on heads if tails came up many times in a row. However, the problem here is that since coins have no memory, and each process is independent of each other, the deviation of having many heads will not be evened out as most people think it might.<br /><br />Here's a demonstration of Bernoulli's Law:<br /><br /><code> import random<br />import pylab<br /><br />def flipCoin():<br /> return bool(random.randint(0,1))<br /><br />def exponentialTest(n):<br /> probability=[]<br /> numberOfFlips=[]<br /> for i in range(n):<br /> heads=0.0<br /> for j in range(2**i):<br /> heads+=float(flipCoin())<br /> probability.append(heads/(2**i))<br /> numberOfFlips.append(2**i)<br /> pylab.plot(numberOfFlips,probability,"o")<br /> pylab.semilogx()<br /> pylab.title("Demonstration of Bernoulli's Law")<br /> pylab.ylim(0.0,1.0)<br /> pylab.xlabel("Number of Flips")<br /> pylab.ylabel("Percentage of Heads")<br /> pylab.show()<br /><br />exponentialTest(20)</code><br /><br />This flips a coin 2<sup>0 to n</sup> times and displays the results in a table. The output demonstrates that as the number of trials increase, the results converge to the actual probability:<br /><img src="http://img689.imageshack.us/img689/2364/slaw.png" />Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com0tag:blogger.com,1999:blog-6890826354654530731.post-21243065807912981022013-05-26T23:10:00.001+08:002013-06-16T01:58:08.990+08:00Computer Science 10A <b>Generator</b> in Python is <b>yield</b>. It is similar to <b>return</b>, but instead of just stopping, it actually remembers the point in the body where it was, plus all local variables. A Generator is a function, or method, that we can use as an Iterator. For every iteration, it goes back and continues the execution until another yield is met. It does so until it reaches the end of the function.<br /><br />The benefit of using a generator function as opposed to returning a whole is that it saves on memory requirements of having to store an entire list in memory. It also cuts down response time. Let us look at the following example:<br /><br /><code>def generateList(n):<br /> ans=[] <br /> for i in range(n):<br /> ans.append(n*2)<br /> return ans<br /><br />for i in generateList(1000000000):<br /> print(i) </code><br /><br />It actually does the same thing as:<br /><code>def generatorYield(n):<br /> for i in range(n):<br /> yield i*2<br /><br />for i in generatorYield(1000000000):<br /> print(i) </code><br /><br />However, the yield function uses much less memory as it does not have to store a full list in memory. It also has faster response time because it reacts upon the first yield, as compared to waiting for the entire list to be populated.<br /><br />This is not a perfect example, but it does illustrate some little principles and basic implementation of the yield function.<br /><br />For much of the history of Science, people focus on trying to find <b>Analytic Models</b>. An Analytic Model allows prediction of behavior through a function by having a set of initial conditions and parameters. This led to things like the calculus and the probability theories.<br /><br />However, Analytic Models do not work all the time. In the 20th century, people started turning to <b>Simulation Models</b>. When it comes to things like systems which are not mathematically tractable, like weather forecasting, instead of spending a lot of time to build accurate mathematical models, it's easier to build successively refining series of simulations. It is often easier to extract useful intermediate results than it is to build a detailed Analytic Model. In the past, people had to build really complex mechanical clockwork models to work with simulations. The advent of computers gave rise to the adoption of Simulation Models.<br /><br />A Simulation Model has the following properties:<br />1) <b>It gives useful information about behavior of a system</b>, as opposed to an Analytic Model which gives the exact outcome of the situation.<br />2) <b>It gives us an approximation to reality</b>.<br />3) <b>Simulation models are descriptive, instead of prescriptive</b>. A prescriptive model gives the exact same output for every input. In simulations, given a set of conditions, you may get different outcomes each time you run it because it allows factoring of things that cannot be mathematically modeled.<br /><br />We may not get the same answer every time but if we do the simulation enough times we're going to get a good sense of what is going to happen. <br /><br />In 1827, Robert Brown observed that a pollen particle suspended in water would float around randomly under microscopic conditions. They call it the Brownian Motion. The first clear mathematical model didn't really come about until in the 1900s. In 1905, Albert Einstein came up with the model that confirmed the existence of atoms.<br /><br /><a href="http://en.wikipedia.org/wiki/Brownian_motion">Brownian Motion in Wikipedia</a><br /><br />Brownian Motion is an example of a <b>Random Walk</b>. A Random Walk simulation is extremely useful when trying to model particles that are going to move in a random direction and velocity at each time-step. We can use it to model particles in fluid. This is also extremely useful in understanding biological processes. It is also useful in modeling stock markets.<br /><br />Suppose that we have a drunken person in the middle of a field, and each second he can take one step in any cardinal direction (NSEW). After a thousand seconds, where would he be?<br /><br />To model this, we're going to divide this field into x and y axes. Let's assume that the person starts of at (0,0), which is the origin.<br /><br />This is the probability table mapping steps, distance and probability:<br /><br />At the first step:<br />There is a probability of 1 that he can be 1 step away.<br />He would be about 1 step away. <br /><br />At the second step:<br />There is a probability of 1/4 that he can be back at 0,<br />a probability of 2/4 that he can be sqrt(2) steps away,<br />and a probability of 1/4 that he can be 2 steps away.<br />He would be about sqrt(2)/2+0.5=1.2 steps away<br /><br />At the third step:<br />There is a probability of 9/16 to be 1 step away,<br />3/8 to be sqrt(5) steps away,1/16 probability to be 3 steps away,<br />He would be about 9/16+3/16+(3sqrt(5)/8)=1.5 steps away.<br /><br />As the number of steps increase, it becomes extremely difficult to calculate where he will end up at.<br /><br />From here, it seems like the more steps the person takes, the further away he will be.<br /><br />To simulate this, we need to model a person, a field, and a location to help with calculations.<br />The field object can be a collection of drunks and their location.<br /><br />This is how the entire code looks like:<br /><br /><code>import math<br />import random<br />import copy<br /><br />class Location(object):<br /> def __init__(self,x,y):<br /> self.x=float(x)<br /> self.y=float(y)<br /><br /> def set_location(self,xy):<br /> self.x=float(xy[0])<br /> self.y=float(xy[1])<br /><br /> def move(self,xy):<br /> self.x+=float(xy[0])<br /> self.y+=float(xy[1])<br /> <br /> def get_location(self):<br /> return (self.x,self.y)<br /><br /> def distance_from(self,other):<br /> return math.sqrt((other.x-self.x)**2+(other.y-self.y)**2)<br /><br /> def __str__(self):<br /> return str((self.x,self.y))<br /><br />class Field(object):<br /> def __init__(self):<br /> self.field={}<br /><br /> def add_person(self,person,location):<br /> if person not in self.field.keys():<br /> self.field[person]=copy.deepcopy(location)<br /><br /> def rem_person(self,person):<br /> if person in self.field.keys():<br /> self.field.pop(person)<br /><br /> def move_person(self,person):<br /> if person in self.field.keys():<br /> self.field[person].move(person.step())<br /><br /> def where_is(self,person):<br /> if person in self.field.keys():<br /> return self.field[person]<br /> <br />class Person(object):<br /> def __init__(self,name):<br /> self.name=name<br /><br /> def step(self):<br /> return random.choice(((0,1),(1,0),(0,-1),(-1,0)))<br /><br /> def __str__(self):<br /> return self.name</code><br /><br />We can then implement a test like this:<br /><br /><code>origin=Location(0,0)<br />kelvin=Person("Kelvin")<br />field=Field()<br />field.add_person(kelvin,origin)<br />tries=1000<br />steps=1000<br />average_distance=0.0<br />maximum_distance=0.0<br />minimum_distance=float(steps)<br />for i in range(tries):<br /> for j in range(steps):<br /> field.move_person(kelvin)<br /> distance_moved=origin.distance_from(field.where_is(kelvin))<br /> average_distance+=distance_moved/float(tries)<br /> maximum_distance=max(maximum_distance,distance_moved)<br /> minimum_distance=min(minimum_distance,distance_moved)<br /> origin.set_location(field.where_is(kelvin).get_location())<br />print("Average Travelled:",average_distance)<br />print("Maximum Travelled:",maximum_distance)<br />print("Minimum Travelled:",minimum_distance)</code><br /><br />As you can see, the results vary quite a lot and in order to get a proper reading we have to set tries to a really high value.<br /><br />How do we think about the results of the programs when the programs themselves are stochastic? Almost everything in the real world is stochastic, and we must be really careful in accepting results. Let's think about the role of randomness in computation. The world is extremely non-deterministic. That means that we can only say that "something has a very high probability of occurring", and there is no such thing as "something will certainly occur".<br /><br /><b>Causal Non-Determinism</b> is the believe that not every event is caused by previous events. Einstein and Schrodinger strongly disagreed with this. Instead, they strongly believed in <b>Predictive Non-Determinism</b>. The concept is that because of our inability to make accurate measurements about the physical world, it makes it impossible to make precise predictions about the future. Things are not unpredictable, it's just because we just don't / can't know enough to be able to predict it.<br /><br /><b>Stochastic Process</b> is a process whose next state is based on the previous state and at least one random element.<br /><br />Random in Python can be implemented via:<br />1)<b> random.choice()</b> - Helps you choose randomly based on what's passed to the function.<br />2)<b> random.random()</b> - Generates a number that is 0.0 < x <= 1.0.<br /><br />A process that returns a value between 1 to 6 (a dice roller) is not considered Stochastic as the value returned is independent upon the previous values. In 10 throws, the probability of getting all 1's is:<br />1/(6<sup>10</sup>)<br /><br />In fact, the probability of getting all 0's is also 1/(6<sup>10</sup>), as is any other combination (e.g. 1, 3, 2, 5, 3, 4, 3, 2, 4, 2).<br /><br />Looking from the probability perspective, for such a process, we should be asking: What fraction of the possible results have the property we are testing for (e.g. have exactly four 1's, etc).<br /><br />From the other side, the probability of us getting NOT all 1's is:<br />1-1/(6<sup>10</sup>)<br /><br />This "1 minus" trick will come in very handy.<br /><br />A lot of times when we're doing computing, we use print statements to present our results. When it comes to decimal places or when dealing with large amounts of data, print statements can be quite messy and difficult to read.<br /><br />In Python, there is a plotting package known as <b>pylab</b>. It provides quite a lot of facilities from Matlab. We can use pylab for plotting graphics very easily.<br /><br />However, pylab is not part of the standard Python library, and therefore we need to install it.<br /><br />Head on over <a href="http://www.scipy.org/Download">here</a> and install one of the binaries compatible with your computer.<br /><br />The next article will discuss more about pylab. Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com10tag:blogger.com,1999:blog-6890826354654530731.post-34960649509046970222013-05-26T18:11:00.000+08:002013-06-16T01:58:49.368+08:00Computer Science 09<b>Objection-Oriented Programming</b> (OOP) is actually not a new notion. However, OOP didn't take off until Java, which was the first popular OOP language. The basis of OOP is the abstraction of data types. The idea is that we can extend our programming languages through user-defined types. These types can then be used just as though it's a built-in type.<br /><br />The reason why it's called "<b>Abstract</b>" data type is because for each type there exists an <b>Interface</b> for which we can <b>Implement</b>. Interfaces explain what the methods do (but not how they do it). The <b>Specification</b> of an object tells us what the object does.<br /><br />To define a <b>Class</b>, we use:<br /><br /><code>class class_name(object):<br /> #Contents of the Class </code><br /><br />We create the <b>Instantiator</b> as such:<br /><br /><code>def __init__(self):<br /> #Contents of the instantiator </code><br /><br />Every time the object is created, the __init__ method will be executed first. We usually introduce attributes of objects through its Instantiator like this:<br /><br /><code>def __init__(self):<br /> self.a=10<br /> self.L=[]</code><br /><br />We then create the Object in the main code as such:<br /><br /><code>object_name = class_name()</code><br /><br />Notice that there is a notion of <b>self</b> in the implementation. In __init__, self is a formal parameter, but when creating the object, we did not actually pass any parameter in.<br /><br />This is because for every method in an object, the reference pointer of itself is automatically passed into the first parameter. Therefore, you'll need to include self (or anything you name) in every method's first formal parameter. The pointer in object_name and the self object inside __init__ is actually the same. <br /><br />Integer a and list L are attributes of the instance of the class_name object, which is equivalent to the self object.<br /><br /> To see the pointer of the object, we can print object_name directly, and this is what we'll see:<br /><br /><code><__main__.class_name object at 0x0000000002C27B38></code><br /><br />This is also exactly what we'll see if we printed self inside the methods of the object.<br /><br />Let's say that if we print object_name, we want to see a custom string representation of it, instead of its contents. We can do this by creating the __str__ method:<br /><br /><code>def __str__(self):<br /> return "The value of a is: "+str(self.a)+"\nThe contents of L is "+str(self.L)</code><br /><br />The print function automatically calls the __str__ method when attempting to print object_name.<br /><br />If we want to let the value of a represent object_name when type-casted to an integer, we can simply create a method as such:<br /><br /><code>def __int__(self):<br /> return self.a</code><br /><br />We can test by using:<br /><br /><code>print(10+int(object_name))</code><br /><br />We can actually use class_name.a to refer to the "a" attribute in the object. However, this is not recommended. This is because "a" is a variable used in the implementation of the object. It is not in the specification that "a" even exists.<br /><br />Take for instance, you have a method get_number() which returns the value of a certain number stored in variable "a". In a future version, let's say that the variable "a" doesn't exist anymore, because the implementation is different and uses perhaps variable "x" instead, to store that number.<br /><br />If our programs used to reference "a", you would get a whole load of errors. However, if it used get_number(), it doesn't matter whether "a" or "x" is used, because it'll depend what's returned by the get_number() method. We still get the number we want.<br /><br />The concept is known as <b>Data Hiding</b>, in which we abstract data into <b>Getter and Setter</b> methods. It is important that we do not directly access instance (created each instance of the class) and class variables (shared among all instances of the class), due to object reference problems and possibility of changes in the future.<br /><br />Take for instance, we want to write a program to keep track of staff information across various departments of a company. To do this, we need to start thinking about the level of abstraction we want to have. Before we write the code in detail, we want to think about the types that would make it easier to write the code.<br /><br />When dealing with stuff like these, in the end we don't want to deal with lists and dicts and floats. Instead, we want to keep track of things like department, worker, manager, etc.<br /><br />We can then set up a system of hierarchy of these. We first identify the similarities between the objects. For example, worker and manager are all employees. All employees have attributes like name, birth date, etc. We can set up the employees class as shown:<br /><br /><code>class Employee(object):<br /> import datetime<br /> <br /> def __init__(self,name):<br /> nameSplit=name.split()<br /> self.firstName=nameSplit[0]<br /> try:<br /> self.lastName=nameSplit[1]<br /> except:<br /> self.lastName=""<br /> self.birthdate=None<br /><br /> def set_firstName(self,firstName):<br /> self.firstName=firstName<br /><br /> def get_firstName(self):<br /> return self.firstName<br /><br /> def set_lastName(self,lastName):<br /> self.lastName=lastName<br /><br /> def get_lastName(self):<br /> return self.lastName<br /><br /> def set_birthdate(self,date):<br /> self.birthdate=date<br /><br /> def get_birthdate(self):<br /> return self.birthdate.isoformat()<br /><br /> def get_age(self):<br /> return round((datetime.date.today()-self.birthdate).days/365.0,2)<br /><br /> def __lt__(self,other):<br /> #If they have the same first names<br /> if self.firstName==other.firstName:<br /> #Compare their last names<br /> return self.lastName<other .lastname="" br=""> else:<br /> #Else compare by their first names<br /> return self.firstName<other .firstname="" br=""><br /> </other></other><br /><other .lastname="" br=""><other .firstname="" br=""> def __str__(self):<br /> return self.firstName+" "+self.lastName</other></other></code><br /><br />Notice that there's a __lt__ method in there. The __lt__ method is what's called when you compare employee with something else. For example, if you have two employees, john and william, typing john < william would invoke __lt__(john,william).When sorting lists of employees, the __lt__ automatically gets used when comparing objects. Observe the following implementation in the main code:<br /><br /><code>listEmployees=[]<br />listEmployees.append(Employee("John Smith"))<br />listEmployees.append(Employee("William Smith"))<br />listEmployees.append(Employee("John Doe"))<br />listEmployees.append(Employee("William Wise"))<br /><br />print("Before sort:")<br />for employeeInList in listEmployees:<br /> print(employeeInList)<br /><br />print("After sort:")<br />listEmployees.sort()<br /><br />for employeeInList in listEmployees:<br /> print(employeeInList)</code><br /><br />It will be sorted according to first name, then last name, as implemented in the __lt__ method.<br /><br />We can also play with the birthdate and age as shown:<br /><br /><code>def isOlder(employee1,employee2):<br /> if employee1.get_age()<employee2.get_age():<br />print(employee2,"is older")<br /> else:<br /> print(employee1,"is older")<br /><br />import datetime<br /><br />kelvin=Employee("Kelvin Ang")<br />kelvin.set_birthdate(datetime.date(1990,6,21))<br /><br />ling=Employee("Lingling Pang")<br />ling.set_birthdate(datetime.date(1993,10,19))<br /><br />print(kelvin,"is",kelvin.get_age())<br />print(ling,"is",ling.get_age())<br /><br />isOlder(kelvin,ling)</code><br /><br />As we can see, employee is a subclass of object, that's why we can do so much stuff with it. A worker is a subclass of employee. For example, in the company, every worker has a workerID. We can then do:<br /><br /><code>from Employee import Employee<br /><br />class Worker(Employee):<br /> workerID=0<br /> def __init__(self,name):<br /> Employee.__init__(self,name)<br /> self.workerID=Worker.workerID<br /> Worker.workerID+=1<br /><br /> def __str__(self):<br /> return Employee.__str__(self)+", Worker ID "+str(self.workerID)<br /><br /> def __lt__(self,other):<br /> return self.workerID<other .workerid="" code=""> </other></code><br /><br />We can then create a worker as such:<br /><br /><code>kelvin = Worker("Kelvin Ang")<br />print(kelvin)</code><br /><br />(Note that I saved the Employee class in a file called Employee.py. I also didn't create the Getter and Setter methods for workerID, which should be implemented but had been left out in the interest of simplicity)<br /><br />Notice that even though we inherited all methods from Employee class, we defined the __init__ method again. This is known as <b>Method Overriding</b>. When we Override a method, and we want its Superclass's __init__ to be run as well, we specify it in the Overriding Method.<br /><br />Similarly, we Override the __str__ method to now include the Worker ID. We append the old output with our new information. If we didn't override __str__, it would use the superclass's __str__ method.<br /><br />Notice that there's a workerID=0 statement at the top. This defines a <b>Class Variable</b>. A Class Variable is one that is shared among all objects of the same class. In this case, the first object instantiated gets assigned ID 0, and the next one gets assigned ID 1 and so on.<br /><br />We can then try it out like this:<br /><br /><code>kelvin = Worker("Kelvin Ang")<br />ling = Worker("Lingling Pang")<br />print(kelvin)<br />print(ling)<br />print(kelvin<ling)</code><br /><br />Note that we're now comparing the ID when we do kelvin<ling. Suppose that we want to compare names instead. We can use:<br /><br /><code>print(Employee.__lt__(kelvin,ling))</code><br /><br />We, however, cannot use Worker.__lt__() for two employees, because it makes use of workerID which employees do not have.<br /><br />We can also create another Class that is exactly the same as Worker. Like this:<br /><br /><code>class Worker2(Worker):<br /> pass</code><br /><br />It may seem as though it's for nothing, but it allows us to do checking like this:<br /><br /><code>if type(kelvin)=='Worker'<br /> #Some code here <br />elif type(kelvin)=='Worker2'<br /> #Some other code here</code>Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com0tag:blogger.com,1999:blog-6890826354654530731.post-32531981588222390922013-05-23T23:49:00.002+08:002013-06-20T20:24:44.203+08:00Computer Science 08Hashing is how the 'dict' is implemented in Python. It trades off space (memory) for time (CPU utilization). Take for example, we have a set of integers. We build the set of integers, and detect whether or not a particular integer is in the set very quickly.<br /><br />We are first going to take an integer i, and hash it. A hashing function is one that converts integer i to some other integer in a range between 0 to k. We will then use this hash integer to index into a list of buckets (a bucket is actually a list).<br /><br />We know that we can find the i<sup>th</sup> element of a list in constant time, so to find an integer in a dictionary, we hash the integer, then look directly into the index to look for it. The hash function has the following issues:<br />1) <b>Many-to-One</b> - There can be an infinite number of different inputs that may hash to the same value, resulting in <b>Collisions</b>.<br />2) <b>Collision</b> - When two different values hash to the same bucket, we have a collision.<br />3) <b>Linear Rehashing</b> is used to take care of Collision, which keeps a list of things with the same hash value in buckets.<br /><br />A good hash function has the following properties:<br />1) <b>Wide Dispersal, One-to-One</b> - This requires a large hash size<br />2) <b>Avalanche Effect</b> - A small change in input should result in a large change in hash<br /><br />If we analyze the complexity of a <b>Hash Lookup</b>, we know that the hashing is of constant complexity, while the lookup is actually based on the number of things stored in the Bucket. If the particular item you're looking for has a great number of other things that collided with it, then it will take a longer time to find it. If all buckets are of length 1 (no collision), then a hash lookup is actually a constant complexity operation. The more buckets there are, the less chance is it for there to be a collision. This is a type of trade off between space (take more space) and time (use less time).<br /><br />A practical hash table uses a large hash such that there is really low collision.<br /><br />Python's dictionary implements an expanding hash system that expands accordingly to the number of collisions detected.<br /><br />The reason why the keys in a 'dict' has to be immutable is because if a list is used as a key, and it mutates somewhere along the way, the hash would come out different and we would not be able to find the object it's supposed to be the key for.<br /><br /><b>Exceptions</b> are everywhere in programming languages. What an exception is an error that is encountered during execution. Examples are:<br />1) <b>IndexError</b> - When you try to access an invalid index<br />2) <b>NameError</b> - When you try to access an invalid variable<br />3) <b>TypeError</b> - When an unexpected type is encountered<br /><br />These Exceptions are also called <b>Unhandled Exceptions</b>. In Unhandled Exceptions, the program crashes. Sometimes we write programs that intentionally raise exceptions, and we write code to catch these exceptions and handle them. To handle these exceptions, we use the <b>try</b> and <b>except</b> blocks:<br /><br /><code>try:<br /> #Code that potentially can raise exception here<br />except:<br /> #Things that run when an exception is caught<br /><br />Let us look at the following code:<br />def readVal(valType, msgRequest, msgError):<br /> tries=0<br /> while tries<4<br /><br /> val = input(msgRequest)<br /> try:<br /> val=valType(val)<br /> return val<br /> except:<br /> print(msgError)<br /> tries+=1<br /> raise TypeError('Number of tries exceeded')<br /><br />print(readVal(int,"Please enter an int:","You did not enter an int.")) </code><br /><br />We can actually catch specific exceptions raised by an exception like this:<br /><br /><code>try:<br /> print(readVal(int,"Please enter an int:","You did not enter an int.")) <br />except TypeError as tE:<br /> print(tE)</code><br /><br />In this case, we can see that the <b>tE </b>variable stores the <b>TypeError Exception</b>. We can directly print the tE variable to display its message. If we do not specify the Exception type then all Exceptions are caught by the statement. We can also specify multiple except statements to represent the different cases of Exceptions.<br /><br />The notion of the <b>Class </b>is what distinguishes a modern programming language from a primitive one.<br /><br />A <b>Module </b>is a collection of related functions. An example of a Module is <b>math</b>, which we import by using "import math". From there, we can then use the log function stored in the math module using <b>math.log()</b>. The <b>Dot-Notation</b> is used to <b>Disambiguate</b> names. For example, item1.x and item2.x are different x's.<br /><br />A Class is similar to a Module in that it is a collection of functions, but it also contains data on which its functions operate on. It allows us to pass a whole group of functions and data around. This is the basis of what we call <b>Object-Oriented Programming</b>. An example of a Class is the list, which contains functions like append(). The data and function associated with the object is called an object's attributes. The data, function, and everything, including the class itself, is an object.<br /><br />When we deal with objects, there is a <b>Message Passing Metaphor</b>. The basic metaphor is that when we write something like L.append(e), we are actually passing the message "append(e)" into the L object, and it is executed within its own scope. A <b>Method</b>, such as append(), is a Function associated with an Object. A Class is a collection of objects with identical characteristics that <b>forms a type</b>. Lists and Dictionaries are actually built-in classes.Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com2tag:blogger.com,1999:blog-6890826354654530731.post-3010912720447169852013-05-23T18:19:00.001+08:002013-06-16T01:59:39.850+08:00Computer Science 07The oldest kind of list in the world is the <b>Linked List</b>. In a Linked List, each element has a pointer to the next element in the list. In this case, finding the i<sup>th</sup> element is O(i) as it has to traverse to the next item to find the next, and so on. In a system like this, Binary Search doesn't work very well.<br /><br />In Python, however, lists are differently implemented in a system of pointers. Let us suppose that a pointer in a list occupies 4 units of memory. The definition of unit is arbitrary, as is the number 4 arbitrarily chosen. To get to the i<sup>th</sup> object of a list of 'int's, we access the object pointed by the Pointer stored in:<br /><br /><code>start+(i*n)</code><br /><br />We know exactly where to get the pointer to get to the actual object, instead of having to traverse each of them individually till we find it. In Python's implementation, and in fact all OOP (<b>Object-Oriented Programming</b>) languages, the lists require constant time to reach, which is defined as Q(1).<br /><br />This implementation is called <b>indirection</b>. It is known that most problems we look at can be solved by adding layers of indirection. The only kind of problems that cannot be solved with indirection is when the levels of indirection is too great that it uses up the available memory.<br /><br />Binary Search is a very powerful tool, but it works only when the assumption that the list is sorted is true. In order to perform Binary Search, the process is as shown:<br />1) Sort L<br />2) Perform Binary Search on L<br /><br />We know that Binary Search is definitely O(log len(L)). In order for the above to make sense, we have to make sure that Sort L takes less than O(len(L)). However, there currently exists no sorting algorithm that could sort an element in sub-linear time. Therefore, if a list is unsorted, we will always take at least O(len(L))+O(log len(L)) time.<br /><br />However, there are still benefits to using Binary Search after sorting an unsorted list, as opposed to Linear Search. We are interested in what's called the <b>Anortized Complexity</b>. The idea here is that if we can sort the list once, we can perform the search many times. Therefore, if the number of searches performed on the list is a great number, the Order of Growth is:<br /><br /><code>lim k->inf (O(len(L))+k*O(log len(L))/k)<br />= k*O(log len(L))/k<br />= O(log len(L))</code><br /><br /><a href="http://www.youtube.com/watch?v=k4RRi_ntQc8&hl=en-GB&gl=SG">Obama and Bubble Sort</a><br /><br />A very simple sort is the Selection Sort. It is not a very good way to sort, but it is a useful method. In Selection Sort, like most other algorithms, it depends upon maintaining an <b>Invariant</b>. An Invariant is something that is invariantly true. The invariant we are going to maintain is a pointer into the list, which divides the list into a <b>Prefix</b> and a <b>Suffix</b>. The requirement is that the Prefix is assumed to be always sorted. This requirement is known as the Invariant. We'll start with an empty Prefix. Each step through the Algorithm, we decrease the size of the Suffix by one element, while increasing the Prefix by one element. We'll be done when the size of Suffix is 0.<br /><br />In Selection Sort, suppose that we have the following objects in the list:<br />| 3 5 6 7 8 5 4 3 2<br /><br />The | is the invariant pointer. The element looks through everything at the right side (the Suffix) and swaps the smallest element with the next element, then increments the invariant pointer:<br /><br />2 | 5 6 7 8 5 4 3 3<br /><br />It then looks for the next smallest element and swaps it with the next element:<br /><br />2 3 | 6 7 8 5 4 5 3<br /><br />This will continue on as shown:<br /><br />2 3 3 | 7 8 5 4 5 6<br />2 3 3 4 | 8 5 7 5 6<br />2 3 3 4 5 | 8 7 5 6<br />2 3 3 4 5 5 | 7 8 6<br />2 3 3 4 5 5 6 | 8 7<br />2 3 3 4 5 5 6 7 | 8<br />2 3 3 4 5 5 6 7 8 |<br /><br />The following code implements a simple Selection Sort:<br /><br /><code>def selectionSort(L):<br /> invariant=0<br /> minValue=L[invariant]<br /> minPointer=invariant<br /> for i in range(len(L)):<br /> for i in range(invariant,len(L)):<br /> if (L[i] < minValue):<br /> minValue=L[i]<br /> minPointer=i<br /> temp=L[invariant]<br /> L[invariant]=minValue<br /> L[minPointer]=temp<br /> invariant+=1<br /> if (invariant<br /><br /> minValue=L[invariant]<br /> minPointer=invariant<br /> return L<br /><br />L=[7,4,3,7,3,5,2,6,5,3,1]<br />print(selectionSort(L))</code> <br /><br />In this case, here's the breakdown of the list as it loops through the steps: <br />[1, 4, 3, 7, 3, 5, 2, 6, 5, 3, 7]<br />[1, 2, 3, 7, 3, 5, 4, 6, 5, 3, 7]<br />[1, 2, 3, 7, 3, 5, 4, 6, 5, 3, 7]<br />[1, 2, 3, 3, 7, 5, 4, 6, 5, 3, 7]<br />[1, 2, 3, 3, 3, 5, 4, 6, 5, 7, 7]<br />[1, 2, 3, 3, 3, 4, 5, 6, 5, 7, 7]<br />[1, 2, 3, 3, 3, 4, 5, 6, 5, 7, 7]<br />[1, 2, 3, 3, 3, 4, 5, 5, 6, 7, 7]<br />[1, 2, 3, 3, 3, 4, 5, 5, 6, 7, 7]<br />[1, 2, 3, 3, 3, 4, 5, 5, 6, 7, 7]<br /><br />For a problem of length 11, we actually went through the middle loop 66 times. If the list is length 12, it would go through 78 times. The order of growth for this problem is:<br />O((n(n+1)/2))<br /><br />If we take away any multiplicative and additive constants, we get:<br />O(n<sup>2</sup>)<br /><br />For some time, people are actually unsure if we could sort this list better. This is a form of the Divide and Conquer algorithm. For a Divide and Conquer algorithm, we:<br />1) Choose a Threshold Input Size, which is n<sub>0</sub> which is the smallest problem<br />2) How much are we going to divide the problem<br />3) Combine the sub-solutions back into the main solution<br /><br />John Von Neuman observed that given two <u>sorted</u> list, we can merge them quickly. Here's an example of two lists:<br />1=[<b>1</b>,3,5,6,9]<br />2=[1,7,8,10,11,12]<br />3=[]<br /><br />We first compare the first elements of each side. We would take smaller (or first, if both are the same) and put it to the merged list like this:<br />1=[3,5,6,9]<br />2=[<b>1</b>,7,8,10,11,12]<br />3=[1]<br /><br />The process goes on like this:<br /><br />1=[<b>3</b>,5,6,9]<br />2=[7,8,10,11,12]<br />3=[1,1]<br /><br />1=[<b>5</b>,6,9]<br />2=[7,8,10,11,12]<br />3=[1,1,3]<br /><br />1=[<b>6</b>,9]<br />2=[7,8,10,11,12]<br />3=[1,1,3,5]<br /><br />1=[9]<br />2=[<b>7</b>,8,10,11,12]<br />3=[1,1,3,5,6]<br /><br />1=[9]<br />2=[<b>8</b>,10,11,12]<br />3=[1,1,3,5,6,7]<br /><br />1=[<b>9</b>]<br />2=[10,11,12]<br />3=[1,1,3,5,7,8]<br /><br />1=[]<br />2=[<b>10,11,12</b>]<br />3=[1,1,3,5,7,8]<br /><br />1=[]<br />2=[]<br />3=[1,1,3,5,7,8,10,11,12]<br /><br />In this case, the maximum number of copies is Linear, while the maximum number of comparisons we are going to make is dependent on the total lengths of the lists, which is also Linear. We write this by:<br /><br /><code>O(len(L))+O(len(L))</code><br /><br />Removing all multiplicative constant, this is the time taken to merge two lists:<br /><br /><code>O(len(L))</code><br /><br />We can merge in Linear time. <br /><br />To implement mergesort, we break up the list into a bunch of lists of length one, then we start merging them into sorted lists.<br /><br />Since each merge is O(n) where n is the length of the list, and we call merge log<sub>2</sub>n depths, we end up with:<br /><br /><code>O(n log<sub>2</sub>n)</code><br /><br />Mergesort is Logarithmic, and therefore a little slower than Linear, but a lot faster than Quadratic.<br /><br />Here is the source code for a merge sort function:<br /><br /><code>def merge(L1,L2,compare):<br /> #Initialize an empty answer list<br /> ans=[]<br /> #Initialize positional counters<br /> i,j=0,0<br /> #As long as there are things to compare (i.e. It will run till the shorter list runs out)<br /> while i<len(L1) and j<len(L2):<br /> #Append the smaller of the two to the answer list<br /> if compare(L1[i],L2[j]):<br /> ans.append(L1[i])<br /> i+=1<br /> else:<br /> ans.append(L2[j])<br /> j+=1<br /> #Append the remaining list to the answer list<br /> while i<len(L1):<br /> ans.append(L1[i])<br /> i+=1<br /> while j<len(L2):<br /> ans.append(L2[j])<br /> j+=1<br /> #Return the list<br /> return ans<br /><br />def mergeSort(L,compare):<br /> #If list is length of 1, return it<br /> if len(L)<2:<br /> return L<br /> else:<br /> #If list is longer than 1<br /> #Find the middle of the list<br /> middle=int(len(L)/2)<br /> #Attempt to mergeSort it further<br /> L1=mergeSort(L[0:middle],compare)<br /> L2=mergeSort(L[middle:len(L)],compare)<br /> #L1 and L2 should be sorted by now<br /> return merge(L1,L2,compare)<br /><br />L=[2,4,5,6,8,12,12,12,1,3,5,6,7,13]<br />print(mergeSort(L,compare=lambda x,y:x<y))</code> <br /><br />L<len and="" br="" j="" len=""><len br=""><len br="">et us look at the base case for the mergeSort function:</len></len></len><br /><len and="" br="" j="" len=""><len br=""><len br="">If the list is length of 1, it is considered sorted</len></len></len><br /><br /><len and="" br="" j="" len=""><len br=""><len br="">Next, we look at the recursive case:</len></len></len><br /><len and="" br="" j="" len=""><len br=""><len br="">If the list is more than length of 1, break them in half sort them, then merge them and return the answer.</len></len></len><br /><br /><len and="" br="" j="" len=""><len br=""><len br="">At the point where mergeSort(L) where len(L)==2, it would be broken into L1 and L2 of list size 1 each. Next, it is merged and returned into its previous caller's L1 or L2.</len></len></len><br /><br /><len and="" br="" j="" len=""><len br=""><len br="">The previous caller would then see sorted size 2 lists of L1 and L2 returned, and it would in turn try to merge them and return them, to its previous caller's L1 or L2.</len></len></len><br /><br />Next, the previous caller would get sorted size 4 lists in its L1 and L2, and it will attempt to merge them.<br /><br />Now, you would ask, what happens if the list is not a power of 2? For example, if the list size is 5:<br />1)The first iteration would call mergeSort(L[0:2]) and mergesort(L[2:5]).<br /> a)The first level 2 iteration would call mergeSort(L[0]) and mergeSort(L[1])<br /> i)The first level 3 iteration would return the 1 length list.<br /> ii)The second level 3 iteration would return the 1 length list.<br /> a)The first level 2 iteration would merge the two sorted lists and return it.<br /> b)The second level 2 iteration would call mergeSort(L[0]) and mergeSort(L[1:2])<br /> i)The first level 3 iteration would return the 1 length list.<br /> ii)The second level 3 iteration would call mergeSort(L[0]) and mergeSort[L[1]<br /> (1)The first level 4 iteration would return the 1 length list.<br /> (2)The second level 4 iteration would return the 1 length list.<br /> ii)The second level 3 iteration would merge the two lists and return it.<br /> b)The second level 2 iteration would merge the two sorted lists and return it.<br />1)The first iteration would merge the two sorted lists and return it.<br /><len and="" br="" j="" len=""><len br=""><len br=""><len and="" br="" j="" len=""><len br=""></len></len><!--2:--></len></len></len> Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com1tag:blogger.com,1999:blog-6890826354654530731.post-58851685609597607772013-05-23T03:12:00.001+08:002013-06-16T02:00:18.221+08:00Computer Science 06Many people think about how to make their program work. However, to be an excellent Computer Scientist, we must learn to make programs work efficiently.<br /><br />We learnt that Brute Force can solve almost every problem. However, Brute Force can sometimes be really impractical especially when it comes to the length of time required to complete something.<br /><br />Efficiency is mostly about algorithm, and not rarely about the tiny details in the code. Creation of algorithm usually has the goal of reducing a problem to a previously solved problem. This process is known as <b>Problem Reduction</b>.<br /><br />Efficiency is classified in two dimensions:<br />1) Space<br />2) Time<br /><br />When we talk about Space and Time, we can often trade one for the other. What we are talking about is Memory and CPU Utilization. These days, we focus on Time, because Space is usually in abundance to a degree.<br /><br /><b>Computational Complexity</b> is influenced by:<br />1) Speed of the machine<br />2) Cleverness of programming language version<br />3) Input<br /><br />To talk about Computational Complexity, we count the number of basic steps. An ideal function to calculate the Computational Complexity would be:<br />T(x)=n, where n is the number of steps taken for an input x.<br /><br />A step is an operation that takes constant time and is predictable. An example of a step is an assignment, comparison, array index, and so on.<br /><br />Our modern computers are <b>Random Access Machines</b>. Instructions are executed <b>Sequentially</b> and we assume <b>Constant Time</b> required to access memory. Primitive Computers used things like tape for memory and this model cannot be mapped on it. Modern Computers somewhat maps well to this model, but if we look in detail, because of <b>Memory Hierarchy</b>, it may take longer for some memory to be accessed. However, generally, the <b>Random Access Model</b> works.<br /><br />For each algorithm, we have a best case, and a worst case. For example, in <b>Linear Search</b>, where a list is searched from the first to the last element sequentially. The <b>Best Case</b> for Linear Search is when the item is found in the first index. The <b>Worst Case</b> is when the item does not exist in the Linear Search. We also look at the <b>Expected Case</b> which is the average, but it is the hardest to talk about because we need to know exactly how the elements are sorted.<br /><br />In Computational Complexity Analysis, we usually look at the Worst Case.<br /><br />The Worst Case provides an upper bound. Once we know the upper bound, we cannot be surprised. Even though it seems pessimistic to think about the Worst Case, the Worst Case happens really often.<br /><br />Let us analyze a function that computes factorial:<br /><br /><code>def factorial(n):<br /> assert n >= 0<br /> answer = 1<br /> while n > 1:<br /> answer*=n<br /> n-=1<br /> return answer</code><br /><br />The command assert is considered a step, while initialization of answer is another step. When it enters the loop, the conditional test is a step, and the two operations in it are considered two steps. The loop runs for n times. Finally, the return statement is considered a step. We can derive the Computational Complexity as:<br /><br /><code>2+3*n+1</code><br /><br />However, when we have a large input, we can ignore additive constants. We will end up with the following formula:<br /><br /><code>3*n</code><br /><br />When we look at the Computational Complexity, we are concerned with growth with respect to size. We can actually ignore multiplicative constants:<br /><br /><code>n</code><br /><br />This model of <b>Asymptotic Growth</b> shows how the Complexity grows according to the size of inputs.<br /><br />The <b>Big Oh</b> is a notion to describe Asymptotic Growth. To say that O grows linearly with n, we write:<br /><br /><code>O(n)</code><br /><br />This gives us an upper bound for the Asymptotic Growth of the function. To describe the Asymptotic Growth of a Function, we write:<br />F(x) is O(x<sup>2</sup>)<br /><br />This is formally read:<br />"Function F grows no faster than the quadratic polynomial x<sub>2</sub>" <br /><br />Here are some popular Orders:<br />1) O(1) - Constant regardless of input<br />2) O(log n) - Logarithmic growth<br />3) O(n) - Linear growth<br />4) O(n<sup>log n</sup>) - Loglinear Growth, which occurs surprisingly often<br />5) O(n<sup>c</sup>) - Polynomial<br />6) O(c<sup>n</sup>) - Exponential<br /><br />A Bisection Search is an example of a Binary Search. Binary Searches are Logarithmic in growth, while Linear Search are Linear in growth.<br /><br />A Recursive Factorial function can be like this:<br /><br /><code>def resursiveFactorial(n):<br /> assert n >= 0<br /> if n<=1:<br /> return n<br /> else:<br /> return n*recursiveFactorial(n-1)</code><br /><br />If we observe this statement, we can derive that the complexity is:<br /><br /><code>O(n)</code><br /><br />We can see that Recursion or Iterative doesn't change the <b>Order of Complexity</b>.<br /><br />An example of a quadratic growth function is as such:<br /><br /><code>def iterativeAdd(n):<br /> for x in range(n):<br /> for y in range(n):<br /> x+=1<br /> return x</code><br /><br />In this case, the growth is according to:<br /><br /><code>O(n<sup>2</sup>)</code><br /><br />Let us now look at a Logarithmic function:<br /><br /><code>def sumOfNumbers(x)<br /> strX = str(int)<br /> answer=0 <br /> for y in strX:<br /> answer+=int(y)<br /> return x</code><br /><br />In this case, the Order of Complexity is dependent in the number of digits in n. To represent this, we use:<br /><br /><code>O(log<sub>10</sub>n)</code><br /><br />In a Binary Search, we can observe that as the possible values for doubles, the number of search increases only by 1. We can represent this by:<br /><br /><code>O(log<sub>2</sub>len(L))</code><br /><br />Here's a good read pertaining to Orders of Complexity:<br /><a href="http://www.codeproject.com/Articles/596801/e2-80-9cIfplusallplusyouplushaveplusisplusaplusham">When all you have is a hammer, everything looks like a nail</a>Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com0tag:blogger.com,1999:blog-6890826354654530731.post-8932973332950689972013-05-22T21:02:00.000+08:002013-06-16T02:01:05.764+08:00Computer Science 05When we first learn about numbers, we learn about <b>Base</b> 10 (or <b>Radix</b> 10). We know that the decimal system represents values in the numbers 0 to 9. When we look at <b>binary </b>numbers, we see long strings of zeroes and ones. It is in fact, similar to the decimal system, except that it's base 2. Binary numbers take more digits to represent a value. A binary number of n digits can only represent up to a maximum of 2<sup>n</sup> values. People think in base 10, while computers work in base 2. It is difficult to explain why humans work that way, but computers actually work that way because it is the easiest to build switches that have two stable states: ON (1) or OFF (0).<br /><br />For whole numbers (i.e. int), it is easy to convert from decimal to binary.<br /><br />However, if we look at fractional representation of 1/8 in binary. 0.125 in binary looks like this:<br />0.001<br /><br />Why is it the case? We first relate this to the decimal system. In the decimal system of base 10, 0.1 actually means 1*10<sup>-1</sup> and 0.01 actually means 1*10<sup>-2</sup> and so on.<br /><br />In binary, 0.1 actually means 1*2<sup>-1</sup> and 0.01 actually means 1*2<sup>-2</sup>. In fact, 0.11 in binary actually means 1*2<sup>-1</sup>+1*2<sup>-2</sup>. Therefore, 0.001 is actually 1*2<sup>-3</sup>. Decimal 0.625 in binary is actually 0.101.<br /><br />The problem comes when we try to convert decimal 0.1 to binary. To get this, we take the process:<br />1) Is 0b0.1 higher than 0.1? Yes. Set the bit to 0. We have 0.<br />2) Is 0b0.01 higher than 0.1? Yes. Set the bit to 0. We have 0.<br />3) Is 0b0.001 higher than 0.1? Yes. Set the bit to 0. We have 0.<br />4) Is 0b0.0001 higher than 0.1? No. Set the bit to 1. We have 0.0625.<br />5) Is 0b0.00011 higher than 0.1? No. Set the bit to 1. We have 0.09375.<br />6) Is 0b0.000111 higher than 0.1? Yes. Set the bit to 0. We have 0.09375.<br />7) Is 0b0.0001101 higher than 0.1? Yes. Set the bit to 0. We have 0.09375.<br />8) Is 0b0.00011001 higher than 0.1? No. Set the bit to 1. We have 0.09765625.<br />9) Is 0b0.000110011 higher than 0.1? No. Set the bit to 1. We have 0.099609375.<br /><br />As you can see, through this rigorous process, we inch closer and closer to 0.1, but we never ever reach 0.1.<br /><br />Therefore, we should never use floating point numbers to make equal comparisons.<br /><br />To get a more accurate representation of a floating point number, we can use:<br /><code>print(repr(0.1))</code><br /><br />Most of the time it is safe to assume that the floating point numbers are exactly what we think they are. However, there are times when floating point arithmetic's weaknesses show:<br /><br /><code>number=0.0<br />accurateNumber=10000.0<br />for x in range(0,1000000):<br /> number+=0.01<br />print(number)<br />print(number*10)<br />print(number==accurateNumber)</code><br /><br />In this case, as you can see, the variable number has actually accumulated quite a bit of inaccuracy problems from having gone through too many floating point operations. It is not equal to accurateNumber as we would have expected it to.<br /><br />To test for two floating numbers, we should not use the == operator. Instead, we should use:<br /><br /><code>abs(number-accurateNumber)<=epsilon</code><br /><br />You define an epsilon yourself, which can be for instance 0.00001. When you test two floating point numbers there is an extremely high probability that we would get False when we should be getting True.<br /><br />There is an urban legend on how the process of fixing programs came to be known as "debugging". It is the <a href="http://en.wikipedia.org/wiki/Software_bug">First actual case of a bug being found</a> by Admiral Grace Hopper in 1947.<br /><br />The goal of debugging is not to eliminate one bug quickly, but to move towards a bug-free program.<br /><br />Debugging is a learnt skill and nobody does it well and instinctively. A good programmer is a good debugger, and a good debugger is a person who can think systematically and efficiently about how to move toward a bug-free program. The same skills used to debug programs can be transferred to real-life situations.<br /><br />Debuggers are designed to help programmers evaluate their program. A lot of experienced programmers swear that the best debugging tool is actually print. A good debugger knows where to search for good places to place print statements.<br /><br />The question to debugging is not "<b>Why doesn't it produce the output I want it to?</b>", but rather "<b>How could it have produced the result it did?</b>". There is a scientific method to debugging, and this method is based upon studying of available data. These data include:<br />1) Source Code<br />2) Output<br /><br />A hypothesis is formed that is consistent with the data, and we design and run a repeatable experiment. Such an experiment would have the potential to refute the hypothesis, which allows us to form more accurate hypotheses.<br /><br />It is important for the experiment to be repeatable because some programs use random inputs as well as certain timing factors. We must make sure that we can repeat the experiment with the exact same input.<br /><br />When we have a bug, we need to find a smaller, simpler inputs in which the program fails. There are three reasons why:<br />1) It is faster to run<br />2) It has less variables<br />3) It is easier to debug<br /><br />There is a debugging process known as <b>Binary Search</b>. In Binary Search, we split the program into half, and we ask if the program occurred above or below. We need to now find an intermediate value that we can check on, and we print it.<br /><br />There is a concept known as a <b>Test Driver </b>or a <b>Test Harness</b>. The goal of this is to write a code that isolates certain parts of a program to find out what bugs it has. The Test Harness is usually written at a separate part of the program.Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com1tag:blogger.com,1999:blog-6890826354654530731.post-83318079619497225222013-05-22T17:34:00.000+08:002013-06-16T02:01:21.982+08:00Computer Science 04<b>Divide and Conquer</b> is the idea of taking a difficult problem and breaking them into simpler pieces. These smaller problems must have two properties:<br />1) The smaller problems are easier to solve than the original ones.<br />2) The solutions to these problems can <b>easily</b> can be combined to solve this big problem.<br /><br />The technique of <b>Recursion</b> allows Divide and Conquer to be applied to problems. In Computer Science, it is:<br />1) A way of describing / defining problems<br />2) A way of designing solutions<br /><br />A Recursive problem has typically two parts:<br />1) Base Case - It describes a direct answer.<br />2) Recursive (Inductive) Case - Reduce the problem to a simpler version of the same problem through some simpler operations.<br /><br />Here are some problems:<br />1) Calculate b<sup>n</sup> resursively<br />-We first start by defining that b<super>n</super> is b*b*b*b*b...*b n times.<br />-We can then reduce the problem to b*b<sup>n-1</sup><br /><br />We can now say:<br />b<sup>n</sup> = b.b<sup>n-1</sup> if n!=0<br />b<sup>n</sup> = 1 if n==0<br /><br />We can create a recursive method like this:<br /><br /><code>def recursivePow(number,exponent)<br /> if exponent==0<br /> return 1<br /> else<br /> return number*recursivePow(number,exponent-1)</code> <br /><br />We implemented the above statements into code form. Of course, this method only works with positive values of n.<br /><br />We can also use it to look at the Towers of Hanoi problem.<br />There is a Temple in Hanoi, and in it there are 3 tall towers. There are 64 golden discs of descending sizes. The rules are like this:<br />1) You can only move one disc at a time<br />2) A larger disc cannot cover up a smaller disc<br /><br />The recursive solution is this:<br />1) Suppose that you have three stacks. A starting stack, an ending stack, and a spare stack.<br />2) The solution is to move a stack of n-1 from the starting stack to the spare stack. Then move the last piece to the ending stack. Then move everything from the spare stack to the ending stack.<br />3) If the stack is of size 1, just move it to the ending stack. <br /><br />In code, we have:<br /><br /><code>def hanoi(n, stackFrom, stackTo, stackSpare):<br /> if n==1:<br /> print("Move from",stackFrom,"to",stackTo)<br /> else:<br /> hanoi(n-1,stackFrom,stackSpare,stackTo)<br /> hanoi(1,stackFrom,stackTo,stackSpare)<br /> hanoi(n-1,stackSpare,stackTo,stackFrom)<br /><br />hanoi(5, "Starting", "Ending", "Spare")</code><br /><br />You can test this using:<br /><a href="http://www.novelgames.com/flashgames/game.php?id=31">Tower of Hanoi</a><br /><br />Next, here's another problem. Suppose that we are looking for a palindrome. A palindrome is a sentence that reads the same from both sides: "able was I ere I saw elba". To approach a palindrome, here's how we do it:<br />1) Check if the first and last characters are the same<br />2) If they are, remove one character and check again till there are no more characters left.<br />3) If it satisfies all the way, then it's a palindrome.<br /><br />The Basecases are:<br />1) If string is length 1, it is a palindrome<br />2) If string is length 0, it is a palindrome<br /><br />The Recursive case is: <br />1) If first and last character is the same, continue checking<br /><br />Implemented in code, it would be:<br /><br /><code>def formatSentence(sentence):<br /> sentence=sentence.lower()<br /> ans=''<br /> for c in sentence:<br /> if c in {'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'}:<br /> ans+=c<br /> return ans<br /><br />def recursivePalindrome(sentence):<br /> if len(sentence)<=1:<br /> return True<br /> elif sentence[0]==sentence[len(sentence)-1]:<br /> return recursivePalindrome(sentence[1:len(sentence)-1])<br /> else:<br /> return False<br />print(recursivePalindrome(formatSentence("Able was I ere I saw Elba")))<br />print(recursivePalindrome(formatSentence("Are we not drawn onward, we few, drawn onward to new era")))<\code></code><br /><br />As we can see, we can solve problems by breaking it down into simpler problems.<br /><br />Here's another problem we can look at:<br />1) Suppose that a pair of rabbits can mate within 1 month<br />2) A pair of rabbits will be born after gestation period of 1 month<br />3) The pair is a male and a female<br />4) How many female rabbits will there be at the end of a year?<br /><br />We can look at the problem in a timeline:<br /><table border=""><tbody><tr><th>Month</th><th>Female</th></tr><tr><td>0</td><td>1</td></tr><tr><td>1</td><td>1</td></tr><tr><td>2</td><td>2</td></tr><tr><td>3</td><td>3</td></tr><tr><td>4</td><td>5</td></tr><tr><td>5</td><td>8</td></tr><tr><td>6</td><td>13</td></tr><tr><td>7</td><td>21 </td></tr></tbody></table><br />We can now derive the function:<br />1) f(n)=f(n-1)+f(n-2)<br /><br />Here are the Base cases:<br />1) If month is 0, return 1<br />2) If month is 1, return 1<br /><br />Here are the Recursive cases:<br />1) Otherwise, return f(n-1)+f(n-2)<br /><br />We can now create the code for generating the Fibonacci number:<br /><br /><code>def fibonacci(n):<br /> if n<=1:<br /> return 1<br /> else:<br /> return fibonacci(n-1)+fibonacci(n-2)<br /><br />print(fibonacci(30))</code><br /><br />Fibonacci numbers appear in nature all the time. The number of petal on most flowers is almost always a Fibonacci number.<br /><br />Here is a hair-raising read:<br /><a href="http://library.thinkquest.org/27890/applications5.html">Fibonacci in Nature</a> <br /><br />You can actually find the golden ratio using:<br />lim(x->inf) fibonacci(n):fibonacci(n-1)Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com0tag:blogger.com,1999:blog-6890826354654530731.post-74467155341368219282013-05-22T15:35:00.000+08:002013-06-16T01:44:21.257+08:00Computer Science 03Recall that previously we made programs did approximation. We could print the approximations during the process onto the screen. However, these approximations are not recorded and you cannot really do anything with them.<br /><br />We have three data structures to collect items in Python:<br />1) <b>Tuples</b><br />2) <b>Lists</b><br />3) <b>Dictionaries</b><br /><br />Tuples and Lists are <b>Ordered Sequences of Objects</b>. You can refer to the objects by using an index.<br />Dictionaries on the other hand, are NOT ordered.<br /><br />You can declare a 'tuple' as such:<br /><br /><code>helloTuple = (1, 2, 3, 4, 5)</code><br /><br />We can then print the 'tuple' as such<br /><br /><code>print(hello[1]) #Prints the second object in the tuple<br />print(hello) #Prints the contents of the tuple<br />print(len(hello)) #Prints the length of the tuple<br />print(hello[len(hello)-1]) #Prints the last element in the tuple</code><br /><br />We can use tuples in a variety of ways. For example, to find numbers divisible by 3 from the range of 0 to 100, we can use:<br /><br /><code>divisor=5<br />divisible=() #This declares an empty tuple<br />for x in range(0,101):<br /> if x%divisor==0:<br /> divisible+=(x,)<br />print(divisible)</code><br /><br />This would check if the number is divisible by divisor using the modulo (%) function, and if it is, it will append a tuple of length 1 to the back of the divisible tuple.<br /><br />Tuples are <b>Immutable</b>. The values of tuples cannot be changed. For example, if we cannot use divisible[0]=1 at the end of our previous example.<br /><br />Lists are <b>Mutable</b>. The value of the list object can be changed.<br /><br />You can create lists using:<br /><br /><code>listNames = ['Kelvin', 'Ling']<br />listNames2 = ['Wynne', 'Elaine']</code><br /><br />The difference is that we're using a square bracket instead of a normal one.<br /><br />We can append lists as such:<br /><br /><code>listNames.append(listNames2)</code><br /><br />Append is a method that actually <b>Mutates</b> the list. In this list, we have various items:<br />1) Object 0 is a 'str'<br />2) Object 1 is a 'str'<br />3) Object 2 is a 'list'<br /><br />In this case, list2[0] and list1[2][0] is actually referring to the same object in memory. Modifying either will cause both to change since they're referring to the same thing. Append adds a <b>Pointer</b> to the list instead of simply copying.<br /><br />The difference is that in the first example, if we are actually concatenating two tuples, a new tuple is created and it is assigned to the variable. We can treat tuples as Primitive Data-Types in Java, and tuples as Objects in Java.<br /><br />If you concatenate a list into a tuple, a new tuple is created, with the list at the back. Suppose that tuple1[2] contains the Pointer to list1, tuple1[2][0] contains the same Pointer as list1[0], changing tuple1[2][0] actually changes list1[0]. The tuple1 has not changed, because it is Non-Mutable. The changes are applied to the list1 instead.<br /><br />Assignment is changing the object to which an identifier binds to.<br />Mutation is changing the actual value of the object.<br /><br />Lists and tuples can be sliced the same way as a 'str'.<br /><br />You can loop through a tuple or list as such:<br /><br /><code>for x in list1<br /> print(x)</code><br /><br />You can also use the <b>remove()</b> method to remove the first occurrence of a specified object:<br /><br /><code>allNumbers=[1,2,3,4,5,6,7,8,9]<br />unwantedNumbers=[1,3,7,8,9]<br />for x in unwantedNumbers:<br /> if x in allNumbers:<br /> allNumbers.remove(x)<br />print(allNumbers)</code><br /><br />The code goes through every unwanted number and if it exists in the list of all numbers, it would remove the first occurrence of it. To remove all occurence, we use the loop "while x in unwantedNumbers" instead.<br /><br />You can sort a list like this:<br /><br /><code>allNumbers.sort()</code><br /><br />An important concept to note is highlighted here:<br /><br /><code>list1=[1]<br />list2=[list1,list1]<br />print(list2) #We should get [[1], [1]]<br />list1[0]='hello'<br />print(list2) #We should get [['hello], ['hello']]<br />list2[0]='a'<br />print(list2) #We should get ['a', ['hello']]<br />list1=['fish']<br />print(list2) #We should still get the same, which is ['a', ['hello']]</code><br /><br />From the last print, we can see that the original list object pointed by list1 has not changed. Instead, The identifier list1 has actually been assigned to a different list.<br /><br />When one Object have multiple names, we call it an <b>Alias</b>. We can get problems when dealing with Aliases for Mutable Objects. Take for example, the method for a list to another:<br /><br /><code>def copyList(listSource, listDestination):<br /> for x in listSource:<br /> listDestination.append(x)<br /> print(listDestination)</code><br /><br />Copying list1 to list2 works, but if we copy list1 to list1, then we will have items adding to the source (when we add it to the destination), and we will never reach the end of the list.<br /><br />You can slice, concatenate, index etc. with Lists and Tuples, but there is a built-in type, known as a Dictionary, which has the following propeties:<br /><br />1) There is no order of things in a Dictionary<br />2) The indexes are called Keys and it can be of any Immutable type.<br /><br />To create a dict:<br /><br /><code>dictionary = {'toilet':'paper', 'pencil':'eraser',3:'computer'}<\code></code><br /><br />It is in the format key:object. A dict is therefore a set of key:value pairs.<br /><br />To select 'paper', we can use:<br /><br /><code>print(dictionary['toilet'])</code><br /><br />If we want to view all the keys, we can use:<br /><br /><code>print(dictionary.keys())</code><br /><br />We can delete items in the dictionary as such:<br /><br /><code>del(dictionary['pencil'])</code><br /><br />We can use the keys returned by the dictionary to look up the dictionary as well:<br /><br /><code>for key in dictionary.keys():<br /> print(key,"=",dictionary[key])</code><br /><br />In order to clone a Mutable type, we need to use the syntax:<br /><br /><code>item1=item2[:]</code>Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com0tag:blogger.com,1999:blog-6890826354654530731.post-40557165578146304492013-05-22T00:16:00.000+08:002013-06-16T02:01:56.472+08:00Computer Science 02We will now look at <b>Python</b>, as we know previously as an <b>Interpreted Programming Language</b>.<br /><br />A popular <b>IDE</b> (Integrated Development Environment) for Python is actually <b>IDLE</b>. IDLE consists of:<br />1) <b>Text Editor</b> - A place where the program can be typed and then sent to the Shell.<br />2) <b>Shell</b> - The actual environment that the Python code is interpreted.<br />3) <b>Debugger</b> - Useful for debugging, but most people use print statements.<br /><br />Everything in Python is considered an <b>Object</b>. The Python code itself can be an Object. Each object has a <b>Type</b>. The Type determines what kind of object it is and what can we do with it.<br /><br />Types can be two kinds:<br />1) <b>Scalar</b> - These are the primitive types of the programming language. They are indivisible types.<br />2) <b>Non-scalar</b> - Things that are divisible, for example, String, which can be divided into smaller 'str' of many chars.<br /><br />The most common Scalar Types we work with are:<br />1)<b> int </b>- A real number.<br />2)<b> float</b> - An approximation of a real number.<br />3)<b> bool</b> - A true or false value.<br />4)<b> NoneType</b> - Representing a null.<br /><br />An example of a Non-Scalar type is:<br /><br /><b>str</b> - A string of characters.<br /><br />In Python, we can type the literals (e.g. 3, 'a', '5.6) into the Shell and it would return that value. To check the type of a literal, we can use the type() function as such:<br /><code>type(3)</code><br /><br />This would tell us that 3 is an 'int' type.<br /><br />Literals by themselves are useless. We cannot do much with it. What we can do, however, is to chain them together using <b>Operands</b> (values) and <b>Operators</b> to form <b>Expressions</b>.<br /><br />In Python, when we deal with division, it is important to use floating point numbers. If we use "+" between two integers, 5+5, the "+" sign will act as an addition. If we use "+" between 'a' and 'b', we would get 'ab', because the "+" would act as concatenation when dealing with 'str'. This is because "+", like many other Operators, is <b>Overloaded</b> (dependent upon the types and parameters given to it).<br /><br />We can do manual conversion as such, which is called <b>Type Casting</b>:<br />1) 'a'+str(3) gives you 'a3'<br />2) int(2.1) would truncate the float into an integer value of 2.<br /><br />A <b>Program</b> in the Python sense is a synonym to a <b>Script</b>.<br /><br />When typing a literal into the Shell, it prints the value directly to the screen. When things are executed in a Program, however, it will not be printed unless we use the <b>print()</b> command.<br /><code>print 5</code><br /><br />In order for things to work, it is important for us to have <b>Variables</b>. A Variable in Python is simply a name for an Object. To bind a Variable to an Object, we use the <b>Assignment</b>:<br /><code>x=5</code><br /><br />This assigns ("=") the int Object of value 5 to the Variable x.<br /><br />The Function <b>print()</b> allows us to output things. We'll need to get inputs, and there are two types of input:<br />1) <b>Raw Input</b> (this is the only type that exists in Python 3.0)<br />2) <b>Input</b><br /><br />Raw Input always interprets what the user types as a 'str' type. We would need to do Type Casting:<br /><br /><code>x=int(input("Please enter an integer: "))<br />print("You entered "+str(x)+".")</code><br /><br />The simplest programs are <b>Straight Line Programs</b>. Straight Line Programs are Programs where every line is computed exactly once, with no conditions or loops (which are referred to as "tests").<br /><br /><b>Conditional Statements</b> are the simplest "tests". A conditional statement comprises of:<br />1) <b>if</b><br />2) <b>else</b><br />3) <b>elif</b><br /><br />A program where Conditional Statements are implemented is called a <b>Branching Program</b>. In a Branching Program, each statement is executed the most once.<br /><br />In conditional statements, the indentation is extremely important. This is how a simple conditional statement is implemented:<br /><br /><code>x=int(input("Enter an integer: "))<br />if x%2==0:<br /> print("Even")<br />else:<br /> print("Odd")</code><br /><br />Programs are intended to be readable, and by enforcing this, Python actually ensures some sort of code home-keeping.<br /><br />Finally, we look at a <b>Looping Construct</b> which allows us to do <b>Iterations</b>. When we do this, we'll be able to create a <b>Turing Complete Program</b>.<br /><br />Whenever we write a loop, we should have either an <b>Incrementing</b> or a <b>Decrementing Function</b>. This allows the loop a guarantee to exit.<br /><br />The properties of a Decrementing Function is as such:<br />1) It will map a set of program variables to an 'int'.<br />2) It starts with a non-negative number.<br />3) The loop terminates when the value of the variable is <= 0.<br />4) It is decreased each time through the loop.<br /><br />In a loop as seen here, there is in fact a Decrementing Function:<br /><br /><code>while a<25:<br /> a+=1</code><br /><br />Even though all we can see is an Increment here, we can derive the Decrementing Function of:<br />25-a<br /><br />Loops allow us to do something known as <b>Exhaustive Enumeration</b>, which we usually refer to as <b>Bruteforce</b>.<br /><br />The <b>for</b> loop in Python is as such:<br /><br /><code>for x in range(0,25):<br /> instructions</code><br /><br />In this case, range(0,25) actually creates a <b>tuple</b> with a sequence starting from 0 to 24.<br /><br />Approximation in computing is finding a number that is accurate enough. It does not need to be perfect, but it must be close enough for our use. For example:<br /><br /><code>y^2=x+-epsilon</code><br /><br />Where epsilon is the degree of error acceptable.<br /><br />We would then be able to come up with a slow converging approximation algorithm as such:<br /><br /><code>x = 123<br />epsilon = 0.01<br />ans = 0<br />while abs(ans**2-x)>=epsilon and ans<=x:<br /> ans+=0.00001<br />print(ans," is close to sqrt of ",x)</code><br /><br />The program is extremely dependent on size of the number given to test, and the maximum number of times it can loop can be calculated by x/0.00001.<br /><br />A more effective algorithm exists to solve this problem. It is called the <b>Bisection Search</b>. It can be implemented by <b>cutting the search space in half after each iteration</b>. The worst case would be log(x) where x is the number of possible value of answers.<br /><br />In this method, we first define the possible range of answers. We then try to guess the answer from the middle of the range. We then check if the answer when put through the formula gives us an answer near enough to what we want. If it's higher, then we search for a lower number. If it's lower, we search for a higher number. It is implemented as such:<br /><br /><code>x=5;<br />epsilon=0.00000001;<br />upperBound=x;<br />lowerBound=0.0;<br />ans=(lowerBound+upperBound)/2.0;<br />while abs(ans**2-x)>epsilon:<br /> if ans**2 > x:<br /> upperBound=ans;<br /> else:<br /> lowerBound=ans;<br /> if ans==(lowerBound+upperBound)/2.0:<br /> break<br /> else:<br /> ans=(lowerBound+upperBound)/2.0<br /> print(ans)<br />if abs(ans**2-x)>epsilon:<br /> print("Converged at",ans,"without meeting epsilon specifications of",epsilon)<br /> print("Resultant error is",ans**3-x)<br />else:<br /> print("Converged at",ans)</code><br /><br />This Program does smart guesses at a range of values you give it. It terminates when it has converged to an answer or it's failed. However, this assumes that the answer lies between 0 to x, which means that for problems where the answer is outside this range, this could not solve it. That is a flaw of this method.<br /><br />An example to observe this problem is when we try to search for numbers less than 1.0. For example, 0.5's square root is 0.707107, and if we look at the search space of 0 to 0.5, we'll never find it! We then observe this pattern that for every number that is less than 1, the highest answer is 1.0. Therefore, we can see that for the upper bound, we cannot have it less than 1. We can change our upperBound statement as such:<br /><br /><code>upperBound=max(x,1.0)</code><br /><br />This could take the value of x, or 1.0, whichever is higher.<br /><br />We typically want programs to do things with minimal coding. A smaller program is more efficient than a larger program who can do the same thing.<br /><br />We can use <b>Functions</b> to help shorten code. A Function is something that provides:<br />1) <b>Decomposition</b> - It allows us to break our program up into modules. An example of a module is a <b>Function</b> and a <b>Class</b>. Modules should be self-contained and reusable.<br />2) <b>Abstraction</b> - Suppresses details that the programmer does not need to see. This allows the programmer to focus on what it does.<br /><br />A simple function could be as such:<br /><br /><code>def accurateEnough(x, y, epsilon):<br /> return abs(x-y)<=epsilon</code><br /><br />In order to use this Function, we invoke it as such:<br /><br /><code>print(accurateEnough(1,2,1))</code><br /><br />Note that you must invoke this function below where it's been defined or it would not work.<br /><br />A Function can contain:<br />1) <b>Formal Parameters</b> - are the names x, y and epsilon. You can use x in the main code and it would have nothing to do with the parameters used in the function, because it is self-contained. This is because upon entry of a function, a new <b>Scope</b> is created. A Scope is a mapping of names to objects. The parameter passed into the function during its call is called the <b>Actual Parameter</b>.<br />2) <b>Return</b> - It is a statement that causes the Function to return a value to wherever it is called. If no Return is specified, then the NoneType is returned.<br />3) <b>Assert</b> - The assert statement takes a bool value (resulting from a bool, expression or function) and stops the function if it evaluates to False, otherwise it continues.<br /><br />We can create a more robust function with assert as such:<br /><br /><code>def accurateEnough(x, y, epsilon):<br /> assert epsilon>=0<br /> return abs(x-y)<=epsilon</code><br /><br />This ensures that epsilon is a non-negative value before it continues. <br /><br />What happens when we run a program:<br />1) The Interpreter will build a Scope, and it would find Object it sees in sequence and assigns the value of it to the Name (the Object can be a Function or a Variable, etc.).<br />2) If a Function is invoked, then the Interpreter would create a new Scope for it and begin executing it as in Step 1.<br />3) This will continue till the program reaches the end.<br /><br />Each Scope is also called a <b>Stack Frame</b>. When we start the Program, we begin with the <b>Main Scope</b>. Then as we progress into Functions, we'll have Scopes adding to the top of the <b>Stack</b>. When we're done with the Function, its Scope gets <b>popped</b> from the top, and we return to the Scope of the previous Function or the Main Scope. We would eventually end up again with the Main Scope. This is because a Stack is <b>FILO </b>(Frst-In-Last-Out). We can use Variables defined in a Scope below its own Scope (e.g. a Function uses a Variable in the Main Scope). When a Function exits, its Variables and everything else in its scope are lost.<br /><br />We now look at the 'str' type, which is the most common Non-Scalar type. A 'str' can be used as such:<br /><br /><code>sum=0<br />for x in str(1234):<br /> sum+=int(c)<br />print(sum)</code><br /><br />What we get here is that the numbers 1234 is first converted into a 'str' of '1234'. For each item in the String, starting from 1, they are added to the total sum. In the end, we'll have the answer of 10.<br /><br />We can deal with 'str' as if it's an array. Suppose that we have the following 'str':<br /><br /><code>helloStr = 'hello'</code><br /><br />We can access the first character of the 'str' using:<br /><br /><code>print(helloStr[0])</code><br /><br />The above code would give us a 'h'. We can also access a range of characters, as such:<br /><br /><code>print(helloStr[0:2])</code><br /><br />This would print out the first (0) and second (1) char in the 'str'. This is a method known as a <b>Slicing</b>. It makes a new copy of the original 'str' (known as a <b>Substring</b>). There are also different operations that you can do with 'str' types, for example, to find patterns (and get the location of the provided pattern), we can use:<br /><br /><code>helloStr.find('lo')</code><br /><br />We would get the output 3.<br /><br />We'll now look at the following piece of code, which enables us to find patterns in a provided text:<br /><br /><code>#Finds patterns in Strings<br />providedString = input('Enter the text: ')<br />findString = input('Enter the pattern: ')<br />low = providedString.find(findString)<br />while providedString.find(findString,low)!=-1:<br /> print(providedString.find(findString,low),"to",low+len(findString))<br /> print(providedString[0:low]+'['+providedString[low:low+len(findString)]+']'+providedString[low+len(findString):])<br /> low=providedString.find(findString,low+1)</code><br /><br />In this Program, we first get a String, and a Substring to search for. We have a 'low' value to indicate the last position from the last search. We find the first occurrence of the searched text, and print its start and end slice index of the providedString. Next, we print out the text, adding brackets between the found text. This repeats till no more occurrence is found.Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com0tag:blogger.com,1999:blog-6890826354654530731.post-37864267161927998452013-05-20T20:45:00.000+08:002013-06-16T02:04:00.365+08:00Computer Science 01Engineers are people who can map problems into computational frameworks.<br /><br />A good computer scientist is someone who has the skills to make a computer do whatever they want it to do.<br /><br />Let's start with the question... What is computation? To answer this, we first look at two kinds of knowledge:<br /><br />1) <b>Declarative</b> - It is composed of statements of facts. It does not tell us how to arrive at a conclusion, but if a piece of declarative knowledge is correct, we can use it to test computations. An example of declarative knowledge is: "A chocolate cake is delicious".<br /><br />2) <b>Imperative</b> - Imperative knowledge tells how to solve something. For example, a recipe for a chocolate cake is imperative knowledge.<br /><br />We use the recipe to create a dish (Imperative Knowledge), and if the dish is indeed delicious, looks everything like the description of a chocolate cake (Declarative Knowledge) then we know we've made a chocolate cake.<br /><br />Let's look at an approximation algorithm that lets us solve a square root. Say for example, we are looking for the value of sqrt(25). <br /><br />We first get facts about the declarative knowledge we know of squares and their roots. It is as such:<br /><br /><code>g^2=x</code><br /><br />We have x, we need to find g.<br /><br />The approximation algorithm, a form of imperative knowledge, for obtaining the sqrt(x) is as such:<br />1) Guess a value g.<br />2) Compute the value of g^2.<br />3) If g^2 is x, or close enough to x, then g is the root of x.<br />4) Else, the next value of g will be (g+x/g)/2 (The average of g and x/g). Compute again from Step 2.<br /><br />Programmatically, we write it like this:<br /><br /><code>while(math.pow(g,2)!=x)<br />{<br /> g=g+x/g;<br />}</code><br /><br />When you get the value of g which when squared gives you x, it is said that the algorithm has <b>converged</b>.<br /><br />An algorithm is made up of the following components:<br /><b>Instructions</b> - Operations that are carried out, e.g. "g=g+x/g"<br /><b>Flow Control</b> - The way and sequence the instructions are carried out.<br /><b>Termination Condition</b> - When are we satisfied with the answer, e.g. "g^2 equals x or is close enough to x"<br /><br />There are two types of programs:<br /><br />1) <b>Fixed Program Computer</b> - These computers are designed to do specific and fixed things. Instructions are hard-coded into circuitry (think logic gates in ASICs) and only input and output is considered data.<br /><br />2) <b>Stored Program Computer</b> -The input, instructions and output are all considered data and are all stored alongside in memory. This allows infinite amount of flexiblity and programs can now produce programs. This is the current state of computers.<br /><br />In the Stored Program Computer, people think of a computer as a Hardware Program, known as an <b>Interpreter</b>.<br /><br />A basic Stored Program Computer has the following things:<br />1) <b>Memory</b> - Memory used for storing of data and program.<br />2) <b>Control Unit</b> - Interprets instructions from memory and turns it into a form that the ALU can understand.<br />3) <b>Arithmetic Logic Unit</b> - Accepts instructions from Control Unit, pulls and pushes data from Memory, computes and stores results in its Accumulator (the ALU's own Memory).<br />4) <b>Input/Output</b> - Allows interaction with external systems.<br /><br />A British mathematician proved that there are 6 primitive instructions that can be used to do anything that can be done with the computer.<br /><br />In Programming, we make use of a set of Primitive Instructions and a set of Flow Control. Using and combining these two elements, we can come up with different programs.<br /><br />A Programming Language comprises of:<br />1) <b>Syntax</b> - Which sequence of characters and symbols constitute a well-formed string, e.g. g=5+6.<br />2) <b>Static Semantics</b> - Which well-formed strings have a meaning, e.g. 6+4 by itself is well-formed but does not have a proper meaning/purpose.<br />3) <b>Semantics</b> - What that meaning is. A program can have a proper syntax and proper static semantics, but it may mean differently from what the programmer wants it to do.<br /><br />A program with improper semantics can:<br />1) <b>Crash</b> - The program stops running and shows an obvious sign that it's happened. A properly designed program environment will keep damages minimal and local (i.e. does not damage the program or system).<br />2) <b>Never Stop</b> - The program never finishes the job, which is also known as "Hang". These programs typically have entered an infinite loop due to a deadlock or semantics error.<br />3) <b>Wrong Output</b> - The program runs to completion, but produces the wrong answer. This is the worst kind of problem because people may think it's running fine and it isn't.<br /><br />There are two types of Programming Languages:<br />1) <b>Interpreted</b> - Python, where the source code is executed directly and does not need to be compiled.<br />2) <b>Compiled</b> - Java, where a compiler changes source code into an object code, which is executed by the Hardware Interpreter.Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com1tag:blogger.com,1999:blog-6890826354654530731.post-70727311644745212602013-05-05T17:12:00.000+08:002013-05-05T17:12:41.343+08:00CCNA Review 04You can see a plethora of different hardware in a typical network, but the most important two devices are the Switches and the Routers.<br /><br />Switches were not like that from the start. In the past, everything was connected by network hubs. A hub is basically a repeater with multiple ports. Being a repeater, every device on the network actually shares the bandwidth available through it. Therefore, only one device can transmit at the same time. If more than one device transmits, then it is said that there is a collision.<br /><br />To combat collisions, the CSMA/CD (Carrier Sense Multiple Access / Collision Detection) protocol is created which detects collisions whenever one happens. When two devices transmits at the same time, a collision is resulted and both devices would transmit a jam signal. The devices would back off for a random amount of time before transmitting again.<br /><br />There are three types of RX/TX modes:<br />-Simplex is where the transmitter only transmits, and the receiver only receives. It is a one-way traffic. An example of simplex communication is radio broadcasting.<br />-Half-Duplex is when only one device can transmit across the wire at the same time. When one talks, all others sharing the medium must listen. An example of half-duplex communication is WiFi.<br />-Full-Duplex allows all devices to transmit at the same time. Modern switched networks and router interfaces are full-duplex.<br /><br />The biggest problem with the hub is that it is a shared medium, and therefore it is half-duplex. If 5 devices are transmitting at the same time across a 50Mbps hub, then each of them have less than 10Mbps for transmission. In this case, an entire hub is a collision domain. The repeater and hub is a layer 1 device.<br /><br />A new device came and allowed collision domains to be segmented. This is called the bridge. The bridge is an intelligent layer 2 device which allows learning of MAC addresses on each interface, much like a switch. Hubs can be connected to the bridges, allowing larger networks to be created. Each interface on the bridge is a collision domain if a hub is connected to it. If a computer is connected directly to a bridge, it has full duplex connectivity. However, the biggest problem with bridges is that they're software based, so they introduce really high latencies and limited bandwidth.<br /><br />Switches were then created, which is much like a bridge, but moves frames around via ASICs. Every port on the switch is a full-duplex wire. If all ports are full-duplex, there can be no collision occuring on the switch. However, each switchport is still considered a collision domain by definition. Modern switches can support multiple speeds per interface and it is managed and intelligent. It is managed because you can modify settings, and it has a large feature-set and intelligence due to its IOS firmware.<br /><br />Switches typically have one or two high-speed links for daisy chaining switches together. Switches need to be connected with crossover cables but modern switches have auto-sensing ports that allow connection through straight-through as well. Modern switches also have SFP (Small-Form Factor Pluggable Transceiver) modules which allow chaining together via Fibre Optics.<br /><br />When switches are first booted up, its CAM (Content Addressable Memory) table is empty. CAM tables are used to store MAC address associations with interfaces. Once STP (Spanning Tree Protocol, covered in a later article) is completed and stable, the switches will start learning MAC addresses from its interfaces.<br /><br />Suppose that two devices, A and B, connected to the switch wants to communicate. A would first create an ARP request to look for B. At this time, A's MAC will be recorded, while the ARP request is forwarded out of all ports. B would receive the ARP request and generate an ARP reply, which allows the switch to record the location of B's MAC. Now that the switch has both device's MACs, further communication from now would be switched at wire-speed.<br /><br />By default, CAM entries has a timeout of 5 minutes. If a device stops communicating for 5 minutes, the entry is dropped. If A stops communicating for 5 minutes and his MAC gets dropped, further communication to A will be forwarded out of all open ports like a broadcast until his MAC is learned again.Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com11tag:blogger.com,1999:blog-6890826354654530731.post-56453997692848059662013-05-05T16:06:00.001+08:002013-05-05T16:06:16.667+08:00CCNA Review 03In this article we are going to talk about two of the most common transport layer protocols: UDP and TCP.<br /><br />If the two protocols could talk, UDP would say, "I hope it gets there", while TCP would say, "I'll make sure it gets there".<br /><br /><img src="http://img203.imageshack.us/img203/1493/tcpudp.jpg" /><br /><br />UDP (User Datagram Protocol) is a connectionless protocol. It does not establish connections and packets sent by it has no guarantee of reaching the intended target. Therefore, UDP is unreliable. However, it has many practical uses due to its low overhead and is typically used in time-sensitive, real-time applications like VoIP where a dropped packet is not important but latency and jitter could render the whole protocol useless.<br /><br /><img src="http://img507.imageshack.us/img507/4679/mjbudpheader800x264.png" /><br /><br />An example of a UDP protocol is DNS Client. The DNS Client is UDP not because of the time-sensitive nature but because everything fits into one packet and it makes no sense to have to establish a full connection with the 3-Way Handshake and the subsequent ACKs and termination just to send that one packet. If the request is dropped the client simply has to send another one after a timeout. Much more efficient than using TCP. <br /><br />TCP (Transmission Control Protocol) is a connection-oriented protocol. Before the start of every transmission, there is a Three-Way Handshake that starts a session between the clients. The handshake consists of a SYN, SYN-ACK and ACK, which will be covered more in detail later. TCP requires ACK for messages, which allows dropped packets to be detected and resent. TCP is therefore reliable. However, due to the connection-oriented nature, it has costly overhead and is more suitable for applications that deal with large transfers that has a requirement for data integrity. TCP has a mechanism known as the TCP Sliding Window Protocol, which allows more efficient use of the acknowledgment system. This will be covered in greater detail later.<br /><br /><img src="http://img209.imageshack.us/img209/3859/mjbtcpheader800x564.png" /><br /><br />An example of TCP is the HTTP, where a 3-Way Handshake is first established before the client sends a GET message. Since web-browsing is data-integrity sensitive and bandwidth-heavy, TCP is the perfect protocol. Protocols like FTP use TCP for the same reason.<br /><br />As previously mentioned, the Three-Way Handshake consists of a SYN, SYN-ACK, and an ACK. It is used to initialize a connection between two communicating hosts. The initiating host (in the case of a client accessing a web server) would first send a SYN 0 to the target. This is a good time to introduce the concept of sequence numbers, which TCP uses to track what's been received and what's not been received. The server would then reply with a ACK 1, stating that it has received 0 and is ready for 1. It would also send a SYN 0 along with it. Finally, the client sends a ACK 1. <br /><br /><img src="http://img850.imageshack.us/img850/9463/e006581846f26a42c10d4.jpg" /><br /><br />At this point there would be a connection ready for data to be transmitted. The first HTTP packet would then be sent out.<br /><br />The sequence number represents the amount of bytes sent out by the sender. The ACK number represents the sequence that it is ready to receive. This is particularly useful in implementing the TCP Sliding Window Protocol. To implement the Sliding Window, TCP first sends out one segment. Once it is successful, it would send more segment at one go (e.g. 2). If it is successful, it would send some more (e.g. 4), and so on. It would do so until it reaches the tolerance of the end-to-end devices and packets get dropped. It knows that packets get dropped when the ACK it receives is not what is expects. The next transmission would then start from that number of packets, and then from there it slowly increases.<br /><br /><img src="http://img248.imageshack.us/img248/5499/tcpswexampleserver.png" /> <br /><br />The common TCP and UDP ports are:<br /><br />TCP<br />21 - FTP (File Transfer Protocol)<br />22 - SSH (Secure SHell)<br />23 - Telnet<br />25 - SMTP (Simple Mail Transfer Protocol)<br />53 - DNS Server (DNS server-server communication)<br />80 - HTTP (Hyper-text Transfer Protocol)<br />110 - POP3 (Post Office Protocol 3)<br />443 - HTTPS (Hyper-text Transfer Protocol Secure)<br /><br />UDP<br />53 - DNS Client (DNS client-server communication)<br />69 - TFTP (Trivial File Transfer Protocol)<br /><br />Both protocols have 65535 distinct ports.<br />Well known ports are from 0 to 1023.<br />1024 to 49151 are registered ports.<br />Finally, 49152 to 65535 (2<sup>15</sup>+2<sup>14</sup> to 2<sup>16</sup>-1) are the dynamic/private ports.<br /><br />Let us look at the previous examples we used again.<br /><br /><img src="http://img259.imageshack.us/img259/5003/networksimple.png" /> <br /><br />If A wants to visit a website on B, then this is going to happen.<br /><br />Again, A would check if B is in the same network. It would realize that B is in a different network, so it needs to send it through its default gateway. It ARPs for the router's MAC and once it gets it, the following frame is created:<br /><br />(v,w)<br />FCS (Created at the end)<br />SYN<br />S TCP - 58372 (Randomly generated)<br />D TCP - 80<br />S IP - A's IP<br />D IP - B's IP<br />S MAC - A's MAC<br />D MAC - R1's MAC<br /><br />As the switch receives the frame, it would associate A's MAC to the interface, then forward it out of all ports because it does not know where the router is. When the router receives the frame, it would look deeper at the IP header and realize that it's intended for another network that it can reach by sending it to R2. It would replace the S and D MAC, recalculate the FCS, then send it out as:<br /><br />(x)<br />FCS (Created at the end)<br />SYN<br />S TCP - 58372<br />D TCP - 80<br />S IP - A's IP<br />D IP - B's IP<br />S MAC - R1's MAC<br />D MAC - R2's MAC<br /><br />Once F2 gets it, it would realize that it is intended for something connected to S2, so it sends it out of that interface like this:<br /><br />(y,z)<br />FCS (Created at the end)<br />SYN<br />S TCP - 5837 <br />D TCP - 80<br />S IP - A's IP<br />D IP - B's IP<br />S MAC - R2's MAC<br />D MAC - B's MAC<br /><br />The switch learns that R2 is at that interface, then sends it out through all the other ports. B receives it, processes the IP header and realizes that it's for itself. Once it reaches the TCP header, it would realize that something is trying to initiate a connection to its port 80 (HTTP) from port 58372. It then replies with the following:<br /><br />(y,z)<br />FCS (Created at the end)<br />SYN, ACK<br />S TCP - 80<br />D TCP - 58372<br />S IP - B's IP<br />D IP - A's IP<br />S MAC - B's MAC<br />D MAC - R2's MAC<br /><br />The same process happens but the other way round, till A receives the SYN,ACK. It would then proceed to send an ACK. Thereafter, the HTTP connection would take place.Kelvinhttp://www.blogger.com/profile/08530586088329644767noreply@blogger.com0