Error analysis

20.309: Biological Instrumentation and Measurement

Correlation (XKCD #552)


Warning: this page is a work in progress.

Measurement error

Model of measurement with additive noise.

The purpose of making a measurement is to determine the value of some unknown physical quantity Q, called the measurand. In general, measurement procedures produce a measured value M that differs from Q by some amount E. Measurement error, E, is the difference between the true value Q and the measured value M, E = Q - M. People also use the terms observational error and experimental error to refer to E.

The presence of measurement error limits your ability to make inferences from experimental data. Measurement error gives rise to the possibility that a particular experimental result was obtained just by chance or because of a shortcoming of the measurement procedure.

Error sources are root causes of measurement error. The total error E in a measurement is equal to the sum of the errors caused by each individual source. Some examples of error sources are: thermal noise, interference from other pieces of equipment, and miscalibrated instruments. Error sources are classified according to their variability. Random errors are unrepeatable fluctuations in a measurement. The value of a random error is different each time a measurement is repeated. Systematic errors are deterministic functions of Q. They always cause the same error E for a given value of Q. It is possible for an error source to fall into both categories.

Many random errors can be modeled by a random variable with a distribution that has an average value μ = 0 and standard deviation σ > 0. It is possible to estimate σ by making several measurements of the same quantity and computing the sample standard deviation. (Of course, the standard deviation obtained from a sample of results could be larger or smaller than the true value.) It is very common that a measurement method includes several sources of random error. As a result, the distribution of errors frequently looks approximately Gaussian — even in cases where the assumptions of the central limit theorem are not perfectly satisfied.
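
As a minimal sketch of this procedure (in MATLAB, with arbitrary made-up values for the measurand and the noise), the snippet below simulates repeated measurements of a fixed quantity corrupted by zero-mean Gaussian error and estimates σ from the sample standard deviation:

  % Simulate repeated measurements of a fixed quantity Q corrupted by
  % zero-mean Gaussian random error with true standard deviation sigma.
  % Q, sigma, and nMeasurements are arbitrary values chosen for illustration.
  Q = 42;                % true value of the measurand
  sigma = 1;             % true standard deviation of the random error
  nMeasurements = 50;    % number of repeated measurements
  measurements = Q + sigma * randn(nMeasurements, 1);
  sigmaEstimate = std(measurements);   % sample standard deviation
  fprintf('estimated sigma = %.2f (true value = %.2f)\n', sigmaEstimate, sigma)

Running the snippet several times shows that the estimate itself fluctuates around the true value of σ, as noted above.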

In the case of purely systematic error sources, repeated measurements always give the same value. The standard deviation of a set of measurements that include a pure systematic error is zero. Therefore, multiple measurements do not reveal systematic errors, which makes systematic errors much more difficult to characterize than random errors. The principal way to detect a systematic error is to measure known standards. In many complex measurements, finding an appropriate standard is difficult.

Common models for systematic error sources include offset error (also called zero-point error), sensitivity error (also called percentage error), and nonlinearity. In a zero-point error, the observational error E is constant. If a standard is available, it is possible to find the constant and subtract it from the result — a process called calibration. Sensitivity errors have the form E = K Q. Like offset errors, sensitivity errors can also be corrected by calibration (with at least two standards). In nonlinear errors, K is a function of Q. (The term "sensitivity" is frequently used imprecisely. It would be clearer to call this type of error a responsivity error.) Nonlinear errors require a great deal more care to correct than offset and sensitivity errors.
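
To make the calibration idea concrete, here is a hedged MATLAB sketch (the standards, readings, and variable names are invented for illustration). A measurement with both an offset error and a sensitivity error can be modeled as M = aQ + b; measuring two known standards determines a and b, and subsequent readings can then be corrected by inverting that line:

  % Two-point calibration sketch. A reading with an offset (zero-point)
  % error and a sensitivity error is modeled as M = a*Q + b.
  Qstandards = [10; 100];     % true values of two calibration standards
  Mstandards = [8.5; 94.0];   % instrument readings of the standards (made up)
  a = (Mstandards(2) - Mstandards(1)) / (Qstandards(2) - Qstandards(1));
  b = Mstandards(1) - a * Qstandards(1);
  % Correct a subsequent reading by inverting the calibration line.
  Mreading = 51.2;                     % raw instrument reading
  Qcorrected = (Mreading - b) / a;     % calibrated estimate of the measurand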

Another important characteristic of error sources is the degree to which they can be reduced or eliminated. Fundamental error sources are physical phenomena that place an absolute lower limit on experimental error. Fundamental errors cannot be reduced. (Inherent is a synonym for fundamental.) Technical error sources can (at least in theory) be reduced by improving the instrumentation or measurement procedure — a proposition that frequently involves spending money. An ideal measurement is limited by fundamental error sources only.

Drift error is another important kind of error that affects many measurements — particularly ones that are taken over long time intervals. Drift errors are systematic errors with a magnitude that changes over time. Temperature changes, humidity variations, and component aging are common sources of drift errors. Many drift errors have the characteristic that their magnitude increases roughly in proportion to the square root of time.
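
One simple model with this square-root-of-time behavior is a random walk, in which many small, independent disturbances accumulate over time. The MATLAB sketch below (an illustration only, not a model of any particular instrument) shows that the RMS size of such a drift grows roughly as the square root of elapsed time:

  % Random-walk model of drift: the drift at each time step is the running
  % sum of small, independent disturbances, so its RMS value grows roughly
  % as sqrt(time). All values are arbitrary and chosen for illustration.
  nSteps = 10000;                     % number of time steps
  nTrials = 500;                      % number of simulated instruments
  disturbances = randn(nSteps, nTrials);
  drift = cumsum(disturbances);       % drift versus time for each trial
  rmsDrift = sqrt(mean(drift.^2, 2)); % RMS drift across trials at each time
  loglog(1:nSteps, rmsDrift)          % slope of 1/2 on log-log axes
  xlabel('time step'); ylabel('RMS drift')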

Most measurements include a combination of random, systematic, and drift errors. One kind of error frequently predominates.

Averaging

Each "x" represents one simulated result obtained by averaging N repeated measurements of a quantity with true value Q = 42 in the presence of a random measurement error with distribution $ E\in N(\mu=0;\sigma=1) $. Vertical axis shows the average; horizontal axis is N. Uncertainty in the mean decreases as the square root of the number of trials, N. The red lines contain ⅔ of the simulated results for each N.

One way to refine a measurement that exhibits random variation is to average N measurements:

$ M=\frac{1}{N}\sum_{i=1}^{N}{M_i}=Q-\frac{1}{N}\sum_{i=1}^{N}{E_i} $

where $ M_i $ is the i-th measurement and $ E_i $ is the i-th measurement error, $ E_i = Q - M_i $. Substituting $ M_i = Q - E_i $ into the summation reveals that the error terms $ E_i $ of multiple measurements average together. If the errors are random, the error terms tend to cancel each other. (This assumes that Q is constant. We will discuss what happens if Q is a function of time later in the class.)

The central limit theorem offers a good mathematical model for what happens when you average measurements that include random observational error. Informally stated, the theorem says that if you average N random variables that come from the same distribution with standard deviation σ, the standard deviation of the average will be approximately σ/√N. Averaging multiple measurements increases the precision of a result at the cost of decreased measurement bandwidth. In other words, you can't measure Q as frequently, which is a problem if you expect Q to be changing quickly. Precision comes at the expense of a longer measurement. The central limit theorem also says that if N is large enough, the distribution of the average will be approximately Gaussian. The central limit theorem holds under a wide range of assumptions about the distribution of the errors.
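
A quick numerical check of the σ/√N scaling (a MATLAB sketch with arbitrary values, not a solution to the exercises below): simulate many repetitions of averaging N noisy measurements and compare the spread of the averages to σ/√N.

  % Check that the standard deviation of an N-sample average is about
  % sigma/sqrt(N). All values are arbitrary, chosen for illustration.
  sigma = 1;          % standard deviation of a single measurement's error
  N = 25;             % number of measurements averaged together
  nRepeats = 10000;   % number of simulated experiments
  averages = mean(sigma * randn(N, nRepeats));   % one average per column
  fprintf('std of averages = %.3f, sigma/sqrt(N) = %.3f\n', ...
      std(averages), sigma / sqrt(N))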

Averaging multiple measurements offers a simple way to increase the precision of a measurement. Because the precision increase is proportional to the square root of N, averaging is a resource-intensive way to achieve precision. You have to average one hundred measurements to get a single additional significant digit in your result. The central limit theorem is your frenemy. The theorem offers an elegant model of the benefit of averaging multiple measurements. But it could also be called the Inherent Law of Diminishing Returns of the Universe. Each time you repeat a measurement, the incremental value added by your hard work diminishes. The tenth measurement is only about 11% as valuable as the first; the hundredth carries only about three percent of the benefit.

If it takes a long time to repeat a measurement for averaging, drift may become a significant factor.

Reporting measurement uncertainty

All measured values in 20.309 must be reported with an associated measure of variability, which is usually the range, standard deviation, or standard error of the dataset. Standard error is equal to the sample standard deviation divided by the square root of the number of data points. Use the abbreviation "s.d." for standard deviation and "s.e.m." for standard error after the "±". For example: 1.21 ± 0.03 GW (±s.d., N=42). Round uncertain quantities to the same decimal place as the uncertainty. Uncertainty is typically reported with one or two significant figures. The sample size must be included in all cases. Report uncertainty in the same units as the measurand. [1]
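
As a small illustration of this convention (the data values below are invented), the MATLAB snippet computes the mean and standard error of a dataset and prints it in the requested format:

  % Report a measured quantity with its standard error in the style above.
  data = [1.18, 1.24, 1.20, 1.23, 1.19, 1.22];   % measured values, in GW (made up)
  N = numel(data);
  m = mean(data);
  sem = std(data) / sqrt(N);      % standard error of the mean
  pm = char(177);                 % the plus-minus symbol
  fprintf('%.2f %c %.2f GW (%cs.e.m., N=%d)\n', m, pm, sem, pm, N)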

Standard error is the best choice for datasets that contain 20 or more samples. It can be interpreted as an estimate of the size of an interval that would contain the result of repeating the experiment about ⅔ of the time. Range is a good choice for small values of N.

The guidelines for reporting systematic errors are much less sharply drawn. A fundamental problem is that many sources of systematic error are impossible to quantify. Because of this, experiments frequently employ design features such as differential measurements to mitigate the impact of systematic errors. Systematic error should be quantified to the degree possible in studies that depend on the absolute accuracy of a measurand. For example, this excellent paper details a very precise measurement of the Newtonian gravitational constant G. It includes a spectacular discussion of systematic error. The paper also includes a disclaimer: "The possibility that unknown systematic errors still exist in traditional measurements makes it important to measure G with independent methods." True that.

Measurement quality

Accuracy versus precision.

Measurements are characterized by their precision and accuracy. Precision encompasses errors that are random in nature. Accuracy comprises errors that are deterministic. Precision quantifies the variability of a measurement. Accuracy specifies how far a measurement is from the truth. (Because the term "accuracy" has been so poorly used, some people use the term "trueness" instead of accuracy.) Systematic errors reduce accuracy.

Review of probability and statistics concepts

This section is coming soon to a wiki page near you. Pay attention in lecture on 9/9.

Propagating errors

Measurements are frequently used as inputs to calculations. When calculating values based on measurements that include observational error, it is necessary to consider the effect of the error on the calculated value — a process called error propagation. A thorough treatment of error propagation would likely cause you to navigate away from this page to your favorite cat video. Fortunately, a few simple rules will get you through many of the calculations in 20.309. The treatment of errors in some analyses, like nonlinear regression, is more complicated.

This page has a concise summary of the rules for propagating errors through calculations. (Here is another succinct page.)
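
For reference, a brief sketch of the rules used most often (assuming that the errors in the measured quantities a and b are independent and small relative to a and b):

Sums and differences, f = a ± b: $ \sigma_f=\sqrt{\sigma_a^2+\sigma_b^2} $

Products and quotients, f = ab or f = a/b: $ \frac{\sigma_f}{|f|}=\sqrt{\left(\frac{\sigma_a}{a}\right)^2+\left(\frac{\sigma_b}{b}\right)^2} $

Powers, f = aⁿ: $ \frac{\sigma_f}{|f|}=|n|\frac{\sigma_a}{|a|} $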

Classifying errors

Pentacene molecule imaged with atomic force microscope.[2]

Classify error sources based on their type (fundamental, technical, or illegitimate) and the way they affect the measurement (systematic or random). In order to come up with the correct classification, you must trace each source all the way through the system: how does the underlying physical phenomenon manifest itself in the final measurement? For example, many measurements are limited by random thermal fluctuations in the sample. It is possible to reduce thermal noise by cooling the experiment. Physicists cooled the pentacene molecules shown above to just a few kelvin in order to image them so majestically with an atomic force microscope. But not all measurements can be undertaken at such low temperatures. Intact biological samples do not fare well at cryogenic temperatures. Thus, thermal noise could be considered a technical source in one instance (pentacene) and a fundamental source in another (most measurements of living biological samples). There is no hard and fast rule for classifying error sources. Consider each source carefully.

Questions

Question 1

  • Why should you be suspicious of a thermometer that specifies zero random error?

Question 2

In the third century BCE, Eratosthenes measured C, the circumference of the Earth, using only a stick, a piece of useful information, an assumption, and perhaps some camels. Unfortunately, many details of Eratosthenes' method — like how he measured the distance between Alexandria and Syene — are lost to history. Scientific license and camels have been used to fill in some details for the purposes of this question. Assume that Eratosthenes had access to a dividing compass (the kind with two pointy ends) and that he measured the distance between Alexandria and Syene by timing how long it took several camels to make the voyage. For a thorough discussion of what is known and what is supposed about Eratosthenes' method (and a lot of storytelling) see, for example: Circumference: Eratosthenes and the Ancient Quest to Measure the Globe by Nicholas Nicastro.

The piece of information Eratosthenes used was this: at noon on the day of summer solstice, in the city of Syene to the south of Alexandria where Eratosthenes lived, the sun shone on the bottom of a deep well and cast no shadow. Eratosthenes realized this meant that the sun was directly over Syene. At the same time in Alexandria, there were shadows. Eratosthenes measured the angle of the shadows in Alexandria at noon on the same day. You might presume that Eratosthenes used trigonometry to find C from the angle and the distance between the cities (assuming the Earth is a sphere). But trigonometry had not yet been invented. So instead, he determined what fraction of a circle the angle was (perhaps using a dividing compass) and set up a ratio.

Eratosthenes' measurement was certainly very accurate for its time. Sadly, because he used the archaic, non-standardized unit of stadia, nobody can be sure exactly how close he was.

  • Make a list of the error sources that affected Eratosthenes' measurement. Characterize each error source as systematic or random. To the degree possible, quantify the effect of each error source on the value of C that Eratosthenes measured.

Question 3

A position detector has a zero-point error of 2×10⁻⁶ m and random error with a standard deviation of 2×10⁻⁵ m. Write a computer program in MATLAB or another language of your choosing to simulate the detector's output for repeated measurements of a particle fixed at a true position of 5×10⁻⁶ m.

  1. Produce a plot that shows the effect of averaging multiple measurements. The horizontal axis should be N, the number of averages. Simulate 20 experiments for each value of N between 1 and 100 and plot an "X" to show the average value for each simulated result.
  2. If the particle detector is used to measure the variance of the particle position, how does the result change? Make another plot where each "X" is the average variance obtained from multiple measurements.
  3. Is the error in measured variance systematic or random?

Question 4

You are given a rod of unknown length, a bucket of 100 blocks, and a ruler marked in centimeters. The bucket of blocks says on the side, "Contents: 100 cubic blocks. Average length = 100 mm. Standard deviation = 15 mm."

  • Assuming that you cannot reliably measure distances more accurately than 1 cm, devise a procedure for measuring the rod as precisely as possible. Outline the procedure and state the uncertainty in the final result.

Question 5

A popular restaurant accommodates N=100 diners. During the hours of 10:00-2:00, the restaurant is fully occupied. The number of unisex restrooms in the restaurant is R.

During any given 1 minute interval, there is a 2% chance that each diner will decide to visit the restroom. If a diner decides it's time to go, he or she heads to the head at the beginning of the next minute and remains for 30 seconds. If all of the restrooms are occupied, the diner returns to their table unsatisfied. The probability that the diner will visit the restroom in future intervals is unaffected.

  • What value of R will ensure that there is no more than a 2.5% chance that a diner will find the restroom full?
  • What is the value of R if the capacity of the restaurant is increased to accommodate 1,000 diners?
  • Plot R versus N for 10 < N < 10000. Use logarithmic axes.

References

  1. Relative (percent) uncertainty is undesirable in the presence of additive noise because a constant magnitude error source produces different reported values depending on the measurand.
  2. Gross et al., "The Chemical Structure of a Molecule Resolved by Atomic Force Microscopy." Science, 28 August 2009. DOI: 10.1126/science.1176210.