Error analysis

20.309: Biological Instrumentation and Measurement



Overview

A thorough, correct, and precise discussion of experimental errors is the core of a superior lab report, and of life in general. This page will help you understand and clearly communicate the causes and consequences of experimental error.

What is experimental error?

The goal of a measurement is to determine an unknown physical quantity Q. The measurement procedure you use will produce a measured value M that in general differs from Q. The experimental error E is the difference between the true value and the measured value: E = Q - M.

“Experimental error” is not a synonym for “experimental mistake,” although mistakes you make during the experiment can certainly result in errors.

Sources of error

Error sources are root causes of experimental errors. The error in a measurement E is equal to the sum of errors caused by all sources. Some examples of error sources are: thermal noise, interference from other pieces of equipment, and miscalibrated instruments. Error sources fall into three categories: fundamental, technical, and illegitimate. Fundamental error sources are physical phenomena that place an absolute lower limit on experimental error. Fundamental errors cannot be reduced. (Inherent is a synonym for fundamental.) Technical error sources can (at least in theory) be reduced by improving the instrumentation or measurement procedure — a proposition that frequently involves spending money. Illegitimate errors are mistakes made by the experimenter that affect the results. There is no excuse for those.

An ideal measurement is influenced by fundamental error sources only.

Measurement quality

Accuracy versus precision.

Measurements are characterized by their precision and accuracy.

Precision quantifies the variability in a measurement. Because nearly all measurements involve error sources that introduce random variation, repeated measurements of the same quantity Q rarely give identical values. Random errors are unrepeatable fluctuations that reduce precision. Observational errors E that result from random errors can be modeled as random variables with average μ = 0 and standard deviation σ. You can estimate σ by making several measurements and computing the standard deviation.
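The short Python sketch below illustrates this procedure by simulating a set of repeated measurements and estimating σ from them. The true value, noise level, and number of measurements are made up for the example; they do not describe any particular instrument.

<pre>
import numpy as np

rng = np.random.default_rng(0)

Q = 37.0      # true value of the quantity being measured (assumed for this example)
sigma = 0.2   # true standard deviation of the random error (assumed)
N = 50        # number of repeated measurements

# Each measurement is the true value plus a zero-mean random error.
measurements = Q + rng.normal(0.0, sigma, size=N)

# The sample standard deviation of the repeats estimates sigma;
# ddof=1 gives the usual (N-1) normalization.
sigma_estimate = np.std(measurements, ddof=1)
print(f"estimated sigma = {sigma_estimate:.3f} (true value {sigma})")
</pre>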

Accuracy specifies how far a measured value M is from the true value Q. Common error sources that affect accuracy are offset error (also called zero-point error), sensitivity error (also called percentage error), and nonlinearity. Systematic errors are repeatable errors that reduce accuracy. Observational errors E resulting from systematic errors have a constant value (in the case of an offset error) or are functions of Q. For example, sensitivity error gives rise to observational error E = K Q. Multiple measurements will not reveal systematic errors. The principal way to detect a systematic error is to measure a standard that has a known value of Q. In many complex measurements, finding an appropriate standard is very difficult.
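As a concrete (and entirely hypothetical) illustration, the sketch below models an instrument that has both an offset error and a sensitivity error. Measuring a standard with a known value reveals the systematic error; repeating the measurement does not. All of the numbers are invented.

<pre>
def measure(Q, offset=0.5, K=0.03):
    """Hypothetical instrument that reads low: M = Q - offset - K*Q,
    so the observational error is E = Q - M = offset + K*Q."""
    return Q - offset - K * Q

# Measuring an unknown sample cannot, by itself, expose the systematic error.
M_unknown = measure(20.0)
print(f"unknown sample reads {M_unknown:.2f} -- no way to tell how much of that is error")

# Measuring a standard with a known value of Q can.
Q_standard = 100.0
M_standard = measure(Q_standard)
print(f"standard reads {M_standard:.2f}; observational error E = {Q_standard - M_standard:.2f}")

# Repeating the measurement does not help: every repeat gives the same reading.
repeats = [measure(Q_standard) for _ in range(10)]
print("spread among repeats:", max(repeats) - min(repeats))
</pre>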

Drift is another kind of error that affects many measurements, particularly ones that are taken over long time intervals. Drift errors are systematic errors with a magnitude that changes over time. Temperature changes, humidity variations, and component aging are common sources of drift errors.
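The made-up sketch below shows what drift looks like in a record of readings: a small random scatter plus a slow linear trend, such as might come from warming electronics. The drift rate and noise level are arbitrary.

<pre>
import numpy as np

rng = np.random.default_rng(1)

Q = 37.0                           # true, constant quantity (assumed)
hours = np.arange(0.0, 24.0, 1.0)  # one reading per hour for a day

random_error = rng.normal(0.0, 0.05, size=hours.size)  # random scatter
drift = 0.02 * hours                                    # slow linear drift, e.g. warming electronics

readings = Q + random_error + drift

# Averaging shrinks the random scatter but not the drift: the later
# readings are systematically high no matter how many are averaged.
print(f"mean of first 6 readings: {readings[:6].mean():.3f}")
print(f"mean of last 6 readings:  {readings[-6:].mean():.3f}")
</pre>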

Most measurements have a combination of random, systematic, and drift errors. One kind of error frequently predominates.

Averaging

One way to refine a measurement that exhibits random variation is to average N measurements:

$ M=\frac{1}{N}\sum_{i=1}^{N}{M_i}=Q-\frac{1}{N}\sum_{i=1}^{N}{E_i} $

where Mi is the ith measurement and Ei is the ith measurement error, Ei = Q - Mi. Substituting Mi = Q - Ei into the summation reveals that the errors of multiple measurements average together. If the errors are random, the error terms Ei will tend to cancel each other.

The central limit theorem offers a good mathematical model for what happens when you average random variables. Informally stated, the theorem says that if you average N independent random variables drawn from the same distribution with standard deviation σ, the standard deviation of the average will be σ/√N. Averaging multiple measurements increases the precision of a result at the cost of decreased measurement bandwidth. In other words, precision comes at the expense of a longer measurement. The theorem also says that if N is large enough, the distribution of the average will be approximately Gaussian under a wide range of assumptions.

Averaging multiple measurements offers a simple way to increase the precision of a measurement. But because the precision increase is proportional to the square root of N, averaging frequently ends up being a resource-intensive way to achieve precision. You have to average one hundred measurements to get a single additional significant digit in your result. The central limit theorem is your frenemy. It offers an elegant model of the benefit of averaging multiple measurements, but it could also be called the Inherent Law of Diminishing Returns of the Universe. Each time you repeat a measurement, the incremental value added by your hard work diminishes. If doing a lot of averages takes a long time, drift may become a significant factor.
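The √N behavior is easy to check numerically. The sketch below repeats an "average of N measurements" experiment many times for several values of N and compares the scatter of the averages to σ/√N; all of the numbers are arbitrary.

<pre>
import numpy as np

rng = np.random.default_rng(2)

Q = 1.0          # true value (arbitrary)
sigma = 0.5      # standard deviation of a single measurement (arbitrary)
trials = 10000   # number of repeated averaging experiments

for N in (1, 10, 100, 1000):
    # Each row is one experiment: N measurements that get averaged together.
    samples = Q + rng.normal(0.0, sigma, size=(trials, N))
    averages = samples.mean(axis=1)
    print(f"N = {N:4d}: std of the average = {averages.std(ddof=1):.4f}, "
          f"sigma/sqrt(N) = {sigma / np.sqrt(N):.4f}")
</pre>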

Classifying errors

Pentacene molecule imaged with atomic force microscope.[1]

Classify error sources based on their type (fundamental, technical, or illegitimate) and the way they affect the measurement (systematic or random). In order to come up with the correct classification, you must think each source all the way through the system: how does the underlying physical phenomenon manifest itself in the final measurement? For example, many measurements are limited by random thermal fluctuations in the sample. It is possible to reduce thermal noise by cooling the experiment. Physicists cooled the pentacene molecules shown at right to liquid-helium temperatures (about 5 K) in order to image them so majestically with an atomic force microscope. But not all measurements can be undertaken at such low temperatures. Intact biological samples do not fare particularly well anywhere near 5 K. Thus, thermal noise could be considered a technical source in one instance (pentacene) and a fundamental source in another (most measurements of living biological samples). There is no hard and fast rule for classifying error sources. Consider each source carefully.

Example: the effect of a pill on temperature

Imagine you are conducting an experiment that requires you to swallow a pill and take your temperature every day for a month. You have two instruments available: an analog thermometer and a digital thermometer. Both thermometers came with detailed specifications.

The specification sheet for the analog thermometer says that it may have an offset error of less than 2°C and a random error of zero. Assume you don't have a temperature standard you can use to find the value of the offset.

The manufacturer's website for the digital thermometer says that it has 0°C offset, but noise in its amplifier causes the reading to vary randomly around the true value. The variation has an approximately Gaussian distribution with an average value of 0°C and a standard deviation of 2°C.

Which thermometer should you use?

It depends.

The raw temperature data from the digital and analog thermometers could be used in a variety of ways. It is easy to imagine experimental hypotheses that involve the average, change in, or variance in your temperature. The two thermometers result in different kinds of errors in each of the three circumstances.
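To make "it depends" concrete, the sketch below simulates a month of daily readings with each thermometer, using the specifications above. The true temperatures, their slow upward trend, and the particular analog offset are all invented for illustration; the offset is, of course, unknown to the experimenter.

<pre>
import numpy as np

rng = np.random.default_rng(3)

days = 30
true_temp = 37.0 + 0.1 * np.arange(days)   # invented true temperatures with a slow upward trend

analog_offset = 1.3                        # fixed, |offset| < 2 C, unknown to the experimenter
analog = true_temp + analog_offset         # analog: offset error, no random error
digital = true_temp + rng.normal(0.0, 2.0, size=days)  # digital: no offset, sigma = 2 C

# Estimating the average temperature: the digital errors average away
# (roughly 2/sqrt(30), about 0.4 C), but the analog offset does not.
print(f"true mean {true_temp.mean():.2f}, analog {analog.mean():.2f}, digital {digital.mean():.2f}")

# Estimating the change over the month: the analog offset cancels in the
# difference, while the digital difference still carries random error.
true_change = true_temp[-1] - true_temp[0]
print(f"true change {true_change:.2f}, analog {analog[-1] - analog[0]:.2f}, "
      f"digital {digital[-1] - digital[0]:.2f}")
</pre>

In this sketch the digital thermometer gives the better estimate of the average temperature, because its random error shrinks with averaging, while the analog thermometer gives the better estimate of the change over the month, because its offset cancels in the difference.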

Bottom line: the magnitude of random errors tends to decrease with larger N; the magnitude of systematic errors does not.

If you are measuring your body mass index, which is equal to your mass in kilograms divided by the square of your height in meters, and a systematic error biases the measurement low (say, a scale that consistently reads light), your result M will be smaller than the true value Q. Your result will also include random variation from other sources. Averaging multiple measurements will reduce the contribution of random errors, but the measured value of BMI will still be too low. No amount of averaging will correct the problem.

Sample bias

Quantization error

References

  1. Gross et al. The Chemical Structure of a Molecule Resolved by Atomic Force Microscopy. Science, 28 August 2009. DOI: 10.1126/science.1176210.