DNA melting: Identifying the unknown sample

From Course Wiki
Revision as of 18:49, 5 May 2015 by Steven Wasserman

20.309: Biological Instrumentation and Measurement



In the DNA lab, you had four samples. Each sample has a true melting temperature $ M_j $ (where $ j $ is an integer from 1 to 4). The instructors told you that the fourth sample is identical to one of the other three, so its true melting temperature should match one of theirs. Your job was to figure out which one.

Most groups used their DNA melter to measure each sample $ N=3 $ times. (Some people did something slightly different.) This resulted in 12 observations, $ O_{i,j} $, where $ j $ is the sample group and $ i $ is the experimental trial number, an integer from 1 to 3.

The majority of lab groups calculated the average melting temperature of each sample group, $ \bar{O}_j = \frac{1}{N}\sum_{i=1}^{N} O_{i,j} $, and guessed that $ M_4 $ was the same as whichever of the known samples had the closest average melting temperature. Seems reasonable.
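That nearest-mean procedure takes only a few lines to write down. The sketch below uses made-up melting temperatures, not data from the lab:

```python
# Sketch of the nearest-mean approach most groups used.
# All numbers are hypothetical, for illustration only.

observations = {
    1: [85.1, 84.7, 85.4],   # sample 1: N = 3 trials (degrees C)
    2: [79.8, 80.3, 80.1],   # sample 2
    3: [88.9, 89.2, 88.6],   # sample 3
    4: [80.0, 79.6, 80.4],   # sample 4: the unknown
}

# Average the N trials for each sample group.
means = {j: sum(obs) / len(obs) for j, obs in observations.items()}

# Guess that the unknown matches whichever known sample has the
# closest average melting temperature.
guess = min((1, 2, 3), key=lambda j: abs(means[j] - means[4]))
print(f"Sample 4 mean: {means[4]:.2f} C; closest known sample: {guess}")
```

With these numbers the unknown lands nearest sample 2, but nothing in the procedure itself says how confident you should be in that call.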

Your observations included measurement error, so there is a possibility that an unfortunate confluence of measurement errors caused you to misidentify the unknown sample. Assume each sample run produced an observed value $ O_{i,j}=M_j+E_{i,j} $. In other words, the measured value is equal to the sum of the true value and a measurement error term. It’s not hard to imagine what factors would increase the likelihood of such an unfortunate fluke happening: the true means are close together, or the error terms are large. To get a handle on the possibility that your results were crap (due to bad luck instead of bad technique), it is useful to have a mathematical model of the measurement process. How about this?

Given the true mean values $ M_j $, each observation is the true value plus an error term: $ O_{i,j}=M_j+E_{i,j} $. A simple and reasonable assumption about the distribution of the error terms is that they are normally distributed with mean μ = 0 and standard deviation σ: $ E_{i,j}\sim\mathcal{N}(0,\sigma^2) $. (All of the sample types have the same error distribution.) You can probably think of some ways this simple model might be deficient. For example, there is no attempt to include any kind of systematic error. If there were significant systematic error sources in your experiment, your estimate of the chance that an unlucky accident happened may be very far from the truth.
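One payoff of writing the model down is that you can simulate it. The sketch below uses hypothetical true melting temperatures and an assumed σ (neither comes from the lab) to estimate how often the nearest-mean guess comes out wrong under this error model:

```python
import random

random.seed(0)

# Hypothetical true melting temperatures; sample 4 is a repeat of sample 2.
M = {1: 85.0, 2: 80.0, 3: 89.0, 4: 80.0}
sigma = 1.0   # assumed standard deviation of the error terms (degrees C)
N = 3         # trials per sample

def run_experiment():
    """Simulate one experiment: O_ij = M_j + E_ij, with E_ij ~ Normal(0, sigma)."""
    means = {j: sum(M[j] + random.gauss(0, sigma) for _ in range(N)) / N
             for j in M}
    # Nearest-mean guess for the identity of sample 4.
    return min((1, 2, 3), key=lambda j: abs(means[j] - means[4]))

trials = 10_000
wrong = sum(run_experiment() != 2 for _ in range(trials))
print(f"Estimated misidentification probability: {wrong / trials:.3f}")
```

Rerunning with the true means pushed closer together, or with a larger σ, drives the misidentification probability up, which matches the intuition above about what makes an unfortunate fluke more likely.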

Okay. It’s a model. Now what?

There are 6 possible pairwise comparisons, $ M_a\stackrel{?}{=}M_b $: $ M_1=M_2 $, $ M_1=M_3 $, $ M_1=M_4 $, $ M_2=M_3 $, $ M_2=M_4 $, and $ M_3=M_4 $.

Student’s t-test is a statistical procedure that can assign a degree of confidence to the hypothesis that two populations have the same mean, $ \mu_a=\mu_b $. The test produces a number called a p-value. To interpret the p-value, consider all the possible outcomes of your experiment: if you repeated the procedure an infinite number of times with the two means truly equal, the p-value is the fraction of those repetitions in which the difference between the sample means came out at least as large as the one you actually observed.

Each of the six possible pairwise hypotheses can be tested this way, yielding a p-value for each pair.

In order to quantify the uncertainty of your conclusion, start from the analysis most of you did: guessing that the unknown matched whichever known sample had the closest average melting temperature. Under the assumed error model, this is reasonable. All of the sample means have the same uncertainty (provided you repeated each sample the same number of times), because the error model assumes that all the error terms are drawn from the same normal distribution.
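Enumerating the six pairwise hypotheses and ranking them by the observed difference in sample means is then straightforward. The sample means below are made-up values, not lab data:

```python
from itertools import combinations

# Hypothetical average melting temperatures (degrees C) for the four groups.
means = {1: 85.1, 2: 80.1, 3: 88.9, 4: 80.0}

# List the six pairwise hypotheses M_a = M_b, ranked by the observed
# difference in sample means (smallest difference first).
for a, b in sorted(combinations(means, 2),
                   key=lambda p: abs(means[p[0]] - means[p[1]])):
    print(f"M_{a} = M_{b}?  |difference| = {abs(means[a] - means[b]):.2f} C")
```

The pair with the smallest difference is the nearest-mean guess; the t-test then tells you how surprised you should be by the other five differences if those pairs were actually equal.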
