DNA melting: Identifying the unknown sample

20.309: Biological Instrumentation and Measurement

In the DNA lab, you had four samples. Each sample has a true melting temperature $ M_j $ (where $ j $ is an integer from 1 to 4). The instructors told you that the fourth sample is identical to one of the other three samples. Therefore, the unknown should have exactly the same melting temperature as sample 1, 2, or 3. Your job was to figure out which one matched the unknown.

Most groups measured each sample group in triplicate: $ N=3 $. (Some special students did something a little bit different.) This resulted in 12 observations, $ O_{i,j} $, where $ j $ is the sample group and $ i $ is the experimental trial number, an integer from 1 to 3. The majority of lab groups calculated the average melting temperature of each sample group, $ \bar{O}_j $, and guessed that $ M_4 $ was the same as whichever of the known samples had the closest average melting temperature.

Seems reasonable.
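To make the procedure concrete, here is a minimal MATLAB sketch of the nearest-mean approach. It assumes your twelve observations are stored in a 3-by-4 matrix called meltingTemps, with rows corresponding to trials and columns to sample groups; the variable names are illustrative, not part of any required analysis.

 % meltingTemps: 3-by-4 matrix of observed melting temperatures
 % (rows = trials i, columns = sample groups j); illustrative name only
 groupMeans = mean( meltingTemps );    % average melting temperature of each sample group
 [ ~, closestGroup ] = min( abs( groupMeans( 1:3 ) - groupMeans( 4 ) ) );
 fprintf( 'The unknown is closest to sample %d\n', closestGroup );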

Your observations included measurement error, $ O_{i,j}=M_j+E_{i,j} $. The presence of measurement error leads to the possibility that an unfortunate confluence of error terms might cause you to misidentify the unknown sample. It’s not hard to imagine what factors tend to increase the likelihood of such an unfortunate fluke: the true means are close together, or the error terms are large. To get a handle on the possibility that your results were crap due to bad luck alone (not incompetence), it is necessary to have some kind of model for the distribution of the error terms. How about this? The error terms are normally distributed with mean $ \mu=0 $ and standard deviation $ \sigma $. (Note that the model assumes the same error distribution for all of the sample types.) Within the confines of this model, it is possible to estimate the chance that your result was a fluke.
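Within this model, one way to get a feel for the fluke probability is to simulate it. The sketch below is purely illustrative: the true means and the error standard deviation are made-up numbers, chosen only to show how the calculation could go.

 % Monte-Carlo illustration of the error model O(i,j) = M(j) + E(i,j),
 % with E ~ N(0, sigma). All numbers below are made up for illustration.
 trueMeans  = [ 68 72 76 68 ];   % hypothetical M_1 ... M_4 (unknown matches sample 1)
 sigma      = 2;                 % hypothetical measurement error, degrees C
 numTrials  = 3;
 numRepeats = 1e4;
 flukes = 0;
 for k = 1:numRepeats
     observations = repmat( trueMeans, numTrials, 1 ) + sigma * randn( numTrials, 4 );
     groupMeans = mean( observations );
     [ ~, guess ] = min( abs( groupMeans( 1:3 ) - groupMeans( 4 ) ) );
     flukes = flukes + ( guess ~= 1 );
 end
 fprintf( 'Estimated misidentification probability: %.1f%%\n', 100 * flukes / numRepeats );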

There are 6 possible pairwise hypotheses to test: $ M_4\stackrel{?}{=}M_1 $; $ M_4\stackrel{?}{=}M_2 $; $ M_4\stackrel{?}{=}M_3 $; $ M_1\stackrel{?}{=}M_2 $; $ M_1\stackrel{?}{=}M_3 $; and $ M_2\stackrel{?}{=}M_3 $. If all goes well, one of the first 3 hypotheses is accepted and the other 5 are rejected. You could argue that only the first three hypotheses are relevant to the question at hand. Such an argument may even be legitimate in a technical sense. But the defense counsel would surely have some fun at your expense while you were on the witness stand at the murder trial if any of the last three null hypotheses could not be rejected. You can also argue that at least two of the last three null hypotheses must be false for the identification to be unambiguous. Does it make sense to say that the unknown is sample 1, but you can't tell 1 from 2?

Student’s t-test offers a method for assigning a numerical degree of confidence to each null hypothesis. Essentially, the test considers the entire universe of possible outcomes of your experimental study. Imagine that you repeated the study an infinite number of times and the null hypothesis were true. You would obtain all possible outcomes of $ E_{i,j} $. The test divides these outcomes into two realms: those that are more favorable to the null hypothesis (i.e., $ O $ is closer to $ M $ than the result you got), and those that are less favorable ($ O $ is farther from $ M $ than the result you got). The t-test can be summarized by a p-value, which is equal to the percentage of possible outcomes that are less favorable to the null hypothesis than the result you got. A low p-value means that there are not many possible results less favorable to the null hypothesis than the one you got, so it is probably reasonable to reject it. Rejecting the null hypothesis means that the means are likely not the same. In most circumstances, the experimenter chooses a significance level such as 10%, 5%, or 1% in advance of examining the data. Another way to think of this: if you chose a significance level of 5% and the null hypothesis were actually true, you would expect to reject it because of bad luck on about 5 out of every 100 repetitions of the study.
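In MATLAB, a single pairwise test can be run with the Statistics Toolbox function ttest2. The sketch below tests the null hypothesis $ M_4\stackrel{?}{=}M_1 $, again using the illustrative meltingTemps matrix introduced earlier.

 % Two-sample t-test of the null hypothesis M_4 = M_1
 [ h, p ] = ttest2( meltingTemps( :, 4 ), meltingTemps( :, 1 ) );
 % p is the p-value; h = 1 means the null hypothesis is rejected
 % at the default 5% significance level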

A problem comes up when you compare multiple means using the t-test. For example, if you chose a significance level of 5% for each t-test, the chance of incorrectly rejecting at least one true null hypothesis somewhere among the 6 comparisons could be as high as about 30% (roughly $ 1-0.95^6 \approx 26\% $ if the comparisons were independent). The multcompare function in MATLAB implements a correction to the t-test procedure that assures that the family-wise error rate (FWER) is less than the threshold you set, say 5%. In other words, the chance of any true null hypothesis being rejected just due to bad luck is less than the FWER. You can set the FWER with an argument to multcompare.
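multcompare operates on the stats structure returned by anova1 (among other functions), so a typical call looks something like the sketch below, once again using the illustrative meltingTemps matrix.

 % One-way ANOVA followed by corrected pairwise comparisons
 groupNumbers = repmat( 1:4, 3, 1 );            % sample group label for each observation
 [ p, ~, stats ] = anova1( meltingTemps( : ), groupNumbers( : ), 'off' );
 comparisons = multcompare( stats, 'Alpha', 0.05, 'Display', 'off' );
 % Each row of comparisons lists the pair of groups compared, the estimated
 % difference of their means, and its confidence interval; an interval that
 % excludes zero means that pair's null hypothesis is rejected with the
 % family-wise error rate held below 5%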

If you used multcompare in your analysis, you can say that the probability that you rejected any true null hypothesis due to bad luck was less than the FWER you set. Since not all of the hypotheses may be required to identify the unknown, this is a slightly conservative statement: you required more things to be true than you strictly needed. But it is likely that you wouldn't gain much by excluding the hypotheses that weren't required.

You can probably think of ways the simple error model we relied on might be deficient. For example, there is no attempt to include any kind of systematic error. If there were significant systematic error sources in your experiment, your estimate of the chance that an unlucky accident happened may be very far from the truth. Because most real experiments do not perfectly satisfy the assumptions of the test, it is usually ridiculous to report an extremely small p-value. (That doesn't stop people from doing it, though.)

