Difference between revisions of "Assignment 9, Part 3: Fitting your data"

Latest revision as of 22:39, 27 April 2018

Congratulations! You should now have a working version of your analysis code.

Estimate model parameters for real data

Use your newly developed code to estimate the parameters associated with a set of DNA melting data that you took using your instrument. (You may use a data set you took from a previous week.)

Tip: You will run this type of fit many times, on many different DNA melting curves, over the next couple of weeks. Writing a function for it will make the task easy to repeat.

Plot your fluorescence data as a function of block temperature, your model function with initial guesses, and your model function with best fit parameters on the same set of axes.
Record your estimates for ΔH and ΔS. Calculate T_m.
How do these thermodynamic parameters compare to the predicted values you obtained from DINAmelt or OligoCalc?
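
As a rough sketch of the T_m calculation, the block below assumes a two-state model for two non-self-complementary strands at equal concentration, with total strand concentration C_T. Under that assumption, half of the strands are paired when T_m = ΔH / (ΔS + R ln(C_T/4)). The function names and numerical values are hypothetical, and the sketch is in Python rather than whatever environment your analysis code uses. Check the formula against your own model function before relying on it.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def melting_temperature(delta_h, delta_s, c_total):
    """Temperature (K) at which half of the strands are in duplex form.

    Assumes two non-self-complementary strands at equal concentration,
    delta_h in J/mol and delta_s in J/(mol*K) for duplex formation,
    and c_total as the total strand concentration in mol/L.
    """
    return delta_h / (delta_s + R * math.log(c_total / 4.0))


def ds_fraction(temperature, delta_h, delta_s, c_total):
    """Equilibrium fraction of strands in duplex form at `temperature` (K)."""
    k_eq = math.exp(delta_s / R - delta_h / (R * temperature))
    a = k_eq * c_total
    return (1.0 + a - math.sqrt(1.0 + 2.0 * a)) / a


# Hypothetical best-fit values, for illustration only.
delta_h = -630e3   # J/mol
delta_s = -1.70e3  # J/(mol*K)
c_total = 1e-6     # mol/L

t_m = melting_temperature(delta_h, delta_s, c_total)
print(f"T_m = {t_m:.1f} K ({t_m - 273.15:.1f} C)")
print(f"dsDNA fraction at T_m = {ds_fraction(t_m, delta_h, delta_s, c_total):.3f}")
```

Evaluating `ds_fraction` at the computed T_m is a quick self-check: it should return 0.5 if the sign and unit conventions for ΔH and ΔS match the ones assumed here.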

Residual plot

Figure: Residuals plotted versus time, temperature, fluorescence, and the cumulative sum of fluorescence.

Observed values differ from predicted values because of noise and systematic errors in the model. Residuals are the difference between experimental observations and model predictions, V_f,measured−V_f,model. Ideally, the residuals should be random and identically distributed.

The plots at right show V_f,measured−V_f,model versus temperature, time, fluorescence, and the cumulative sum of fluorescence. The residuals are clearly not random and identically distributed, which suggests that the model does not perfectly explain the observations. Note that the scale of the residual plots is much smaller than that of the data plot: about one percent of the data scale.

A perfect model might require dozens of added parameters and additional physical measurements.

Plotting the residuals versus different variables can help suggest what factors are not modeled well.

Plot the residuals vs.

time,
temperature, and
fluorescence.
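
A minimal sketch of the residual calculation follows, assuming the measured and model voltages are available as equal-length sequences sampled at the same times. The data values and function names are hypothetical. The lag-1 autocorrelation is not part of the assignment; it is included only as one simple numerical hint of structure in the residuals, alongside the plots.

```python
def residuals(measured, model):
    """Element-wise difference measured - model for equal-length sequences."""
    if len(measured) != len(model):
        raise ValueError("measured and model must have the same length")
    return [m - p for m, p in zip(measured, model)]


def lag1_autocorrelation(values):
    """Lag-1 autocorrelation of a sequence.

    Near 0 for uncorrelated noise; closer to +1 when neighboring
    residuals share a slow trend (a sign of systematic error).
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denominator = sum(c * c for c in centered)
    numerator = sum(centered[i] * centered[i + 1] for i in range(n - 1))
    return numerator / denominator


# Hypothetical measured and model voltages, for illustration only.
v_measured = [1.00, 0.98, 0.95, 0.90, 0.80, 0.62, 0.41, 0.27, 0.21, 0.19]
v_model = [0.99, 0.98, 0.96, 0.92, 0.82, 0.63, 0.40, 0.25, 0.19, 0.18]

r = residuals(v_measured, v_model)
print("residuals:", [round(x, 3) for x in r])
print("lag-1 autocorrelation:", round(lag1_autocorrelation(r), 2))
```

The list `r` can then be plotted against time, temperature, or fluorescence with whatever plotting tool you already use.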

Finding double stranded DNA fraction from raw data

The inverse function of the melting model with respect to V_f,measured(t) helps visualize discrepancies between the model and the experimental data, whether they are caused by random noise in V_f,measured or by systematic error in the model V_f,model. The function,

$ C_{ds,inverse-model}(V_{f,measured}(t)) = \frac{V_{f,measured}(t) - K_{offset}} {K_{gain} S(t) Q(t)} $,

is itself a model. This model estimates the concentration of double stranded DNA based on the observations $ V_{f,measured}(t) $ and the models for bleaching and quenching.
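
The inverse model above can be sketched as a short function. This version assumes that the bleaching term S(t) and the quenching term Q(t) have already been evaluated at the same time points as the measurement; the function name, argument names, and numbers are hypothetical.

```python
def ds_concentration_inverse_model(v_measured, k_offset, k_gain, s, q):
    """Apply C_ds = (V_f,measured - K_offset) / (K_gain * S * Q) point by point.

    v_measured, s, and q are equal-length sequences sampled at the same
    times; s and q are the bleaching and quenching model values.
    """
    if not (len(v_measured) == len(s) == len(q)):
        raise ValueError("v_measured, s, and q must have the same length")
    return [(v - k_offset) / (k_gain * s_i * q_i)
            for v, s_i, q_i in zip(v_measured, s, q)]


# Hypothetical round trip: build voltages from a known dsDNA curve using
# the forward relation V = K_gain * S * Q * C_ds + K_offset, then invert.
k_offset, k_gain = 0.05, 2.0
c_true = [1.00, 0.80, 0.50, 0.20, 0.05]
s = [1.00, 0.99, 0.98, 0.97, 0.96]  # bleaching model values
q = [1.00, 0.95, 0.90, 0.85, 0.80]  # quenching model values
v = [k_gain * s_i * q_i * c + k_offset for c, s_i, q_i in zip(c_true, s, q)]

c_est = ds_concentration_inverse_model(v, k_offset, k_gain, s, q)
print([round(c, 3) for c in c_est])
```

Whether the result is a concentration or already a fraction depends on how K_gain is defined in your model function, so check that parameterization before comparing curves.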

The estimated melting curve may be directly compared with simulations, measurements or other predictions of the true melting curve. The plot at right shows an example of C_{ds,inverse-model}(t) versus T_sample(t). The estimated melting curve is shifted to the right compared to the simulated melting curve, possibly due to systematic error in the sample temperature model. The estimated melting curve also serves as a comparison to the thermodynamic model developed in DNA Melting Thermodynamics, or to any other independent measurement or model of the melting curve, i.e., the concentration of dsDNA vs sample temperature.

Write a function to convert fluorescence into the fraction of double stranded DNA. For at least one experimental trial, plot $ \text{DnaFraction}_{inverse-model} $ versus the sample temperature $ T_{sample} $ (example plot). On the same set of axes, plot DnaFraction versus $ T_{sample} $ using the best-fit values of ΔH and ΔS. Finally, plot the simulated dsDNA fraction versus temperature using data from DINAmelt or another melting curve simulator.

Comparing the known and unknown samples

In the next assignment, you will be comparing an unknown sample to a set of three known samples in order to determine its identity. Read through this page: Identifying the unknown DNA sample, to learn about the statistics behind making multiple comparisons.

Explain the statistical method you will use to identify your group's unknown sample in Assignment 10.

State the acceptance/rejection criteria for any hypothesis tests you will use.
This page may be a helpful reference: Identifying the unknown DNA sample.

Append all of the code you wrote for Parts 1, 2 and 3 of this assignment.

Navigation

Assignment 9 Overview
Part 1: model function
Part 2: test your code with simulated data
Part 3: fitting your data

Back to 20.309 Main Page

@@ Line 1: / Line 1: @@
+__NOTOC__
 Congratulations! you should now have a working version of your analysis code.
 ==Estimate model parameters for real data==
-Use your model function and nlinfit to estimate the parameters associated with a set of DNA melting data that you took using your instrument. (set from last week.
+Use your newly developed code to estimate the parameters associated with a set of DNA melting data that you took using your instrument. (You may use a data set you took from a previous week.)
+''Tip:'' you will be running this type of operation many times for many different DNA melting curves in the next couple weeks. It may be helpful to write a function to make this task easily repeatable.
 {{Template:Assignment Turn In|message = <br/>
-#Plot your fluorescence data as a function of temperature, your model function with initial guesses, and your model function with best fit parameters on the same set of axes.
+#Plot your fluorescence data as a function of block temperature, your model function with initial guesses, and your model function with best fit parameters on the same set of axes.
-#Record your estimates for &Delta H and &Delta S (along with their uncertainties). Calculate Tm.
+#Record your estimates for &Delta;H and &Delta;S. Calculate T<sub>m</sub>.
 #How do these thermodynamic parameters compare to the predicted values you obtained from DINAmelt or OligoCalc?
 }}
-==Comparing the known and unknown samples==
+==Residual plot==
-One possible way to compare the unknown sample to the three knowns is to use Matlab's [http://www.mathworks.com/help/stats/analysis-of-variance-anova-1.html anova] and [http://www.mathworks.com/help/stats/multcompare.html multcompare] functions. Anova takes care of the difficulties that arise when comparing multiple sample means using Student's t-test. Read through this page: [[Identifying the unknown DNA sample]], to learn about the statistics behind making multiple comparisons.
+[[Image:Residual plot for DNA data.png|thumb|right|Residuals plotted versus time, temperature, fluorescence, and the cumulative sum of fluorescence]]
+Observed values differ from predicted values because of noise and systematic errors in the model. Residuals are the difference between experimental observations and model predictions, ''V<sub>f,measured</sub>''&minus;''V<sub>f,model</sub>''. Ideally, the residuals should be random and identically distributed.
-The following code creates a simulated set of melting temperatures for three known samples and one unknown. In the simulation, each sample was run three times. The samples have melting temperatures of 270, 272, and 274 degrees. The unknown sample has a melting temperature of 272 degrees. Random noise is added to the true mean values to generate simulated results. Try running the code with different values of <code>noiseStandardDeviation</code>.
+The plots at right show ''V<sub>f,measured</sub>''&minus;''V<sub>f,model</sub>'', versus temperature, time, fluorescence, and the cumulative sum of fluorescence. The residuals are clearly not random and identically distributed. This suggests that the model does not perfectly explain the observations. The scale of the plot is much smaller than the data plot &mdash; about one percent of the data scale.
-<pre>
+A perfect model might require dozens of added parameters and additional physical measurements.
-% create simulated dataset
-noiseStandardDeviation = 0.5;
-meltingTemperature = [270 270 270 272 272 272 274 274 274 272 272 272]
-      + noiseStandardDeviation * randn(1,12);
-sampleName = {'20bp' '20bp' '20bp' '30bp' '30bp' '30bp' '40bp' '40bp' '40bp'
-      'unknown' 'unknown' 'unknown'};
-% compute anova statistics
+Plotting the residuals versus different variables can help suggest what factors are not modeled well.
-[p, table, anovaStatistics] = anova1(meltingTemperature, sampleName);
-% do the multiple comparison
+{{Template:Assignment Turn In|message= Plot the residuals vs.
-[comparison means plotHandle groupNames] = multcompare(anovaStatistics);
+# time,
-</pre>
+# temperature, and
+# fluorescence.
+}}
-[[Image:Multcompare output.png|right|thumb|Output of <code>multcompare</code> command.]]
+==Finding double stranded DNA fraction from raw data==
-The <code>multcompare</code> function generates a table of confidence intervals for each possible pair-wise comparison. You can use the table to determine whether the means of two samples are significantly different. Output of <code>multcompare</code> is shown at right. If your data has a lot of variation, you might have to use the options to reduce the confidence level. (Or there might not be a significant difference at all.) Note that <code>multcompare</code> has a default confidence level of 95% (alpha = 0.05). One way to assess how confident you are in your sample identification is by finding the lowest "alpha" value needed to identify your sample.
+[[Image:Corrected DNA data.png|thumb|right]]
-Consider devising a more sophisticated method that uses both the &Delta;H&deg; and &Delta;S&deg; values, instead.
+The inverse function of the melting model with respect to ''V<sub>f,measured</sub>''(''t'') is helpful to visualize discrepancies between the model and experimental data caused by random noise in ''V<sub>f,measured</sub>'' and systematic error in the model ''V<sub>f,model</sub>''.  The function,
+::<math>C_{ds,inverse-model}(V_{f,measured}(t)) = \frac{V_{f,measured}(t) - K_{offset}} {K_{gain} S(t) Q(t)}</math>,
+is itself a model. This model estimates the concentration of double stranded DNA based on the observations <math>V_{f,measured}(t)</math> and the models for bleaching and quenching.
+The estimated melting curve may be directly compared with simulations, measurements or other predictions of the true melting curve. The plot at right shows an example of ''C<sub>ds,inverse-model</sub>''(''t'') versus ''T<sub>sample</sub>''(''t''). The estimated melting curve is shifted to the right compared to the simulated melting curve, possibly due to systematic error in the sample temperature model. The estimated melting curve also serves as a comparison to the thermodynamic model developed in [[DNA Melting Thermodynamics]], or to any other independent measurement or model of the melting curve, i.e., the concentration of dsDNA vs sample temperature.
+{{Template:Assignment Turn In|message= Write a function to convert fluorescence into fraction of double stranded DNA. For at least one experimental trial, plot <math>\text{DnaFraction}_{inverse-model}</math> versus the ''sample temperature'' <math>T_{sample}</math> ([http://measurebiology.org/wiki/File:Inverse_cuvrve.png example plot]). On the same set of axes plot DnaFraction versus <math>T_{sample}</math> using the best-fit values of &Delta;H and &Delta;S. Finally, plot simulated dsDNA fraction vs. temperature using data from DINAmelt or another melting curve simulator.}}
+==Comparing the known and unknown samples==
+In the next assignment, you will be comparing an unknown sample to a set of three known samples in order to determine its identity. Read through this page: [[Identifying the unknown DNA sample]], to learn about the statistics behind making multiple comparisons.
 {{Template:Assignment Turn In|message=<br/>
-# Explain the statistical method you will use to identify your group's unknown sample in part 2 of this lab.
+Explain the statistical method you will use to identify your group's unknown sample in Assignment 10.
-#* State the acceptance/rejection criteria for any hypotheses tests you will use.
+* State the acceptance/rejection criteria for any hypotheses/tests you will use.
-#* This page may be a helpful reference: [[Identifying the unknown DNA sample]].
+* This page may be a helpful reference: [[Identifying the unknown DNA sample]].
 }}
 {{Template:Assignment Turn In|message = Append all of the code you wrote for Parts 1, 2 and 3 of this assignment. }}
+{{Template:Assignment 9 navigation}}
