Difference between revisions of "DNA Melting: Simulating DNA Melting - Intermediate Topics"

From Course Wiki
Jump to: navigation, search
Line 2: Line 2:
 
[[Category:DNA Melting Lab]]
 
[[Category:DNA Melting Lab]]
 
<div style="padding: 10px; width: 774px; border: 3px solid #000000;">
 
<div style="padding: 10px; width: 774px; border: 3px solid #000000;">
==Introduction==
 
This tutorial will show you how to simulate the DNA melting experiment with Matlab. The DNA melting apparatus you will build produces two output voltages related to the sample temperature and fluorescence intensity. During an experimental run, these two voltages will be recorded periodically. At the end of this tutorial, you should be able to produce mock datasets that are similar to what you will record in the lab. You can use simulated datasets to help develop and debug your data analysis scripts.
 
 
It will be essential to have working data analysis scripts before you begin running experiments in the lab. When you have completed this tutorial, refer to the lab manual for data analysis instructions.
 
 
The procedure for creating the simulation is:
 
# Write a function to compute the equilibrium dsDNA contration vs. temperature
 
# Model sample temperature versus time during an experimental run
 
# Compute the RTD voltage from temperature vs. time data
 
# Add noise to the RTD signal
 
# Compute relative fluorescence intensity from temperature vs. time data
 
# Model photobleaching (optional)
 
# Scale the fluorescence intensity signal and add noise
 
# Reformat the data to match what is produced by the DNA melting LabView VI
 
 
You will not have to write any new code to complete this tutorial; however, you should open a Matlab window and follow along as you read. Code from this page can be copied and pasted into the Matlab command window. M-files are available in the course locker.
 
 
If you prefer to work in Python, please see [[Python:Simulating DNA Melting|this page]].
 
 
===Before you begin===
 
*Read the [[Lab Manuals:DNA Melting|DNA Melting Lab Manual]].
 
*Review the [[DNA Melting Thermodynamics|DNA Melting Thermodynamics Notes]].
 
*If you need a review of Matlab fundamentals, see [[Matlab:Matlab Fundamentals|Matlab Fundamentals]] tutorial.
 
 
===Matlab skills===
 
* Array arithmetic (element-by-element) operators (<tt>+</tt>, <tt>-</tt>, <tt>.*</tt>, <tt>./</tt>)
 
* Functions declared in an m-file
 
* Conditionals
 
* Functions that take a variable number of arguments
 
* Numerical differentiation
 
* Plotting data
 
 
==Modeling dsDNA equilibrium concentraion==
 
We derived an expression for dsDNA concentration at a given temperature [[DNA Melting Thermodynamics#Simulating DNA Melting|in class]]:
 
:<math>
 
K_{eq} = e^\left [\frac{\Delta S^{\circ}}{R} - \frac{\Delta H^{\circ}}{R T} \right ]
 
</math>
 
:<math>
 
f = \frac{1 + C_T K_{eq} - \sqrt{1 + 2 C_T K_{eq}}}{C_T K_{eq}}
 
</math>
 
 
Since this is a very complicated equation and we will use it many times, it makes sense to write a Matlab function.
 
 
===dsDNA fraction function===
 
The fraction of dsDNA is a function of 4 variables: ''&Delta;S&deg;'', ''&Delta;H&deg;'', temperature, and concentration. The <tt>DnaFraction</tt> function below also takes an optional fifth argument for the gas constant so that different unit systems can be used. Using the element-by-element divide operator in the body of the function makes it possible for <tt>DnaFraction</tt> to accept either a single temperature argument or a vector of temperature values. (As a general rule, use the <tt>./</tt> operator instead of <tt>/</tt> unless you intend to do a matrix divide.)
 
 
The Matlab function <tt>nargin</tt> returns the number of arguments that the function was called with. Many Matlab programs use this function to handle variable numbers of arguments. If <tt>nargin</tt> returns 4 or less, <tt>R</tt> is set to the default value of 1.987. Otherwise, the value of <tt>R</tt> is set to the value supplied by the caller.
 
 
{{Matlab Notes}} Matlab functions must be saved in the current directory or another directory in the Matlab path. The name of the file must be the function name appended with ".m". Function help should be placed in comments at the top of the file. Type <tt>help DnaFraction</tt> to see how this works.
 
 
<pre>
 
% Returns the fraction of dsDNA in a solution containing equal concentrations
 
% of two complementary DNA oligos as a functino of total DNA concentration, 
 
% temperature, entropy change, and enthalpy change.
 
% Gas constant is an optional parameter.
 
%
 
% USAGE: f = DnaFraction(Ct, T, DeltaS, DeltaH, <GasConstant>)
 
% RETURN VALUE: fraction of dsDNA
 
%
 
% Default units are molar, Kelvin, cal/mole, and cal/(mole-Kelvin). For
 
% other unit systems, supply the appropriate value of R in the optional
 
% fifth parameter. The default value is 1.987 cal/(degree K * mole).
 
 
function f = DnaFraction(Ct, T, DeltaS, DeltaH, GasConstant)
 
% Gas constant
 
if(nargin < 5)              % determine if caller supplied R value or not
 
    R = 1.987;              % set to default: cal/(degree K * mole)
 
else
 
    R = GasConstant;        % otherwise use caller's value
 
end
 
 
% Compute Ct * Keq
 
CtKeq = Ct * exp(DeltaS / R - DeltaH ./ (R * T));
 
 
%now compute f
 
f = (1 + CtKeq - sqrt(1 + 2 * CtKeq)) ./ CtKeq;
 
 
% Written 4/9/2008 by SCW
 
</pre>
 
 
===Plot an ideal melting curve===
 
Does the <tt>DnaFraction</tt> function work as expected? A good way to test the function is to plot a melting curve generated by <tt>DnaFraction</tt>.
 
 
{{Matlab Notes}} Over the course of an interactive Matlab session, you may run several simulations with different parameters. Using a structure to keep the results of each simulation together is a good way to prevent the workspace from becoming disorganized. The <tt>struct</tt> function creates an empty structure. In the code below, a structure called <tt>ideal</tt> is created to hold the ideal melting curve data. (Since variables are dynamically allocated and typed in Matlab, the <tt>struct</tt> command is optional; however, explicitly creating the structure can make your code easier to read and better performing.)
 
 
Since '''DnaFraction''' accepts a vector temperature argument, an entire melting curve can be computed with a single call.
 
 
<pre>
 
ideal = struct();                          % create an empty structure to hold the results
 
ideal.temperature = (0:99) + 273.15;      % temperature axis in Kelvin: 100 points(0-99C)
 
ideal.deltaS = -184;                      % cal/mol
 
ideal.deltaH = -71E3;                      % cal/(mol K)
 
ideal.concentration = 1E-6;
 
 
% call DnaFraction function
 
% arguments: concentration, temperature vector, delta S, delta H
 
% (optional R argument not supplied)
 
 
ideal.dsDnaFraction = DnaFraction(ideal.concentration, ideal.temperature, ...
 
                                  ideal.deltaS, ideal.deltaH);
 
 
plot(ideal.temperature, ideal.dsDnaFraction);
 
title('Ideal DNA Melting Curve');
 
xlabel('Temperature (K)');
 
ylabel('dsDNA Fraction');
 
</pre>
 
[[Image:Ideal DNA Melting Curve.jpg|350 px|Ideal DNA melting curve]]
 
 
Experiment with different values of ''&Delta;H&deg;'' and ''&Delta;H&deg;'' to see how the curve changes. You can use [http://www.basic.northwestern.edu/biotools/oligocalc.html OligoCalc] or another tool to estimate values for different DNA sequences.
 
 
One method of estimating ''T<sub>m</sub>'' is to find the temperature where the dsDNA fraction equals 1/2. Another method is to differentiate the melting curve. ''T<sub>m</sub>'' occurs at the peak of the derivative &mdash; where the dsDNA fraction is changing the fastest. (Why might you prefer one method to the other when estimating ''T<sub>m</sub>'' from experimental data?)
 
 
===Numerical differentiation===
 
The fluorescence function ''f<sub>N</sub>'' is a sampled version of the continuous function ''f(T)'' such that ''f<sub>N</sub> = f(N&Delta;T)'', where ''&Delta;T'' is the sampling interval (1&deg; C, in this instance). One method for computing a numerical approximation to ''df/dT'' is the finite difference:
 
:<math>
 
{f'}_N = \frac{f_{N+1}-f_{N}}{\Delta T}</math>.
 
''dF/dT'' is guaranteed to take on the value given by ''f'<sub>N</sub>'' at some point between ''N&Delta;T'' and ''(N+1)&Delta;T''; however, this happy coincidence can occur at any place in the interval. A simple but effective technique is to guess that equality occurs in the middle of the interval. This guess is close enough in most cases.
 
 
The code below computes the finite difference and stores it in the <tt>dDsDnaFraction</tt> field of <tt>ideal</tt>. It also generates an associated temperature vector (<tt>ideal.dTemperature</tt>) with values centered between the original temperature samples. Note that both vectors are one element shorter than the original data.
 
 
<pre>
 
ideal.dTemperature = ideal.temperature(1:99) + 0.5; % create new temperature vector
 
ideal.dDsDnaFraction = diff(ideal.dsDnaFraction);  % take the difference
 
 
plot(ideal.dTemperature, -1 * ideal.dDsDnaFraction);
 
title('Derivative of Ideal DNA Melting Curve');
 
xlabel('Temperature (K)');
 
ylabel('Delta dsDNA Fraction');
 
</pre>
 
[[Image:Derivative of Ideal DNA Melting Curve.jpg|350 px|Derivative of ideal DNA melting curve]]
 
 
In the previous section, the plot shows that the dsDNA fraction equals 0.5 at about 332.5 degrees. The peak of the derivative also occurs at the same temperature. (The curve is plotted inverted.)
 
 
==Set up simulation conditions and data structure==
 
It will be convenient to set up a data structure with the simulation parameters and a time axis.
 
 
<pre>
 
simulationLength = 900;                              % 15 minute simulation
 
sampleRate = 1;                                      % 1 Hz sample rate
 
 
sim1 = struct();                                      % create empty struct for sim results
 
sim1.concentration = 1E-6;                            % 1 micromolar concentration
 
sim1.deltaS = -184;                                  % cal / (mole-K)
 
sim1.deltaH = -71E3;                                  % cal / mole
 
sim1.initialTemperature = 363;                        % 363 K = 90 C
 
sim1.finalTemperature = 293;                          % approx. room temperature
 
sim1.sampleRate = 1;                                  % 1 Hz sample rate
 
sim1.duration = 900;                                  % seconds -- 15 minute experiment
 
 
sim1.time = 0:(1./sampleRate):simulationLength;      % create time vector (units: seconds)
 
</pre>
 
 
==Simulate sample temperature during an experimental run==
 
The goal of the simulation is to produce two simulated voltage signals that resemble what you will measure in the lab. Since both of the voltages are functions of temperature, it makes sense to begin by modeling how the sample cools during an experimental run.
 
 
===Sample cooling===
 
During each experimental run, the range of temperatures will be provided by natural cooling of the sample after it has been heated in a bath. At the beginning of an experimental run, the DNA + fluorescent dye sample is heated in a bath to around 90&deg;C. Over a period of about 15 minutes, the sample cools to room temperature (about 20&deg;C). The rate of cooling is proportional to the difference between the sample's temperature and the room temperature, so the sample cools more quickly at the start of the experiment than near the end. (It can be frustrating to wait for the final few degrees.)
 
 
Cooling can be approximated by an exponential function with a time constant of about 3 minutes. The equation for temperature (in &deg;K) versus time is:
 
 
:<math>
 
T(t) = (T_i - T_f) e ^ {-\frac{t}{\tau}} + T_f
 
</math>
 
where,
 
:''T(t)'' is the temperature as a function of time,
 
:''T<sub>i</sub>'' is the initial temperature (after heating),
 
:''T<sub>f</sub>'' is the final (ambient) temperature,
 
:''&tau;'' is the time constant.
 
 
In Matlab:
 
 
<pre>
 
sim1.coolingConstant = 180;
 
 
% use exp function to model exponential cooling
 
sim1.temperature = ...
 
    (sim1.initialTemperature - sim1.finalTemperature) .*  ...  % T_i - T_f *
 
    exp(-sim1.time ./ sim1.coolingConstant) +  ...              % e^(-t/tau) +
 
    sim1.finalTemperature;                                      % T_f
 
 
plot(sim1.time, sim1.temperature, 'r');
 
xlabel('Time (sec)');
 
ylabel('Temperature (K)');
 
title('Simulated Temperature vs. Time');
 
</pre>
 
[[Image:DNA Sample Cooling.jpg|350 px|Simulated sample cooling curve]]
 
 
If you have any questions about the element-by-element division operator in Matlab (./) see:[[Matlab:Division operators|this page]].
 
 
==RTD temperature sensor model==
 
 
===RTD and voltage divider circuit===
 
The resistance of the RTD is given by the equation ''R<sub>RTD</sub> = 1000 + 3.85(T - 273)''. In the experimental apparatus, the RTD is hooked up as one element of a voltage divider with another resistor, ''R''. The power supply is 15 volts. Thus, the output voltage is: ''V<sub>RTD</sub> = 15 * R<sub>RTD</sub>/(R + R<sub>RTD</sub>)''. The following code calculates RTD voltage measurements from <tt>sim1.temperature</tt> and stores the result in <tt>sim1.V_RTD</tt>, assuming ''R'' = 20K&Omega;.
 
 
<pre>
 
sim1.R_RTD = 1000 + 3.85 * (sim1.temperature - 273);      % Compute RTD resistance in Ohms
 
sim1.V_RTD = 15 * sim1.R_RTD ./ (20E3 + sim1.R_RTD);      % Compute V_RTD in Volts
 
plot(sim1.time, sim1.V_RTD, 'r');
 
xlabel('Time (sec)');
 
ylabel('RTD Voltage (V)');
 
title('Simulated RTD Voltage vs. Time');
 
</pre>
 
 
[[Image:RTD Voltage vs Time.jpg|350px|Simulated RTD voltage plot]]
 
  
 
===Add some noise===
 
===Add some noise===

Revision as of 05:22, 30 August 2011

Add some noise

You will certainly notice noise on the signals you capture in the lab. (See: Where does electronic noise come from?.) Noise will be an extremely important (and annoying) factor in this experiment because the melting curve will be differentiated. Differentiation emphasizes noise. (Think about the frequency response of a differentiator.) In order to ensure that your data analysis script effectively handles the noise you are likely to observe, simulated random and impulse noise will be added to the signals.

To simulate noise, we can use Matlab's randn function, which generates normally distributed random numbers with unity standard deviation. The two arguments tell randn to create a vector of random samples with size 1xlength(V_RTD).

noise = 0.001 * randn(1,length(sim1.V_RTD));    % Create noise vector, standard deviation=1mV
sim1.V_RTD = sim1.V_RTD + noise;                % Add simulated noise to the voltage signal

plot(sim1.time, sim1.V_RTD, 'r');
xlabel('Time (sec)');
ylabel('RTD Voltage (V)');
title('Simulated RTD Voltage with Noise vs. Time');

Simulated RTD voltage plot

Model the sample fluorescence signal

Now that we have a simulated RTD voltage signal, the next step is to simulate the fluorescence signal.

The output range of the transimpedance amplifier depends on many factors, including: dsDNA quantity, illumination intensity, fluorophore efficiency, optical system gain, photodiode efficiency, and electronic amplifier gain. All of the factors except dsDNA concentration essentialy remain constant. Thus, the output of the transimpedance amplifier can be modeled by a single overall system gain factor times the dsDNA fraction. In addition, the amplifier output may be offset by a constant DC value, which is simple to model. (If the output of your apparatus drifts significantly over time, you will need to improve your implementation and methods to minimize this. The suggested design includes a potentiometer for adjusting DC offset.)

Compute dsDNA fraction (relative fluorescence) vs. time

DnaFraction computes the fraction of dsDNA from the temperature vector.

% DnaFraction arguments: concentration,temp. vector, delta S, delta H (,optional R)
sim1.dsDnaFraction = DnaFraction(sim1.concentration, sim1.temperature, sim1.deltaS, sim1.deltaH);  

plot(sim1.time,sim1.dsDnaFraction)
title('Fraction dsDNA vs. Time');
xlabel('Time (s)');
ylabel('dsDNA Fraction');

Simulated RTD voltage plot

Scale the curve and add some noise

The output range of the transimpedance amplifier over the course of an experimental run can vary depending on several factors, including the amplifier gain, optical system gain, etc... A typical setup might produce values between, say, 0.5 and 7.2 Volts. The simulation code generates two random numbers to set the range and minimum value of the data. Your data analysis script will have to normalize the amplifier output voltage to a fractional value between 0 and 1. (What assumptions will you make about dsDNA concentration to do the normalization?)

In addition to Gaussian noise, the high gain amplifier sometimes produces impulse noise. This can be caused by small static discharges, for example when a charged person sits down in a chair in the lab. It is a good idea to include some filtering for impulse noise in your data analysis script, in case your apparatus is susceptible to impulse noise.

Range = 5 + 4 * rand();                              % Range between 5 and 9
Minimum = 2 * rand() - 1;                            % Minimum between +/- 1
sim1.V_f = Range * sim1.dsDnaFraction + Minimum;     % Scale output
noise = 0.01 * randn(1,length(sim1.V_f));            % Random noise, standard deviation=10mV
sim1.V_f = sim1.V_f + noise;                         % Add random noise
noise = rand(1,length(sim1.V_f));                    % Uniformly distributed  random numbers
ImpulseNoiseProbability = 1 ./ 75;                   % Probability of impulse noise
noise = 20 * (noise < ImpulseNoiseProbability) - 10; % Quantize impulse noise
sim1.V_f = max(sim1.V_f, noise);                     % Add in impulse noise

plot(sim1.time, sim1.V_f)
title('Fluorescence Voltage vs. Time');
xlabel('Time (s)');
ylabel('Fluorescence Voltage (V)');

Simulated RTD voltage plot

Reformat the data

When you import data from the LabView program that records DNA melting data, the format will be an N x 2 matrix with $ V_T $ values in column 1 and $ V_f $ in column 2. To put the simulated data in the same format, you can use Matlab's transpose operator (.') and the horizontal concatenation function horzcat:

simData = horzcat( sim1.V_RTD .', sim1.V_f .');
size(simData)


ans =

   901     2

Please see this page if you are uncertain about the .' operator.

Congratulations. You now have a simulated dataset that should be quite similar to what you measure in the lab.

Improving the model: photobleaching (optional)

Each time a SYBR Green molecule is excited by a photon, there is a small chance that it will be destroyed and permanently lose its ability to fluoresce. The effect of photobleaching on the measured fluorescence intensity can be significant. To demonstrate this, a model for photobleaching may be added to the simulation. In the model, a small, fixed percentage of the fluorophores are mathematically destroyed each time sample. This simple difference equation can be implemented with a forend loop.

$ {\hat{f}}_{N} = B {\hat{f}}_{N-1} + f_{N} - f_{N-1} $

where,

$ {\hat{f}}_N $ is the Nth sample of the bleached relative fluorescence,
$ \left . B \right . $ is a bleaching constant equal to the fraction of fluorophore not destroyed per time sample, and
$ \left . f_N \right . $ is the Nth sample of the dsDNA fraction (relative fluorescence) curve.

The fraction of fluorophores destroyed per time sample is set by the variable bleachingConstant. In the example code, the constant is set so that half of the fluorophores will be destroyed over the course of the simulated experimental run.

sim1.bleachingConstant = (0.5)^(1/900);              % 1/2 of fluorphores fail over expmt.
sim1.diffDsDnaFraction = diff(sim1.dsDnaFraction);   % compute difference
sim1.dsDnaFractionBleached(1) = 0.05;                % set initial condition

% for loop to compute difference equation
for ii = 2:length(sim1.dsDnaFraction)
    sim1.dsDnaFractionBleached(ii) = sim1.dsDnaFractionBleached(ii - 1) * ...
          sim1.bleachingConstant + sim1.diffDsDnaFraction(ii - 1);
end

plot(sim1.time, sim1.dsDnaFractionBleached)
title('Relative Fluorescence vs. Time with Bleaching');
xlabel('Time (s)');
ylabel('Relative Fluorescence');

Fluorescence vs. time with photobleaching

The value of the first element of the vector sim1.dsDnaFractionBleached sets the initial condition for the difference equation — the relative fluorescence at the beginning of the simulation. Because SYBR green exhibits a small amount of intrinsic fluorescence when it is not bound to dsDNA, the initial value of sim1.dsDnaFractionBleached is set to a small number to model this effect.

Experiment with the bleaching constant and intrinsic fluorescence values to see how they distort the melting curve. Increasing the bleaching constant is equivalent to turning down the illumination intensity: fewer fluorphores will be destroyed in a given time. Unfortunately, the overall fluorescence intensity also decreases, necessitating higher amplifier gain and decreased signal to noise ratio. How will you decide on the optimal illumination and amplification levels in the lab? Can you use this model to correct for photobleaching effects in your data? How can you determine the bleaching constant experimentally?

Follow the steps in the previous section for scaling the relative fluorescence signal, adding noise, and reformatting.

Where to go from here

For some advice on developing your data analysis scripts, see the page on analyzing DNA melting data in Matlab.