Difference between revisions of "DNA Melting: Processing DNA Melting Data"
(→Overview) |
(→Overview) |
||
Line 5: | Line 5: | ||
==Overview== | ==Overview== | ||
− | Fantastic. Now | + | Fantastic. Now you have some [[Matlab:Simulating DNA melting|simulated data]] or [[Lab Manual:Measuring DNA Melting Curves|real data]]. It's time to estimate <math>\scriptstyle \Delta S^{\circ}</math>, <math>\scriptstyle \Delta H^{\circ}</math>, and the melting temperature. Nonlinear regression is one approach to extracting these values. |
Most students use Matlab to analyze their raw data. Several scholars in the past elected to use Python, instead. If you choose this path, be aware that the wiki pages for Python data analysis are significantly less developed than the corresponding Matlab cribs. Nonlinear fitting in Python has been challenging. If you decide to use Python, consult and add your thoughts to this page: [[Python:DNA melting data analysis |DNA melting data analysis in Python]]. | Most students use Matlab to analyze their raw data. Several scholars in the past elected to use Python, instead. If you choose this path, be aware that the wiki pages for Python data analysis are significantly less developed than the corresponding Matlab cribs. Nonlinear fitting in Python has been challenging. If you decide to use Python, consult and add your thoughts to this page: [[Python:DNA melting data analysis |DNA melting data analysis in Python]]. |
Revision as of 17:16, 28 August 2011
Overview
Fantastic. Now you have some simulated data or real data. It's time to estimate $ \scriptstyle \Delta S^{\circ} $, $ \scriptstyle \Delta H^{\circ} $, and the melting temperature. Nonlinear regression is one approach to extracting these values.
Most students use Matlab to analyze their raw data. Several scholars in the past elected to use Python, instead. If you choose this path, be aware that the wiki pages for Python data analysis are significantly less developed than the corresponding Matlab cribs. Nonlinear fitting in Python has been challenging. If you decide to use Python, consult and add your thoughts to this page: DNA melting data analysis in Python.
In outline, the data analysis steps are:
- Write a function that models DNA melting
- Load data file produced by LabVIEW VI or simulation
- Provide initial guesses for model parameters and run the regression
- Normalize results and convert to correct units
- Estimate the melting temperature
- Create plots
The model function is essentially the inverse of simulating DNA melting. The arguments to the function should include a vector of all the model parameters and explanatory variables. Make the signature of the function should match the requirements of the curve fitter you plan to use, for example : lsqcurvefit in Matlab.
Organization of data analysis script
Since you will be analyzing several experimental runs in a similar way, it makes sense to save the particulars of each experimental run (such as the name of the file that the data is stored in and the KCl concentration) in an array of data structures. Matlab implements a data type called a cellular array. Cellular arrays use pointy braces {} instead of parenthesis (). If foo
is a cellular array, then foo{1}
is the first element.
The following code demonstrates how to create the cellular array of data structures. The code fills in the first element of a cellular array (DnaMatchSampleInfo
):
DnaMatchSampleInfo = {} % null cellular array DnaMatchSampleInfo{1}.filename = 'DNA Melting Data\20bp 100mM.txt'; DnaMatchSampleInfo{1}.SampleName = '20 bp 100mM'; DnaMatchSampleInfo{1}.KclConcentration = 100; DnaMatchSampleInfo{1}.DnaConcentration = 30; DnaMatchSampleInfo{1}.FilterNormalizedCutoffFrequency = 0.1; DnaMatchSampleInfo{1}.FilterOrder = 25; DnaMatchSampleInfo{1}.NormalizationMin = 0.01; DnaMatchSampleInfo{1}.NormalizationMax = 0.99; DnaMatchSampleInfo{1}.TrimStart = 0; DnaMatchSampleInfo{1}.TrimEnd = 0; ... fill in more entries here ...
After initializing DnaMatchSampleInfo
, each element will contain a data structure with the parameters of a particular experimental run. You can create additional cellular arrays for different sets of experiments. For example, DnaLengthSampleInfo
might contain parameters for the 20, 30, and 40 base pair samples.
Using the information in the cellular arrays, it is easy to write a for
loop that will process all of the data for a particular set of experiments. The function (called DnaMelt
might look something like this:
% Function to process DNA melting data from LabVIEW VI % sampleInfo is a cellular array of data structures % output is a cellular array containing original data and computed values function out = DnaMelt(sampleInfo) % initialize variables out = {}; for ii=1:length(sampleInfo) ... process the data ... end
... process/plot specific results about whole data set...
Loading a data file
Use the load
command to read in a data file.
rawData = load(sampleInfo{ii}.filename);
The first column contains RTD voltage samples and the second contains the corresponding fluorescence voltages.
Converting voltages to temperature and relative fluorescence
These are straightforward mathematical manipluations on the dataset using the properties of the RTD and your normalized/corrected/adjusted fluorescence signal with the equation for dsDNA fraction.
Designing an FIR low pass filter
The matlab commands fir1
and freqz
are useful for designing a low pass filter. Use fir1
to generate the filter kernel and freqz
to plot its frequency response. Increasing the order of the filter will make the transition between pass and stop bands sharper.
Try plotting a few different filter designs:
freqz(fir1(30, 0.15)) freqz(fir1(300, 0.15))
Applying the FIR filter
Use the conv
command to apply the filter. But be careful how you treat the samples near the beginning and end of the signal. conv
returns a vector of length m + n - 1, where m and n are the lenghts of the two operands. conv
will assume that all values outside of the defined signal are zero. This will distort your signal near the beginning and end. You can handle this by pre-padding your data with made-up data (usually the initial and final values) or just chopping off the extra samples.
The following functions may be of use:
% usage: ConvolveAndClip(kernel, data) % convolves <kernel> with <data> and trims the ends of the result to length % length(data) - length(kernel) function out=ConvolveAndClip(kernel, data) temp = conv(kernel, data); trim = length(kernel); out = temp(trim:end-trim);
% usage: PadAndConvolve(kernel, data) % Pads <data> with initial and final values, convolves <data> with % <kernel>, and trims the result to the same length as <data> function out=PadAndConvolve(kernel, data) frontPaddingLength = floor(length(kernel)/2); rearPaddingLength = ceil(length(kernel)/2); frontPadding = data(1) * ones(frontPaddingLength, 1); rearPadding = data(end) * ones(rearPaddingLength, 1); s = size(data); if(s(1) > s(2)) paddedData = [frontPadding;data;rearPadding]; else paddedData = [frontPadding' data rearPadding']; end out = ConvolveAndClip(kernel, paddedData);
Ensuring that F(Temperature) is a function
Because there is noise in the temperature readout, your raw data is not guaranteed to be a function. You will run into all sorts of trouble taking the finite difference later on if your data is not a function. This can be solved by either sorting the data by temperature (using Matlab's sort
command) or binning the samples by temperature (by iterating through the data with a for
loop.)
Each approach has merits and disadvantages. Sorting by temperature can result in some very small ΔT values, which tend to be very noisy. Binning has the advantage of resulting in a uniformly sampled dataset — provided that there is at least one sample in each bin.
Computing the finite difference
The diff
command will compute the finite difference of a discrete-time signal, which is an approximation of a continuous derivative. Remember that you need to compute ΔFluorescence/ΔTemperature, not ΔFluorescence/ΔTime. Therefore, you must divide by ΔTemperature.
The differencing operation is particularly sensitive to noise. (What is the frequency response of a differencer?) If your derivative plots are noisy, you may be able to improve them by applying additional filtering.
One estimate of Tm is the peak value of the numerical derivative.
Fitting the model
In the "Simulating DNA Melting" tutorial for Matlab or Python, there is a function for computing the theoretical value of dsDNA concentration as a function of DNA concentration, temperature, ΔH° and ΔS°.
To perform the fit, use the matlab function from the tutorial. You can use Matlab's curve fitter (lsqcurvefit
) to estimate best-fit values for ΔH° and ΔS°.
lsqcurvefit
requires the fitting function to be implemented a particular way. You may find the following code excerpt, which declares a suitable function called myFunction
, useful. It also computes the R squared value for the fit.
% Create user function for fitting DnaConcentration = sampleInfo{ii}.DnaConcentration; myFunction = @ (x, xdata) DnaFraction(DnaConcentration, xdata, x(1), x(2)); % Fit data to model fitOptions = optimset('TolFun',1E-30,'TolX',1E-10,'MaxFunEvals',1E4,'MaxIter',1E4); [fitValues, resnorm] = lsqcurvefit(myFunction, [-150 -71E3], temperature, ... fluorescence, [-inf -inf], [inf inf], fitOptions); rSquared = 1 - resnorm / norm(fluorescence - mean(fluorescence))^2;
Plot legends in Matlab
Annoyingly, Matlab handles legends differently than other plot commands. The hold
command does not apply to the legend
command. (Direct complaints to http://www.mathworks.com/support.) The following code excerpt adds a plot legend. This code runs after the main loop. The cellular array out
contains the results from processing. This code also places an "X" at the melting temperature.
% Compute text cell array for plot legends and plot an % "X" at the estimated melting temperature for ii=1:length(out) legendText{2 * ii - 1} = sampleInfo{ii}.SampleName; legendText{2 * ii} = [sampleInfo{ii}.SampleName ' Best Fit']; figure(1) plot(out{ii}.maxDerivativeTemperature, out{ii}.fluorescence(out{ii}.maxDerivativeIndex),... 'linewidth',2,'marker','x','markersize',18,'color',out{ii}.plotColor); plot(out{ii}.maxModelDerivativeTemperature, out{ii}.fitFluorescence(... out{ii}.maxModelDerivativeIndex), 'linewidth',2,'marker','x','markersize',18,'color',... out{ii}.modelPlotColor); figure(2) plot(out{ii}.maxDerivativeTemperature, out{ii}.dFdT(out{ii}.maxDerivativeIndex),... 'linewidth',2,'marker','x','markersize',18,'color',out{ii}.plotColor); plot(out{ii}.maxModelDerivativeTemperature, out{ii}.dModelFdT(... out{ii}.maxModelDerivativeIndex), 'linewidth',2,'marker','x','markersize',18,'color',... out{ii}.modelPlotColor); end % Add the legends figure(1) legend(legendText, 'location', 'southwest'); hold off; figure(2) legend(legendText, 'location', 'southwest'); hold off;
Report Requirements
Report requirements will be posted separately.