# DNA Melting: Processing DNA Melting Data

## Overview

Fantastic. You have some simulated data or real data. It's time to estimate $ \scriptstyle \Delta S^{\circ} $, and $ \scriptstyle \Delta H^{\circ} $. Nonlinear regression is one approach to extracting these values. There are several approaches to estimating the melting temperature using both the raw data and the extracted model parameters.

Most students use Matlab to analyze their raw data. Several scholars in the past elected to use Python, instead. If you choose this path, be aware that the wiki pages for Python data analysis are significantly less developed than the corresponding Matlab cribs. Nonlinear fitting in Python has been challenging. If you decide to use Python, consult and add your thoughts to this page: DNA melting data analysis in Python.

In outline, the data analysis steps are:

- Write a function that models DNA melting
- Load data file produced by LabVIEW VI or simulation
- Provide initial guesses for model parameters and run the regression
- Normalize results and convert to correct units
- Estimate the melting temperature
- Create plots

The model function is essentially the inverse of simulating DNA melting. The arguments to the function should include a vector of all the *model parameters* and *explanatory variables*. Make the signature of the function should match the requirements of the curve fitter you plan to use, for example : lsqcurvefit in Matlab.

### Organization of data analysis script

Since you will be analyzing several experimental runs in a similar way, it makes sense to save the particulars of each experimental run (such as the name of the file that the data is stored in and the KCl concentration) in an array of data structures. Matlab implements a data type called a *cellular array*. Cellular arrays use pointy braces {} instead of parenthesis (). If `foo`

is a cellular array, then `foo{1}`

is the first element.

The following code demonstrates how to create the cellular array of data structures. The code fills in the first element of a cellular array (`DnaMatchSampleInfo`

):

DnaMatchSampleInfo = {} % null cellular array DnaMatchSampleInfo{1}.filename = 'DNA Melting Data\20bp 100mM.txt'; DnaMatchSampleInfo{1}.SampleName = '20 bp 100mM'; DnaMatchSampleInfo{1}.KclConcentration = 100; DnaMatchSampleInfo{1}.DnaConcentration = 30; DnaMatchSampleInfo{1}.FilterNormalizedCutoffFrequency = 0.1; DnaMatchSampleInfo{1}.FilterOrder = 25; DnaMatchSampleInfo{1}.NormalizationMin = 0.01; DnaMatchSampleInfo{1}.NormalizationMax = 0.99; DnaMatchSampleInfo{1}.TrimStart = 0; DnaMatchSampleInfo{1}.TrimEnd = 0; ... fill in more entries here ...

After initializing `DnaMatchSampleInfo`

, each element will contain a data structure with the parameters of a particular experimental run. You can create additional cellular arrays for different sets of experiments. For example, `DnaLengthSampleInfo`

might contain parameters for the 20, 30, and 40 base pair samples.

Using the information in the cellular arrays, it is easy to write a `for`

loop that will process all of the data for a particular set of experiments. The function (called `DnaMelt`

might look something like this:

% Function to process DNA melting data from LabVIEW VI % sampleInfo is a cellular array of data structures % output is a cellular array containing original data and computed values function out = DnaMelt(sampleInfo) % initialize variables out = {}; for ii=1:length(sampleInfo) ... process the data ... end

... process/plot specific results about whole data set...

### Loading a data file

Use the `load`

command to read in a data file.

rawData = load(sampleInfo{ii}.filename);

The first column contains RTD voltage samples and the second contains the corresponding fluorescence voltages.

### Converting voltages to temperature and relative fluorescence

These are straightforward mathematical manipluations on the dataset using the properties of the RTD and your normalized/corrected/adjusted fluorescence signal with the equation for dsDNA fraction.

### Designing an FIR low pass filter

The matlab commands `fir1`

and `freqz`

are useful for designing a low pass filter. Use `fir1`

to generate the filter kernel and `freqz`

to plot its frequency response. Increasing the order of the filter will make the transition between pass and stop bands sharper.

Try plotting a few different filter designs:

freqz(fir1(30, 0.15)) freqz(fir1(300, 0.15))

### Applying the FIR filter

Use the `conv`

command to apply the filter. But be careful how you treat the samples near the beginning and end of the signal. `conv`

returns a vector of length m + n - 1, where m and n are the lenghts of the two operands. `conv`

will assume that all values outside of the defined signal are zero. This will distort your signal near the beginning and end. You can handle this by pre-padding your data with made-up data (usually the initial and final values) or just chopping off the extra samples.

The following functions may be of use:

% usage: ConvolveAndClip(kernel, data) % convolves <kernel> with <data> and trims the ends of the result to length % length(data) - length(kernel) function out=ConvolveAndClip(kernel, data) temp = conv(kernel, data); trim = length(kernel); out = temp(trim:end-trim);

% usage: PadAndConvolve(kernel, data) % Pads <data> with initial and final values, convolves <data> with % <kernel>, and trims the result to the same length as <data> function out=PadAndConvolve(kernel, data) frontPaddingLength = floor(length(kernel)/2); rearPaddingLength = ceil(length(kernel)/2); frontPadding = data(1) * ones(frontPaddingLength, 1); rearPadding = data(end) * ones(rearPaddingLength, 1); s = size(data); if(s(1) > s(2)) paddedData = [frontPadding;data;rearPadding]; else paddedData = [frontPadding' data rearPadding']; end out = ConvolveAndClip(kernel, paddedData);

### Ensuring that F(Temperature) is a function

Because there is noise in the temperature readout, your raw data is not guaranteed to be a function. You will run into all sorts of trouble taking the finite difference later on if your data is not a function. This can be solved by either sorting the data by temperature (using Matlab's `sort`

command) or binning the samples by temperature (by iterating through the data with a `for`

loop.)

Each approach has merits and disadvantages. Sorting by temperature can result in some very small ΔT values, which tend to be very noisy. Binning has the advantage of resulting in a uniformly sampled dataset — provided that there is at least one sample in each bin.

### Fitting to Model

This topic will be discussed at length in an evening tutorial, and in person in the lab. We start with the basic technique outline in the simulations, and add additional parameters to account for fluorophore quenching with temperature, fluorophore bleaching, and temperature lab between the sample and the heating block. This year we are expanding on a multi-parameter non-linear regression technique that we first developed with the previous class.