Difference between revisions of "Estimating second order system parameters from noise power spectra using nonlinear regression"

From Course Wiki
Jump to: navigation, search
(Independent variable and observations)
(Independent variable and observations)
Line 47: Line 47:
 
[[File:Optical Trap Power Spectrum.png|thumb|right|350 px|Power spectrum of x-axis QPD voltage. 3.2 μm silica spheres trapped with 80 mW laser power. Sampling frequency was 50 MHz.]]
 
[[File:Optical Trap Power Spectrum.png|thumb|right|350 px|Power spectrum of x-axis QPD voltage. 3.2 μm silica spheres trapped with 80 mW laser power. Sampling frequency was 50 MHz.]]
 
===Independent variable and observations===
 
===Independent variable and observations===
The second order system model of the optical trap makes predictions about the power spectrum of a thermally excited, trapped microsphere. The optical trap instrument records a time sequence of voltages related to the position of the trapped microsphere. This is not a power spectrum, so it is necessary to convert the raw data to a power spectrum for fitting.  
+
The second order system model of the optical trap makes predictions about the power spectrum of a thermally excited, trapped microsphere. The optical trap instrument records a time sequence of voltages related to the position of the trapped microsphere. It is necessary to convert the raw data to a power spectrum for fitting.  
  
Frequency is known precisely, so it will serve as the independent variable in the regression. Power is the (noisy) dependent variable.
+
The MATLAB function <tt>pwelch</tt> computes a power spectrum <math>P_{xx}(f)</math> and associated frequency bin vector <math>f</math> from a time series. Like many MATLAB functions, there are several different ways to invoke <tt>pwelch</tt>.Even judging by Mathworks' low standards for clarity and consistency, the signatures of <tt>pwelch</tt> are particularly opaque. Here is the command that generated the spectrum on the right:  
 
+
The MATLAB function <tt>pwelch</tt> computes a power spectrum <math>P_{xx}(f)</math> and associated frequency bin vector <math>f</math> from a time series. Like many MATLAB functions, there are several different ways to invoke <tt>pwelch</tt>.The signatures of <tt>pwelch</tt> are particularly opaque &mdash; even by Mathworks' standards. Here is the command that generated the spectrum on the right:  
+
  
 
[ Pxx, frequency ] = pwelch( OpticalTrapTimeDataX, [], [], [], 50000 );
 
[ Pxx, frequency ] = pwelch( OpticalTrapTimeDataX, [], [], [], 50000 );
  
 
The first argument, <tt>OpticalTrapTimeDataX</tt> is an Nx1 vector containing the QPD x-axis voltage samples. Empty values for the next three parameters (<tt>window</tt>, <tt>noverlap</tt>, and <tt>f</tt>) leave them set to their default values. The last argument to <tt>pwelch</tt> is the sampling frequency, <math>f_s</math>. We will reexamine the default parameter settings later on.
 
The first argument, <tt>OpticalTrapTimeDataX</tt> is an Nx1 vector containing the QPD x-axis voltage samples. Empty values for the next three parameters (<tt>window</tt>, <tt>noverlap</tt>, and <tt>f</tt>) leave them set to their default values. The last argument to <tt>pwelch</tt> is the sampling frequency, <math>f_s</math>. We will reexamine the default parameter settings later on.
 +
 +
In order to make the best choice of independent and dependent variables for the regression, it is necessary to consider the noise in each of the quantities, <math>f</math> and <math>P_xx</math>. The sample clock of the optical trap is generated by an oscillator that provides an accurate and precise signal. The very high quality digitization clock means that frequencies in the QPD signal can be computed with essentially no error.  Power density, on the other hand, is a noisy measurement due to constant perturbation of the trapped particle's position by the random thermal driving force. Therefore, frequency will serve as the independent variable in the regression.
  
 
===Model function===
 
===Model function===

Revision as of 22:24, 25 May 2015

20.309: Biological Instrumentation and Measurement

ImageBar 774.jpg

ChainsawartinBrienz

… the safe use of regression requires a good deal of thought and a good dose of skepticism

Arnold Barnett


Overview

In the limits of detection lab, you will use regression to estimate the parameters of second order system models of the optical trap and atomic force microscope (AFM). MATLAB includes a powerful nonlinear regression function called nlinfit. Like all sharp tools, nlinfit must be used with extreme care. Nonlinear regression is like a chainsaw. In the right hands, it is a powerful tool that produces breathtaking and sublime results[1]. Inattentive or unskilled operation, on the other hand, usually engenders distorted, ugly, or even catastrophic[2] outcomes.

Many common regression missteps spawn subtle errors instead of disasters. Because of this, bad regressions easily masquerade as good ones — hiding their insidious flaws behind a lovely, plausible pair of overlapping curves. This page illustrates several flawed regressions with the hope of helping you navigate the concealed hazards of fitting power spectra.

Remain vigilant at all times when you regress.

Regression review

Nonlinear regression is a method for finding the relationship between a dependent variable $ y_n $ and an independent variable $ x_n $. The two quantities are related by a model function $ f(\beta, x_n) $. $ \beta $ is a vector of model parameters. $ y_n $ is measured in the presence of random noise, which is represented mathematically by a random variable $ \epsilon_n $. In equation form:

$ y_n=f(\beta, x_n)+\epsilon_n $.

The goal of regression is to find the set of model parameters $ \hat{\beta} $ that best matches the observed data.

Ordinary nonlinear least squares regression assumes that:

  • the independent variable is known exactly, with zero noise,
  • the distribution of the error terms are independent and all drawn from the same distribution (i.i.d),
  • the distribution of the error terms has a mean value of zero,
  • the independent variable covers a range adequate to define all the model parameters, and
  • the model function exactly relates $ x $ and $ y $.

These assumptions are almost never perfectly met in practice. It is important to consider how badly these assumptions have been violated when assessing regression results.

Get ready …

You need four things to run a regression:

  1. a vector containing the values of the independent variable;
  2. a vector containing the corresponding observed values of the dependent variable;
  3. a model function; and
  4. a vector of initial guesses for the model parameters.
Power spectrum of x-axis QPD voltage. 3.2 μm silica spheres trapped with 80 mW laser power. Sampling frequency was 50 MHz.

Independent variable and observations

The second order system model of the optical trap makes predictions about the power spectrum of a thermally excited, trapped microsphere. The optical trap instrument records a time sequence of voltages related to the position of the trapped microsphere. It is necessary to convert the raw data to a power spectrum for fitting.

The MATLAB function pwelch computes a power spectrum $ P_{xx}(f) $ and associated frequency bin vector $ f $ from a time series. Like many MATLAB functions, there are several different ways to invoke pwelch.Even judging by Mathworks' low standards for clarity and consistency, the signatures of pwelch are particularly opaque. Here is the command that generated the spectrum on the right:

[ Pxx, frequency ] = pwelch( OpticalTrapTimeDataX, [], [], [], 50000 );

The first argument, OpticalTrapTimeDataX is an Nx1 vector containing the QPD x-axis voltage samples. Empty values for the next three parameters (window, noverlap, and f) leave them set to their default values. The last argument to pwelch is the sampling frequency, $ f_s $. We will reexamine the default parameter settings later on.

In order to make the best choice of independent and dependent variables for the regression, it is necessary to consider the noise in each of the quantities, $ f $ and $ P_xx $. The sample clock of the optical trap is generated by an oscillator that provides an accurate and precise signal. The very high quality digitization clock means that frequencies in the QPD signal can be computed with essentially no error. Power density, on the other hand, is a noisy measurement due to constant perturbation of the trapped particle's position by the random thermal driving force. Therefore, frequency will serve as the independent variable in the regression.

Model function

nlinfit requires that the regression model be expressed as a function that takes two arguments and returns a single vector of predicted values. The signature of the model function must must have the form:

[ PredictedValues ] = ModelFunction( Beta, X ).

The first argument is a vector of model parameters. The second argument is a vector of independent variable values. The return value must must have the same size as the second argument.

The MATLAB function below models the power spectrum of a second-order system excited by white noise. The three model parameters are passed to the function in a 1x3 vector: undamped natural frequency $ \omega_0 $, damping ratio $ \zeta $, and low-frequency power density $ P_0 $.

function [ PowerDensity ] = SecondOrderSystemPowerSpectrum( ModelParameters, Frequency )
    
    undampedNaturalFrequency = ModelParameters(1);
    dampingRatio = ModelParameters(2);
    lowFrequencyPowerDensity = ModelParameters(3);

    s = 2i * pi * Frequency;

    response = lowFrequencyPowerDensity * undampedNaturalFrequency.^2 ./ ...
        ( s.^2 + 2 * undampedNaturalFrequency * dampingRatio * s + undampedNaturalFrequency.^2 );
    
    PowerDensity = abs( response ).^2;

end

Initial guesses

nlinfit infers the dimensionality of the regression (i.e. the number of model parameters) from the size of the Initial guess vector. Accordingly, the initial guesses for the model parameters should be stored in a 1x3 vector. A reasonable place to start is: 500 Hz for $ \omega_0 $, the average power density of the first 20 frequency bins for $ P_0 $, and 1 for the damping ratio.

Get set …

Observed data and model function with initial guesses.

The first step of any regression is to plot the data and model function evaluated with the initial guesses on the same set of axes. The importance of this step cannot be overstated. Consequences of skipping this step can be tragic. Careless individuals have been known to suffer confusion, waste of time, fatigue, irritability, alopecia, and feelings of frustration. Debugging regression problems is tricky. Taking the time to make this plot will help catch problems before they occur by ensuring that all the inputs to nlinfit are correct.

Go ahead ... do the plot.

Go ... first attempt

Range of independent variable

Second Order System Response Versus Zeta.png
Power spectra of critically damped, underdamped, and overdamped second order systems excited by white noise. The underdamped response is flat at low frequencies, exhibits a narrow peak near $ f_0 $, and decreases with slope -4 at high frequencies. The critically damped response is flat from low frequency until approximately $ f_0 $, after which it falls with a slope of -4. Both overdamped systems are flat at low frequencies. They decrease with slope -2 at intermediate frequencies from approximately $ \frac{f_0}{\zeta} $ to $ f_0 $, after which the response decreases with slope -4.
  1. See, for example, [1] or [2]
  2. Do not follow this link if you are at all squeamish: [3]