Estimating second order system parameters from noise power spectra using nonlinear regression

20.309: Biological Instrumentation and Measurement



… the safe use of regression requires a good deal of thought and a good dose of skepticism

Arnold Barnett


If your program generates an error message rather than curve fitting results, you probably won’t be able to make sense of the exact wording of the message.

—Motulsky and Christopoulos, Fitting Models to Biological Data Using Linear and Nonlinear Regression


You won't find a tool with more power per inch than a chain saw. Power like that requires extra attention to safety.

—lowes.com


Overview

In the limits of detection lab, you will use nonlinear regression to estimate the parameters of second-order system models. Nonlinear regression has a lot in common with chainsaws. In skilled hands, both nlinfit and a 9 horsepower Stihl Magnum are productive, accurate power tools that deliver sublime results with a minimum of effort[1]. But great power comes with great danger. Both of these formidable tools must be used with extraordinary care. Inattentive or unskilled operation engenders distorted, ugly, or even catastrophic outcomes.

An important difference between chainsaws and regression is that chainsaw accidents almost always result in injuries that are graphic and obvious[2], whereas many grave regression wounds are stealthy. Bad regressions can easily masquerade as good ones — concealing their insidious flaws behind an alluring, plausible pair of overlapping curves. Because of this, it is especially important to maintain a critical and alert mindset in the vicinity of regression. After every regression, do the responsible thing: spend some time ascertaining how disgraceful the sin you just committed was.

This page offers some advice that will help you retain all of your digits and limbs as you regress power spectra from the AFM and optical trap. Several flawed regressions are illustrated, followed by possible fixes. This sort of thing frequently happens in the development of a good regression: it takes a few tries to get the kinks out. Along the way, there are many signs of possible problems. Unfortunately, there are no definitive indicators of success. It can be difficult to tell when it's safe to stop.

AFM and optical trap model

The AFM and the optical trap can both be modeled by a linear system with a mass, spring, and damper in parallel, as shown in the figure below. (The elements all have the same velocity, so they are connected in parallel.) In the AFM, the restoring force comes from the stiffness of the cantilever; in the optical trap, it comes from the optical gradient force. Air or water molecules collide with the cantilever or trapped particle, which drives the system with white noise.

Thermally excited AFM cantilever model.png Optical trap and AFM models.png Thermally excited Optical Trap model.png
RLC circuit model of AFM excited by collisions with air molecules. The integrator converts velocity to position.

The transfer function of the circuit model is:

$ \frac{x_{out}}{f_{in}}=\frac{\frac{1}{m}}{s^2+\frac{b}{m}s+\frac{k}{m}} $

In the time domain, random force input $ f(t) $ drives a second-order system to produce position output $ x(t) $.

AFM and optical trap signal processing.png
Time- and frequency-domain signals in the AFM. Air molecules collide with the cantilever to impart a random driving force $ f(t) $ with a white spectrum $ S_{ff}(f)=\frac{2k_BT}{\pi^2\beta} $. The cantilever is an underdamped, second-order (mass-spring-damper) system with impulse response $ h(t) $ and transfer function $ \hat{H}(f) $. The transfer function has units of position divided by force (the inverse of a spring constant). The output power spectrum is equal to $ S_{ff} $ times the magnitude squared of $ \hat{H}(f) $.

The driving force has a constant power spectrum $ P_{ff}=\frac{2k_BT}{\pi^2\beta} $. The output power spectrum $ P_{xx}(f) $ is equal to the input spectrum times the magnitude squared of the transfer function.
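For reference, substituting $ s = 2 \pi i f $ into the transfer function above and taking the magnitude squared gives the predicted output spectrum explicitly:

$ P_{xx}(f) = P_{ff}\left|\hat{H}(f)\right|^2 = P_{ff}\,\frac{\left(\frac{1}{m}\right)^2}{\left(\frac{k}{m}-(2\pi f)^2\right)^2 + \left(\frac{b}{m}\right)^2 (2\pi f)^2} $

This is the function that the regression will attempt to match to the measured spectrum.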

Regression concepts review

Nonlinear regression is a method for finding the relationship between a dependent variable $ y_n $ and an independent variable $ x_n $. The two quantities are related by a model function $ f(\beta, x_n) $. $ \beta $ is a vector of model parameters. The dependent variable $ y_n $ is measured in the presence of random noise, which is represented mathematically by a random variable $ \epsilon_n $. In equation form:

$ y_n=f(\beta, x_n)+\epsilon_n $.

The goal of regression is to determine a set of best-fit model parameters $ \hat{\beta} $ that match the observed data as closely as possible. Because the dependent variable includes noise, $ \beta $ cannot be determined exactly from the data. Increasing the number of observations or decreasing the magnitude of the noise tends to produce a more reliable estimate of $ \beta $.

Linear and nonlinear regression are similar in some aspects, but the two techniques have a fundamental difference. Nonlinear regression cannot be reduced to a single, deterministic formula like linear regression. Finding the optimal solution to a nonlinear regression is an iterative process. The regression begins with an initial guess. Each iteration produces a more refined estimate of $ \beta $. The process stops when no better estimate can be found (or when something bad happens ... such as the solution not converging).

Ordinary nonlinear least squares regression assumes that:

  • the independent variable is known exactly, with zero noise,
  • the error values are independent and identically distributed,
  • the distribution of the error terms has a mean value of zero,
  • the independent variable covers a range adequate to define all the model parameters, and
  • the model function exactly relates $ x $ and $ y $.

These assumptions are almost never perfectly met in practice. It is important to consider how badly the regression assumptions have been violated when assessing results.

Get ready …

You need four things to run a regression:

  1. a vector containing the values of the independent variable;
  2. a vector containing the corresponding observed values of the dependent variable;
  3. a model function; and
  4. a vector of initial guesses for the model parameters.
Power spectrum of x-axis QPD voltage. 3.2 μm silica spheres trapped with 80 mW laser power. Sampling frequency was 50 kHz.

Independent variable and observations

The second order system model of the optical trap makes predictions about the power spectrum of a thermally excited, trapped microsphere. The instruments actually record a time sequence of voltages from the QPD position detector. The first step in finding model parameters is to convert the raw data into a power spectrum. MATLAB has a function called pwelch that computes a power spectrum $ P_{xx}(f) $ and associated frequency bin vector $ f $ from a time series. Like many MATLAB functions, pwelch can be invoked in several different ways. Even judging by MathWorks' low standards for clarity and consistency, the various forms of pwelch are perplexing. Here is the command that generated the spectrum on the right:

[ Pxx, frequency ] = pwelch( OpticalTrapTimeDataX, [], [], [], 50000 );

The first argument, OpticalTrapTimeDataX, is an Nx1 vector containing the QPD x-axis voltage samples. Empty values for the next three parameters (window, noverlap, and nfft) leave them set to their default values. The last argument is the sampling frequency, $ f_s $. We will reexamine the default parameter settings later on.
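A quick way to inspect the result is to plot it on log-log axes — a minimal sketch, reusing the variable names from the pwelch call above:

figure;
loglog( frequency, Pxx );
xlabel( 'Frequency (Hz)' );
ylabel( 'Power spectral density (AU^2/Hz)' );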

The two quantities in the regression are frequency and power spectral density. One of these has to serve as the independent variable, and the other as dependent. To make the best choice, consider the relative influence of measurement noise on each quantity. The optical trap has a very accurate and precise oscillator circuit that generates its sample clock. Because the sample frequency is so stable, measurement error in the frequency values calculated by pwelch is tiny — perhaps a few tens of parts per million. Power spectral density, on the other hand, is inherently noisy due to the random nature of the thermal driving force. You can see how wiggly the power spectrum is in the vertical direction. So the choice of independent and dependent variables is easy: the low-noise quantity, frequency, will serve as the independent variable and the noisy quantity, power spectral density, will be the dependent variable.

Model function

nlinfit requires that the regression model be expressed as a function that takes two arguments and returns a single vector of predicted values. The signature of the model function must have the form:

[ PredictedValues ] = ModelFunction( Beta, X ).

The first argument, Beta, is a vector of model parameters. The second argument, X, is a vector of independent variable values. The return value, PredictedValues, must have the same size as X.

The MATLAB function defined below models the power spectrum of a second-order system excited by white noise. ModelParameters is a 1x3 vector that contains the model parameters in this order: undamped natural frequency $ f_0 $, damping ratio $ \zeta $, and low-frequency power density $ P_0 $.

function [ PowerDensity ] = SecondOrderSystemPowerSpectrum( ModelParameters, Frequency )
    
    % unpack the model parameters; convert f_0 from Hz to rad/s
    undampedNaturalFrequency = 2 * pi * ModelParameters(1);
    dampingRatio = ModelParameters(2);
    lowFrequencyPowerDensity = ModelParameters(3);

    % evaluate the transfer function along the imaginary axis, s = 2*pi*i*f
    s = 2i * pi * Frequency;

    response = lowFrequencyPowerDensity * undampedNaturalFrequency.^2 ./ ...
        ( s.^2 + 2 * undampedNaturalFrequency * dampingRatio * s + undampedNaturalFrequency.^2 );
    
    % the model power spectral density is the magnitude squared of the response
    PowerDensity = abs( response ).^2;

end

It's a good idea to test the model function out before you use it. The plot below shows four sets of curves generated by SecondOrderSystemPowerSpectrum. The critically damped response is flat from low frequency until approximately $ f_0 $, after which it falls with a slope of -4. The underdamped response exhibits a narrow peak near $ f_0 $ and then decreases with slope -4 at high frequencies. Both overdamped systems are flat at low frequencies. They decrease with slope -2 at intermediate frequencies from approximately $ \frac{f_0}{\zeta} $ to $ f_0 $, after which the response falls with slope -4. It's comforting to see that the curves have the expected shape.

Second Order System Response Versus Zeta.png
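Here is a minimal sketch of how such a test plot might be generated. The $ f_0 $ and $ P_0 $ values below are arbitrary choices for illustration, not the values used to make the figure above:

testFrequencies = logspace( 0, 5, 1000 );       % 1 Hz to 100 kHz
figure;
hold on;
for zeta = [ 0.05, 1, 10, 100 ]                 % underdamped, critically damped, and overdamped cases
    plot( testFrequencies, SecondOrderSystemPowerSpectrum( [ 1e3, zeta, 1 ], testFrequencies ) );
end
set( gca, 'XScale', 'log', 'YScale', 'log' );
xlabel( 'Frequency (Hz)' );
ylabel( 'Power spectral density' );
legend( '\zeta = 0.05', '\zeta = 1', '\zeta = 10', '\zeta = 100' );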

Initial guesses

nlinfit requires an initial value for each of the three model parameters, contained in a 1x3 vector. (nlinfit infers the number of model parameters from the size of the initial guess vector.) $ P_0 $ is the easiest of the three parameters to estimate. Just take a look at the low-frequency intercept of the PSD plot. The plot hits the axis at almost exactly $ 10^{-5}\frac{\text{AU}^2}{\text{Hz}} $.

It is probably easiest to estimate the damping ratio by plugging round values into the equation for $ \zeta $:

$ \zeta = \frac{3 \pi \eta D}{\sqrt{k m}} $.

Assuming a bead diameter of 3.2 μm, a solvent viscosity of $ 10^{-3} $ Pa·s, and a material density of 2.5 grams per cc, $ \zeta $ is on the order of $ 1.5 \times 10^5 $.

A guess for $ f_0 $ can be found by noting that the knee in the curve occurs at about 200 Hz. The knee occurs at $ \frac{f_0}{\zeta} $, so the initial guess is $ f_0 = 1.5 \times 10^5 \times 200 \text{ Hz} = 3 \times 10^7 \text{ Hz} $. The initial guess vector looks like this:

initialGuesses = [ 3e7, 1.5e5, 1e-5 ];

Get set …

The first step of all regressions is to plot the observations and the model function evaluated with the initial guesses versus the independent variable on a single set of axes. Don't attempt to run nlinfit until you've done this plot. It is much easier to ensure that the arguments to nlinfit are plausible before you invoke it than to debug a screen full of cryptic, red text afterwards. Side effects of premature regression include confusion, waste of time, fatigue, irritability, alopecia, and feelings of frustration. Contact your professor if your regression lasts more than four hours. There is no chance that nlinfit will succeed if there is a problem with one of its arguments.

Go ahead ... do the plot.
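A minimal sketch of that pre-flight plot, assuming the variable names used earlier on this page:

figure;
loglog( frequency, Pxx );
hold on;
loglog( frequency, SecondOrderSystemPowerSpectrum( initialGuesses, frequency ) );
xlabel( 'Frequency (Hz)' );
ylabel( 'Power spectral density (AU^2/Hz)' );
legend( 'measured spectrum', 'model with initial guesses' );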

Data And Model with Initial Guesses.png

Now we can proceed with confidence that the arguments to nlinfit are credible.

Go ... first attempt

The plot below shows the results of running the regression using the data, model, and initial guesses developed so far. The green line on the plot shows the model function evaluated with best-fit parameters $ \hat{\beta} $. The best-fit model nicely overlaps the measured data.
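For reference, the call that produces this fit looks something like the sketch below (nlinfit is part of the Statistics and Machine Learning Toolbox; variable names follow the earlier examples):

bestFitParameters = nlinfit( frequency, Pxx, @SecondOrderSystemPowerSpectrum, initialGuesses );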

BadRegressionNumber1.png

Success! Great curves!

Not so fast. If you let the alluring, overlapping blue and green curves in the plot above get you excited, go back and reread this page from the beginning. Much more scrutiny is required. On top of the usual reasons to be skeptical, an additional cause for concern materialized in the console output window. MATLAB barfed up a long, impenetrable word salad of foreboding red verbiage. Running the regression triggered two different warnings.

The first warning repeated itself about a dozen times, each time with a slightly different value of "tol":

Warning: Rank deficient, rank = 2, tol =  1.099158e-10. 
> In nlinfit>LMfit (line 574)
  In nlinfit (line 276) 

Then the regression finished with a second distress signal:

Warning: Some columns of the Jacobian are effectively zero at the solution, indicating that 
the model is insensitive to some of its parameters.  That may be because those parameters 
are not present in the model, or otherwise do not affect the predicted values.  It may also be 
due to numerical underflow in the model function, which can sometimes be avoided by choosing 
better initial parameter values, or by rescaling or recentering.  Parameter estimates may be 
unreliable. 
> In nlinfit (line 373)
William Shatner in Nick of Time.

The first warning is terse and cryptic. The second one is spectacularly ambiguous. It's impossible to know how to react. Deep concern? Annoyance? Fear? Rage? Apathy? (It's just a warning, after all.) The parameter estimates MAY be unreliable. But then again they MAY not. A more forthright author could have written, "This regression smells fishy. Almost anything could have caused the problem. Beware."

So is this regression a good one or a bad one? Red text spewing from nlinfit is never good news, but it is sometimes not bad news. How can you tell? When it comes to error messages and warnings, nlinfit works just like the Mystic Seer that featured prominently in the unforgettable 43rd Twilight Zone episode Nick of Time, starring William Shatner as a superstitious newlywed with car trouble stranded in a small town and Patricia Breslin as his beautiful and sensible bride[3]. Over-interpreting the willfully vague pronouncements of a mechanical contrivance is a path to subjugation and doom. The real enemy is within.

Or is there perhaps a greater force at play? It's best not to dwell on this question for too long. Get out of town while you still can.

Assessing the regression results 1: residuals

With some effort, it's possible to decode the warning messages generated by the previous regression. But even if you understand the warnings perfectly, there is still a long list of underlying problems that might have led to either of the warnings. nlinfit can detect concerning symptoms, but it doesn't have the ability to suss out their implications. So the warning messages tend to either avoid the issue (as in the first warning) or produce a laundry list of possibilities (as in the second warning). This is probably why nlinfit and the Mystic Seer seem to have so much in common.

As a first step, it's frequently more productive to revisit the construction of the regression than to try to decipher the warning messages. Violations of the regression assumptions are a leading cause of trouble. One of the most fundamental assumptions in the regression model is that the error terms are independent and identically distributed. An easy way to gauge the validity of this assumption is to plot the vector of residuals versus the independent variable and have a look. (The second value returned from nlinfit contains a vector of the residuals.) If the regression assumptions are perfectly met, the plot should look something like the plot on the left below. The plot below on the right shows the actual residuals from this regression.
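A sketch of how to capture and plot the residuals, assuming the same variable names as above:

[ bestFitParameters, residuals ] = nlinfit( frequency, Pxx, @SecondOrderSystemPowerSpectrum, initialGuesses );
figure;
semilogx( frequency, residuals, '.' );
xlabel( 'Frequency (Hz)' );
ylabel( 'Residual' );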

IdealRegressionResiduals.png BadRegression1Residuals.png

Holey moley! The right plot does not look at all like the left one. The magnitude of the residuals decreases dramatically as frequency increases — a significant and unacceptable violation of the regression assumptions. We definitely have a bad regression on our hands.

This has to be fixed before moving on.

Second attempt

In retrospect, the problem with the residuals in the first regression should have been obvious. A quick glance at the PSD plot reveals that the measurement error (as estimated by the fuzziness of the line) is about the same size over the entire range of the plot. But the axes are scaled logarithmically, so a certain level of fuzziness near the top of the plot represents a much larger measurement error than the same amount of fuzziness near the bottom of the plot. The errors are nowhere near identically distributed, so there wasn't any hope of getting a good residual plot. It turns out that the measurement error in the PSD is roughly proportional to the magnitude squared of the instrument's transfer function.

Why does the measurement error in the PSD scale that way? Handwaving alert: the following explanation is not particularly rigorous. At a fundamental level, measuring the power of a signal in a certain frequency band is similar to measuring the average weight of a grain of sand on all the beaches of the world. It's not practical to measure every single grain of sand. The next best thing is to measure a random sample and use the mean weight of the sample as a proxy for the true answer. Unfortunately, no matter how carefully you choose the subpopulation of sand grains to measure, it will not be perfectly representative. Just by luck of the draw, the mean weight of a particular random sample will differ from the true mean weight of all grains of sand.

Signals extend over an infinite time period, so it's even less practical to measure the frequency spectrum of an entire signal than to weigh all the grains of sand. It's possible to estimate the power spectrum of a signal using a similar approach, though: measure a representative portion of the signal and use that as a proxy. For a power spectrum, the representative sample is just a finite time interval. Just by chance, the representative portion of the signal will have a little more power than usual at some frequencies and a little less at others during the window of time you measure. This manifests as uncertainty or variation in the PSD measurement. For example, if you could somehow measure the spectrum of white noise over an infinite time period, it would be a perfectly flat line — exactly equal magnitude at all frequencies. The PSD of a white noise signal measured over a finite time interval looks like a fuzzy, horizontal line. The details of the fuzziness in the finite-time power spectrum estimate are different every time you repeat the measurement.
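If you want to see this for yourself, a quick experiment is to compute the PSD of computer-generated white noise (a sketch; the vector length and sampling frequency are arbitrary choices):

% the estimated spectrum of white noise is a fuzzy horizontal line, not a perfectly flat one
whiteNoise = randn( 100000, 1 );
[ Pww, f ] = pwelch( whiteNoise, [], [], [], 50000 );
loglog( f, Pww );
xlabel( 'Frequency (Hz)' );
ylabel( 'Power spectral density' );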

The model of the optical trap is a second-order system driven by white (thermal) noise. The output of the position detector is measured over a finite time window. Over the measurement window, the input signal's power spectrum is not perfectly representative — it's a fuzzy line. The output PSD is equal to the input PSD multiplied by the magnitude squared of the transfer function. So both the average value and the fuzziness in the input power spectrum get scaled. Assuming the uncertainty in the input signal's power spectrum has the same distribution at all frequencies, the measurement error in the output power spectrum is proportional to the magnitude squared of the transfer function. (If you are interested in a more mathematically precise argument, this presentation is a good starting point.)

Fortunately, there are a few easy ways to fix the problem. A good model for the multiplicative noise in the power spectrum is:

$ y_n = f(\beta,x_n) + \epsilon_n f(\beta,x_n) = f(\beta,x_n) ( 1 + \epsilon_n ) $

One approach to fixing the problem with the residual distribution is to transform the dataset into log space[4]:

$ \log{y_n}=\log{f(\beta,x_n) ( 1 + \epsilon_n )}=\log{f(\beta,x_n)}+\log{(1+\epsilon_n )} $

Taking the log of the dependent variable turns multiplicative noise into additive noise. The code below uses MATLAB's anonymous function syntax to create a log-space version of the model function.

logModelFunction = @( p, f ) log( SecondOrderSystemPowerSpectrum( p, f ) );

Simple enough. Now call nlinfit with logModelFunction instead of @SecondOrderSystemPowerSpectrum. Remember to also take the log of $ P_{xx} $.
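Putting it together, the log-space regression call might look like this (a sketch using the earlier variable names):

[ bestFitParameters, residuals ] = nlinfit( frequency, log( Pxx ), logModelFunction, initialGuesses );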

Here are the results and the new residual plot:

BadRegression2.png BadRegression2Residuals.png

The estimate of $ f_0 $ changed by more than 250 Hz, and the residual plot is much closer to the ideal. Progress.

The distribution of the residuals looks a great deal more like the ideal plot, but MATLAB is still belching quite a bit of red ink — pretty much the same warnings as before.

In general, the less red text produced by nlinfit the better.

What about all those error messages?

Now that the residuals look normal (ha ha), it's time to embark on the arduous journey of deciphering all that red ink in the console window. The first message was a warning from deep inside nlinfit about rank deficiency that kept repeating during iterations of the regression. If you take a look at the line of code that generates the warning (by clicking on the link in the warning), you can see that the message originates from the MATLAB function mldivide. A practical definition of rank deficiency in the context of regression is: for some reason there is not enough information in the dataset to estimate the model parameters. The manifestation of rank deficiency is that some of the outputs of mldivide are essentially zero. (That is, their magnitudes are less than tol). The warning goes on to specify that rank = 2 and tol = 1.469328e-05.

If you don't understand the rank deficiency warning, you are in good company. A quick Google search of ‘MATLAB nlinfit rank deficient warning’ yields about 500 results. Many of these are posts to various nerdy Internet forums asking, ‘what does this warning message mean?’ Collective Wisdom has not provided many useful answers to most of these queries. The wording of the rank deficiency error message sometimes motivates the uninitiated to muck around with nlinfit's tol parameter, since the error message mentions it. Resist the urge.

Many different underlying issues can lead to rank deficiency, including problems with the model, the dataset, or both. This page has a good, practical discussion of some of them. With the benefit of hindsight, it is not too hard to figure out why there isn't enough information for nlinfit to go on. The rank = 2 part of the message means that two of the three model parameters seem okay, and one of them is unsettlingly close to zero. If you had to do this regression by hand, it would be pretty easy to pick off $ P_0 $ based on the low-frequency data. The other two model parameters specify two inflection points in the curve — one at $ \frac{f_0}{\zeta} $ and another at $ f_0 $. The first inflection point occurs at around 100 Hz. Easy. But our back-of-the-envelope estimate of the second inflection point was $ 3 \times 10^7 \text{ Hz} $. The range of the dataset is $ 0 - 5 \times 10^3 \text{ Hz} $, so there isn't much hope of getting a good estimate of $ f_0 $ from the dataset.

The fix for this problem is to remove the second inflection point from the model. In a highly overdamped system, $ m \rightarrow 0 $, and the transfer function simplifies to:

$ \frac{x_{out}}{f_{in}}=\frac{1}{bs+k}=\frac{1/k}{\frac{s}{2 \pi f_0}+1}, \quad f_0 = \frac{k}{2 \pi b} $

The new model function has two parameters: $ P_0 $ and $ f_0 $. It's simple enough to use an inline declaration:

overdampedLogModelFunction = @( p, f ) log( ( ( p(2) * 1e-6 ) ./ ( f.^2 / p(1)^2 + 1 ) ).^2 );

Third attempt

Here are the results using overdampedLogModelFunction with initial guesses of [ 500, 1e-5 * 500^2 ].


The regression completes with no error messages or warnings. Things are looking up.

Even though there was no red ink, it's a good idea to have a look at the parameter confidence intervals. The confidence intervals measure uncertainty in the best-fit parameter values, $ \hat{\beta} $. Conceptually, you can think of a 95% parameter confidence interval like this: imagine that you repeated the measurement procedure precisely the same way 100 times. Because of random noise alone, the values of $ \hat{\beta} $ would be a little different for each trial (assuming all the sources of systematic error were also identical). Chances are that an interval computed this way would contain the true parameter value in about 95 of the 100 trials. Large parameter confidence intervals can be a sign of trouble. Small intervals are a good sign, but they are not a guarantee that a regression was correct. Freakin' regression. Confidence intervals are computed for a given confidence level, usually expressed as a percentage. The MATLAB function nlparci computes parameter confidence intervals; its default confidence level is 95%.

Computing the parameter confidence intervals is easy in MATLAB:

[ bestFit, residuals, jacobian, covariance, mse ] = ...
    nlinfit( frequency, log( Pxx ), overdampedLogModelFunction, [ 500, 1e-5 * 500^2 ] );
parameterConfidenceIntervals = nlparci( bestFit, residuals, 'covar', covariance );
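nlparci defaults to 95% intervals; pass the 'alpha' option for a different confidence level (for example, alpha = 0.01 gives 99% intervals):

confidenceIntervals99 = nlparci( bestFit, residuals, 'covar', covariance, 'alpha', 0.01 );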

Here are the 95% confidence intervals for the current regression returned by nlparci.

Parameter Lower Bound Upper Bound Units
$ f_0 $ 223.0951 236.1620 Hz
$ P_0 $ 8.89 9.70 $ \frac{\text{V}^2}{\text{Hz}} $

So is this good or bad news? The confidence intervals are not immediately unnerving. (Confidence intervals that include zero are worth looking into.) Both intervals span only a few percent of their best-fit values, which is encouraging.

Why the damping ratio couldn't be estimated

The warning about the ill-conditioned Jacobian implies that a small change in the observations could bring about a large change in $ \hat{\beta} $, which is not too comforting. There are two knees in the second-order, overdamped PSD curve (that is, two places where the slope changes). At large damping ratios, the second knee occurs at a much higher frequency than the first knee. The first knee is around 100 Hz, so that puts the second knee at 10 MHz, which is very far above the highest frequency in the PSD, 25 kHz. Large changes in the damping ratio make only inconsequential changes in the low-frequency power spectrum. As such, the range of frequency values measured is not large enough to assure a good value for $ \zeta $.

Comparing the three regressions

  1. See, for example, [1] or [2]
  2. Do not follow this link if you are at all squeamish: [3]
  3. If you haven't seen this sublime 25 minutes of televised perfection, go watch it right now. I will give a 24 hour extension to anybody who writes a 1-5 page essay on the episode.
  4. Two other reasonable approaches to fixing the problem are: change the value of the ErrorModel parameter to nlinfit, or use the Weights option.