Difference between revisions of "Assignment 2 Part 1: Noise in images"
(→Poisson distribution) |
(→Assessing the regression results) |
||
Line 233: | Line 233: | ||
% compute best fit linear model with weights | % compute best fit linear model with weights | ||
figure( mainFigureHandle ) | figure( mainFigureHandle ) | ||
− | fitModel = fitlm( pixelMeans, | + | fitModel = fitlm( pixelMeans, PixelVariances, 'poly1', 'Weights', 1 ./ PixelVariances ); |
xAxis = logspace( -1, log10( 4095 ), 256 )'; | xAxis = logspace( -1, log10( 4095 ), 256 )'; | ||
loglog( xAxis, fitModel.predict( xAxis ), 'LineWidth', 3 ); | loglog( xAxis, fitModel.predict( xAxis ), 'LineWidth', 3 ); | ||
Line 239: | Line 239: | ||
figure | figure | ||
− | weightedResiduals = fitModel.Residuals.Raw(:) ./ | + | weightedResiduals = fitModel.Residuals.Raw(:) ./ PixelVariances; |
plot( pixelMeans, weightedResiduals, 'x' ); | plot( pixelMeans, weightedResiduals, 'x' ); | ||
title( 'Weighted Residuals' ) | title( 'Weighted Residuals' ) |
Revision as of 21:27, 14 February 2019
Overview
Recording a digital image is essentially an exercise in measuring the intensity of light at numerous points on a grid. A simple mental model of a digital image sensor consists of an array of photodiodes with a bucket underneath each diode. Photons strike the photodiodes and give rise to free photoelectrons. The photoelectrons fall into the buckets underneath the photodiodes. At the end of an exposure, the number of electrons in each bucket equals to the number of photons that struck the photodiode above it. But the total charge in each bucket is tiny, so image sensors typically include electronic amplifiers to multiply the charge in each bucket up to a level where it can be measured. The details of the amplifier depend on the type of image sensor. Mathematically, the amplifier can be modeled as a multiplicative constant. The amplified charge in each bucket is converted to a binary integer by an analog-to-digital converter. After all the buckets are measured, the camera returns a matrix of binary pixel values to the computer for recording, manipulation, and display:
- $ P_{x,y}=G N_{photo} $,
where $ P $ is the pixel value, $ x $ and $ y $ are the position-dependent matrix indexes, $ G $ is the camera's gain (which can be changed be software configuration on many camera models), and $ N_{photo} $ is the number of photoelectrons in the bucket. The (uncalibrated) units of $ P $ are sometimes written as ADU for "Analog-to-Digital Units" or DN for "Digital Number". $ G $ as defined here has units of ADU per electron or DN per electron. Dividing a pixel value divided by the gain gives the number of electrons in the bucket. (Note that many camera data sheets specify the inverse of the quantity defined here with units of electrons per ADU or DN.)
If only that was the end of the story.
Like most physical measurements, quantifying light intensity is subject to various sources of observational error. The amount of information in any signal, whether it is as an image or a neural recording or a stock price, depends on the Signal to Noise Ratio (SNR). Because microscopic images are frequently dim, minimizing noise is a key challenge in many applications. In this part of the assignment, you will use MATLAB or another computing environment of your choice to simulate the major sources of error in images and then you will measure the characteristics of the Manta G-040 cameras we use in the lab.
If you are a bit rusty on probability concepts, review this page before you go on.
Noise sources in images
Three of the most significant error sources that affect digital images are: shot noise, dark current noise, and read noise. Simple, probabilistic models can approximate the characteristics of each of the three significant noise sources.
Shot noise
Shot noise Is a consequence of the discrete nature of light and the inherent variability of photon emission. A useful model of shot noise is to imagine that an unchanging luminous source that consists of a gigantic number, $ N $, of photon emitters (such as excited molecules). Each of the emitters is identical to and independent of the others. Over the course of a finite measurement interval, there is a small probability $ p $ that each emitter will release a photon that eventually reaches a light detector. (Photodiodes are one example of a light detector, but the properties of shot noise do not depend at all on the detector's implementation.) In this situation, the detector will count an average of $ \bar{N} = N p $ photons. But because all of the emitters are independent, just due to chance, the number of photons detected during a specific measurement interval varies.
Shot noise simulation
It's pretty easy to simulate this model of shot noise on a computer. The first thing you need is a source of random numbers. So fire up MATLAB or any other software environment you like (and stop whining about it). Or get out a million or so two-sided dice.
MATLAB includes several functions for generating random numbers. The function rand() returns a random value that follows a uniform distribution in the interval (0, 1). Go ahead and type rand() at the command line. Matlab returns a pseudorandom number between 0 and 1. Do this a few times — it's kind of fun. You get a different number every single time. Can you guess what number will come next? If you can, please come see me for a special assignment.
rand cab also produce a vector or matrix of random numbers. Call the function with two integer arguments M and N, and rand will return M rows by N columns of uniformly distributed random numbers between 0 and 1. For example, the code snippet below to generates and displays (as an image) a 544 row x 728 column matrix of numbers with uniform distribution on the interval [0,1]:
noiseImage = rand( 544, 728 ); figure imshow( noiseImage );
Okay, enough goofing around with rand. Assume that $ N=10^6 $ emitters and $ p=0.01% $ chance of detection for each emitter per exposure. For the time being, also assume the detector is perfect so that it registers every single photon that hits it. To model photon detection over a single exposure, you a statement like this: sum( rand( 1, 1E6 ) < 0.0001 ). This piece of code generates a vector of 1 million random numbers between 0 and 1. The less than operator returns a vector of logicals (true or false), with a 1 or true corresponding all the values in the random vector smaller than 10-4 and 0 or false for the rest. There is a 0.01% chance of getting a 1 at any given position in the vector. The sum function adds up all of the ones and zeros and returns a count of the total number of simulated photons emitted. On average, you would expect to detect about 100 photons during a one second long measurement.
Copy-paste this code into the MATLAB command window to run the simulation one thousand times and plot the result of each simulation.
close all figure probabilityOfEmission = 1E-4; numberOfPhotonsEmitted = nan( 1, 1000 ); % allocate space for the results for ii = 1:1000 numberOfPhotonsEmitted(ii) = sum( ( rand( 1, 1E6 ) < probabilityOfEmission ) ); plot( numberOfPhotonsEmitted, 'x' ); axis( [ 1 1000 0 140 ] ); xlabel( 'n' ); ylabel( 'Number of Photons' ); title( [ 'Photon Count (N=' num2str(ii) ... ' siumulations) \mu=' num2str( mean( numberOfPhotonsEmitted(1:ii) ) ) ... ' \sigma=' num2str( std( numberOfPhotonsEmitted(1:ii) ) )] ); drawnow end
Each "x" on the plot represents the results on one simulation.
|
As demonstrated in the plot, the number of photons emitted varies around the average value. Even though the average number of photons detected is constant, the number actually measured is most often not equal to the average. Because of the stochastic properties of photon emission, it is impossible to measure light intensity with complete confidence in a finite amount of time. There is no way to know whether there were fewer or more photons than usual during the interval of the measurement. Shot noise is fundamental to the imaging process. An ideal optical detector is said to be "shot-noise limited".
The binomial distribution is a probabilistic model that matches this situation, where there are a certain number of "trials" and each trial has a certain probability of "success". The binomial distribution has two parameters: $ N $ for the number of "trials", and $ p $ for the likelihood of "success". The binomial distribution assumes that each trial is identical to and independent of the others. One example of the binomial distribution that you have likely seen before is the case of flipping a fair coin, say, 100 times. On each toss, the probability of success (heads) is $ p=0.5 $. If $ N=100 $ and $ p=0.5 $, the average number of heads you expect is 50. Of course, most of the time you don;t get exactly 50 heads. The binomial distribution answers the question: "How surprised should I be if I get 12 heads?"[1] The probability of getting exactly k successes in N trials is:
- $ Pr(k;n,p) = \Pr(x = k) = {n\choose k}p^k(1-p)^{n-k} $
for k = 0, 1, 2, ..., n, where
- $ \binom n k =\frac{n!}{k!(n-k)!} $
Using this formula, you could compute the standard deviation of a binomial distribution. Happily, somebody already did the math and you can just look up the result on Wikipedia:
- $ \sigma_x^2=Np(1-p) $
The binomial distribution is discrete — only certain results are possible (integers between 0 and N). Nonetheless, it is possible to approximate the binomial distribution with a continuous normal distribution. The approximation is better for large values of $ N $ and values of $ p $ near 0.5. Even in cases where the normal approximation is not very good, it can still give a general idea of how far results are likely to be from the mean.
Shot noise places a fundamental limit on the precision of light measurements. Even if you do everything exactly right, the signal to noise ratio of is
- $ \textrm{SNR}=\frac{N}{\sigma_N}=\frac{N}{\sqrt{N}}=\sqrt{N} $
An ideal detector is said to be shot noise limited and achieves an SNR of $ \sqrt{N} $
Poisson distribution
In the special case where the value of $ p $ is small, the variance of a binomial distribution is well approximated by $ \sigma_x^2=Np $. You will recognize $ Np $ as the average value of the distribution. When $ p $ is small, the variance of the distribution is approximately equal to the mean. This special case of the binomial distribution is called the Poisson distribution, and it is incredibly useful in cases where you don't know the values of $ N $ and $ p $. For example, you would almost never know $ N $ and $ p $ for a particular light source. But you can still make a very good guess about the shot noise as long as you know the average number of photons detected.The Poisson distribution is a useful model for photon emission, radioactive decay, the number of letters you receive in the mail, the number of page requests on a certain website, the number of people who get in line to check out at a store, and many other everyday situations.
Dark current
Dark current arises because the physical detector in a camera exists at a nonzero temperature. Electrons in the detector don't have equal energies. At normal temperatures, most of the electrons are at or near ground state. But just by chance, a few electrons have enough energy to roll out of the photodiode and into the bucket even though they were not excited by a photon. The camera counts these "dark electrons" just the same as the photoelectrons that were excited by photons. Even in total darkness the electron count is not zero. The problem is not so much that dark electrons roll into the bucket — it's that the process is stochastic. Because there are a huge number of electrons in the photodiode and the probability that an individual electron will roll into the bucket without being excited by a photon is very small, the Poisson distribution is a decent model for dark current noise. Camera manufacturers frequently specify the dark current, $ I_{dark} $, in units of electrons per pixel per second. The square root of the average number of dark electrons (dark current times exposure time) gives a good approximation for the standard deviation of the dark current noise. with a variance is equal to the average number of dark electrons per frame.
Read noise
Read noise summarizes all of the errors that take place during the process of counting electrons from the bucket. One significant source of read noise is the random thermal motion of electrons in the resistors of the camera's amplifier circuit. Conceptually, at any given time, many electrons in a resistor are in motion. Just by chance, a few more electrons will be jiggling left than right, which gives rise to a noise voltage on the resistor. This is called Johnson noise. Manufacturers usually lump all of the counting errors into a single read noise parameter, frequently the standard deviation of an approximately normal distribution.
Create a synthetic noise movie
The proposed mathematical model of the imaging process, including noise and the camera's gain constant, is:
- $ P_{x,y} = G \bar{N}_{x,y} = G \left( {\bar{N}_{photo}}_{x,y} T + {\epsilon_{shot}}_{x,y} + I_{dark} T + {\epsilon_{dark}}_{x,y} + {\epsilon_{read}}_{x,y} \right) $,
where $ T $ is the exposure time, $ {\epsilon_{shot}}_{x,y} $ is the shot noise, $ {\epsilon_{dark}}_{x,y} $ is the dark noise, and , $ {\epsilon_{read}}_{x,y} $ is the read noise.
Taking the variance gives an expression for the variability in each pixel as a function of its intensity, the exposure time, and the properties of the camera:
- $ \operatorname{Var}(P_{x,y}) = G^2 \operatorname{Var}(\bar{N}_{x,y}) = G^2 \left( {\bar{N}_{photo}}_{x,y} T + I_{dark} T + \sigma^2_{read} \right) $
The code below uses MATLAB's random number generator to create a 100 synthetic pictures of an unchanging light pattern plus realistic shot, dark current, and read noise — similar to the movie you made in Assignment 1. The functions use MATLAB's built-in function poissrnd.
darkCurrent = 1000; exposureTime = 1; readNoiseStandardDeviation = 10; cameraGain = 1; idealImage = 4095 / cameraGain * rand( 200,200 ); noiseMovie = repmat( idealImage, [ 1 1 100 ] ); for ii = 1:size( noiseMovie, 3 ) noiseMovie(:,:,ii) = poissrnd( noiseMovie(:,:,ii) );% shot noise noiseMovie(:,:,ii) = noiseMovie(:,:,ii) + poissrnd( darkCurrent * exposureTime * ones( size( idealImage )) ); % dark current noiseMovie(:,:,ii) = noiseMovie(:,:,ii) + readNoiseStandardDeviation * randn( size( idealImage ) ); % read noise noiseMovie(:,:,ii) = noiseMovie(:,:,ii) - darkCurrent * exposureTime; % subtract off the average value of dark current end noiseMovie = noiseMovie * cameraGain;
You can make a variance vs. mean plot using the same code you used in Assignment 1:
pixelMean = mean( double( squeeze( noiseMovie) ), 3 ); pixelVariance = var( double( squeeze( noiseMovie) ), 0, 3 ); [counts, binValues, binIndexes ] = histcounts( pixelMean(:), 250 ); binnedVariances = accumarray( binIndexes(:), pixelVariance(:), [], @mean ); binnedMeans = accumarray( binIndexes(:), pixelMean(:), [], @mean ); figure loglog( pixelMean(:), pixelVariance(:), 'x' ); hold on loglog( binnedMeans, binnedVariances, 'LineWidth', 3 ) xlabel( 'Mean (ADU)' ) ylabel( 'Variance (ADU^2)') title( 'Image Noise Versus Intensity' )
|
Measure camera parameters
In this part of the assignment, you will use linear regression to find best-fit values for the cameras we use in the lab to the parameters of the mathematical imaging model: dark current, read noise, and gain. A dataset taken on 15 September 2018 is available at this link. The file contains two variables: DarkCurrentData and StaticSceneData. Load the file into MATLAB by typing load( 'Manta Camera Dataset.mat' ). Each of the variables is a structure array, containing data from a series of measurements made using a similar procedure to the one you used in assignment 1. DarkCurrentData contains measurements taken with a variety of exposure times with the camera in complete darkness. StaticSceneData includes measurements made with the camera exposed to an unchanging light pattern that has a variety of intensities. Two fields of the structure record the exposure time (in microseconds) and camera gain setting used for the measurement. The other two fields contain matrices of the mean value and variance for each pixel, computed from a 100 frame movie.
Finding gain
You can use linear regression fit a two-parameter version of the noise model. One parameter is the gain and the other is the sum of read plus dark current noise. (You can use the dark measurements later to get a better estimate of dark current noise.) The code below demonstrates one way to run the regression using the fitlm function in MATLAB:
% get data from one measurement pixelMeans = StaticSceneData(3).PixelMeans(:); pixelVariances = StaticSceneData(3).PixelVariance(:); % plot raw data mainFigureHandle = figure; loglog( pixelMeans, pixelVariances, 'x' ) xlabel( 'Mean (ADU)' ) ylabel( 'Variance (ADU^2)' ) title( 'Manta G-040B Variance vs Mean' ) hold on % compute and plot best fit linear model fitModel = fitlm( pixelMeans, pixelVariances, 'poly1' ); xAxis = logspace( -1, log10( 4095 ), 256 )'; loglog( xAxis, fitModel.predict( xAxis ), 'LineWidth', 3 );
Assessing the regression results
Linear regression relies on several assumptions:
- the variables have a linear relationship
- the mean of the error terms is zero
- errors are not correlated to each other or any of the variables
- the errors are homoscedastic (meaning that they all have the same variance)
- the independent variable is known exactly, with zero noise (or at least much less noise than the dependent variable)
- the residuals are normally distributed
These assumptions are almost never perfectly met in practice. After every regression, it is a very good idea to muck around with the residuals a bit to determine how badly the regression assumptions have been violated. It is important to consider the validity of underlying assumptions when assessing results. A simple first step is to plot the residuals:
% plot residuals figure plot( pixelMeans, fitModel.Residuals.Raw(:), 'x' ); title( 'Raw Residuals' ) xlabel( 'Mean (ADU)' ) ylabel( 'Residual (ADU)' )
Uh oh. The homoscedasticity assumption is not looking too good. The residuals are clearly a function of the average pixel value. Given our mathematical model of imaging noise, this is not too surprising. Fortunately, there is an easy way to remedy problem by weighting the residuals. To restore fairness in the computation, each squared error term gets divided by the its variance. Obviously, in order to use this technique, it is necessary to have an estimate of the variance. Fortunately, the noise model provides one. The code below demonstrates how to redo the regression with weighted residuals.
% compute best fit linear model with weights figure( mainFigureHandle ) fitModel = fitlm( pixelMeans, PixelVariances, 'poly1', 'Weights', 1 ./ PixelVariances ); xAxis = logspace( -1, log10( 4095 ), 256 )'; loglog( xAxis, fitModel.predict( xAxis ), 'LineWidth', 3 ); legend( 'Data', 'Unweighted Best Fit', 'Weighted Best Fit' ) figure weightedResiduals = fitModel.Residuals.Raw(:) ./ PixelVariances; plot( pixelMeans, weightedResiduals, 'x' ); title( 'Weighted Residuals' ) xlabel( 'Mean (ADU)' ) ylabel( 'Residual (ADU)' )
A simple "eyeball" test for normality is to plot a histogram of the residuals, as shown in the code below.
figure histogramObject = histogram( weightedResiduals, 31 ); hold on xAxis = linspace( -1, 1, 256 ); referenceNormal = pdf( 'normal', xAxis, 0, std( weightedResiduals ) ); referenceNormal = referenceNormal / max( referenceNormal ) * max( histogramObject.BinCounts ); plot( xAxis, referenceNormal, 'LineWidth', 3 ) title( 'Distribution of Weighted Residuals' ) xlabel( 'Residual (ADU)' ) ylabel( 'Counts' )
If you look closely, you will notice that the residuals deviate slightly from a normal distribution. There are various approaches to improving the residual distribution, but I wouldn't fault you for calling it "good enough" as it is.
References
- ↑ Answer: extremely surprised. The probability of this happening is about 10-15
- Overview
- Part 1: Noise in images
- Part 2: Fluorescence microscopy
- Part 3: Build an epi-illuminator for your microscope
Back to 20.309 Main Page