Assignment 2 Part 1: Noise in images

From Course Wiki
Revision as of 01:36, 18 September 2018 by Steven Wasserman (Talk | contribs)

Jump to: navigation, search
20.309: Biological Instrumentation and Measurement

ImageBar 774.jpg


Overview

Recording a digital image is essentially an exercise in measuring the intensity of light at numerous points on a grid. A simple mental model of a digital image sensor consists of an array of photodiodes with a bucket underneath each diode. Photons strike the photodiodes and give rise to free photoelectrons. The photoelectrons fall into the buckets underneath the photodiodes. At the end of an exposure, the number of electrons in each bucket equals to the number of photons that struck the photodiode above it. But the total charge in each bucket is tiny, so image sensors typically include electronic amplifiers to multiply the charge in each bucket up to a level where it can be measured. The details of the amplifier depend on the type of image sensor. Mathematically, the amplifier can be modeled as multiplication by a constant. The amplified charge in each bucket is converted to a binary integer by an analog-to-digital converter. After all the buckets are measured, the camera returns a matrix of binary pixel values to the computer for recording, manipulation, and display:

$ P_{x,y}=G N_p $,

where $ P $ is the pixel value, $ x $ and $ y $ are the position-dependent matrix indexes, $ G $ is the camera's gain (which can be changed be software configuration on many camera models), and $ N_p $ is the number of photoelectrons in the bucket. The (uncalibrated) units of $ P $ are sometimes written as ADU for "Analog-to-Digital Units" or DN for "Digital Number". $ G $ as defined here has units of ADU per electron or DN per electron. Dividing a pixel value divided by the gain gives the number of electrons in the bucket. (Note that many camera data sheets specify the inverse of gain as defined here such that $ G $ has units of electrons per ADU or DN.)

If only that was the end of the story.

Like most physical measurements, quantifying light intensity is subject to various sources of observational error. The amount of information in any signal, whether it is as an image or a neural recording or a stock price, depends on the Signal to Noise Ratio (SNR). Because microscopic images are frequently dim, minimizing noise is a key challenge in many applications. In this part of the assignment, you will use MATLAB or another computing environment of your choice to simulate the major sources of error in images and then you will measure the characteristics of the Manta G040B cameras we use in the lab.

If you are a bit rusty on probability concepts, review this page before you go on.

Noise sources in images

Three of the most significant error sources that affect digital images are: shot noise, dark current noise, and read noise. This section proposes a probabilistic model for each of the three major sources of error in images.

Shot noise

Shot noise Is a consequence of the inherent variability of photon emission. A useful model of shot noise is to assume that an unchanging luminous source is composed of a gigantic number $ N $ of potential emitters (such as excited molecules), in which each emitter has a tiny probability $ p $ of releasing a photon over the course of a measurement interval that eventually reaches a light detector. (Photodiodes are one example of a light detector, but the properties of shot noise do not depend at all on the detector's implementation.) During a finite measurement interval, an average of $ \bar{N} = N p $ photons trigger the detector. Each of emitters is independent of the others. Therefore, just due to chance alone, the number of photons detected during a specific measurement interval will likely be greater or less than $ \bar{N} $.

A statistical model that matches this kind of situation, where there are a certain number of "trials" and each trial has a certain probability of "success", is called the binomial distribution. The binomial distribution has two parameters: $ N $ for the number of "trials", and $ p $ for the likelihood of "success". The model assumes that each trial is identical to and independent of the others. One example of the binomial distribution that you have likely seen before is the case of flipping a fair coin, say, 100 times. On each toss, the probability of success (heads) is $ p=0.5 $. If $ N=100 $ and $ p=0.5 $, the average number of heads you expect is 50. Of course, most of the time you won't get exactly 50 heads. The binomial distribution answers the question: "How surprised should I be if I get 12 heads?"[1] The probability of getting exactly k successes in N trials is:

$ Pr(k;n,p) = \Pr(x = k) = {n\choose k}p^k(1-p)^{n-k} $

for k = 0, 1, 2, ..., n, where

$ \binom n k =\frac{n!}{k!(n-k)!} $

From this, you can compute the standard deviation of a binomial distribution. Happily, somebody already did the math and you can just look it up on Wikipedia:

$ \sigma_x^2=np(1-p) $

Poisson distribution

Sometimes, it is useful to model a binary process where you know the average value of the process, but not N and p. If p is small ($ p<<1 $), it is straightforward to make the approximation $ (1-p)=1 $. When p is small, a very good approximation of the variance is:

$ \sigma_x^2=np $

The variance of a Poisson-distributed variable is equal to its mean. The Poisson distribution is a useful model for photon emission, radioactive decay, the number of letters you receive in the mail, the number of page requests on a certain website, the number of people who get in line to check out at a store, and many other everyday situations. The ability to estimate the variability of a binary process without knowing N and p is incredibly useful.

Normal distribution. Each band has a width of 1 standard deviation

Normal distribution

The normal distribution (also frequently called a Gaussian distribution or a bell curve is specified by two parameters, an average value $ \mu $, and a standard deviation $ \sigma $:

$ f(x \; | \; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2} } \; e^{ -\frac{(x-\mu)^2}{2\sigma^2} } $

In the normal distribution, about 68% of outcomes fall within the interval $ \mu \pm \sigma $, about 95% in the interval $ \mu \pm 2 \sigma $, and about 99.7% in the interval $ \mu \pm 3 \sigma $



A good probabilistic model for for for shot noise is a Poisson distributed random variable whose variance is equal to the average rate of photon detection. Shot noise occurs in all images, regardless of the

  • Dark current happens because the physical detector exists at a nonzero temperature. Electrons in the detector don't have equal energies. Most of the electrons are at or near ground state at normal temperatures. Just by chance, a few electrons have enough energy to roll out of the photodiode and into the bucket even though they were not excited by a photon. The camera counts these "dark electrons just the same as the photoelectrons that were excited by photons. Even in total darkness the electron count is not zero. The problem is not so much that dark electrons roll into the bucket — it's that the process is stochastic. Because there are a huge number of electrons in the photodiode and the probability that an individual electron will roll into the bucket without being excited by a photon is very small, the Poisson distribution is a decent model for dark current noise. Camera manufacturers frequently specify the dark current in units of electrons per pixel per second. The square root of the average number of dark electrons (dark current times exposure time) gives a good approximation for the standard deviation of the dark current noise.

with a variance is equal to the average number of dark electrons per frame.

Johnson noise displayed on an analog television set. In the absence of an input signal, analog television sets turn up the gain on their amplifier to the maximum. At maximum gain, Johnson noise is amplified to the point where it can be observed directly on the screen.
  • Read noise summarizes all of the errors that take place during the process of counting electrons from the bucket. An amplifier in the camera multiplies the tiny electronic charge in the bucket up to a level where it can be measured. One significant source of read noise is the random thermal motion of electrons in the resistors of the amplifier circuit. Conceptually, at any given time, many electrons are in motion. Just by chance, a few more electrons will be jiggling left than right, which gives rise to a noise voltage on the resistor. This is called Johnson noise. Read noise may include other noise sources in the counting process. It is frequently modeled by a normal distribution.

Useful distributions

Binomial distribution

The binomial distribution considers the case where the outcome of a random variable is the sum of many individual go/no-go trials. The distribution assumes that the trials are independent and that the probability of success, $ p $, for each trial is the same. An example of a process that is well-modeled by the binomial distribution is the number of heads in 100 coin tosses.

Central Limit Theorem

If you add together a bunch of independent random variables, their sum tends to look like a normal distribution.

Modeling a single pixel

Let's start by modeling a single pixel. The first thing we need is a source of random numbers, so fire up MATLAB or any other computing you like on your computer (and stop whining about it). MATLAB includes several functions for generating random numbers. We will make our first model using the function rand(), which returns a random value that follows a uniform distribution in the interval (0, 1). Go ahead and type rand() at the command line. Matlab will return a number between 0 and 1. Do it a few times — it's kind of fun. You get a different number every single time. Can you guess what number will come next? If you can, please come see me.

If you want a bunch of random numbers, rand can help you out with that that, too. If you call rand with two integer arguments M and N, the function will return a matrix of random numbers with M rows by N columns. The code snippet below to generates and displays (as an image) a 492 row x 656 column matrix of numbers with uniform distribution on the interval [0,1]:

noiseImage = rand( 492, 656 );
figure
imshow( noiseImage );

Okay, enough goofing around with the random number generator. Assume you are imaging a single molecule has a 0.01% chance of emitting a photon each microsecond. For the time being, assume you have perfect detector with a 100% chance of detecting each photon. The exposure (duration over which the measurement is made) is 1 second. To model photon emission over a single microsecond, you can use the rand function like this: sum( rand( 1, 1E6 ) < 0.0001 ). This statement will generate a vector of 1 million random numbers. The less than operator returns 1 for all values smaller than 10-4 and 0 for all the rest. Since rand returns numbers between 0 and 1 with a uniform distribution, there will be 0.01% chance of getting a 1 at any given position in the vector. The sum function adds up all of the ones and zeros and returns a count the total number of simulated photons emitted. On average, you would expect to detect about 100 photons during a one second long measurement. From what we know about the Poisson distribution, we can guess that the standard deviation will be very close to the square root of the average value, or about 10 in this case. The std function computes the standard deviation of its argument.

Let's run the simulation one thousand times and plot the results. You can copy-paste this code into the MATLAB command window.

close all
figure

probabilityOfEmission = 1E-4;
numberOfPhotonsEmitted = nan( 1, 1000 ); % allocate space for the results

for ii = 1:1000
    numberOfPhotonsEmitted(ii) = sum( ( rand( 1, 1E6 ) < probabilityOfEmission ) );
    plot( numberOfPhotonsEmitted, 'x' );
    axis( [ 1 1000 0 140 ] );
    xlabel( 'n' );
    ylabel( 'Number of Photons' );
    title( [ 'Photon Count (N=' num2str(ii) ...
        ' siumulations) \mu=' num2str( mean( numberOfPhotonsEmitted(1:ii) ) ) ...
        ' \sigma=' num2str( std( numberOfPhotonsEmitted(1:ii) ) )] );
    drawnow
end

As you can see from the plot, just due to chance alone, the number of photons emitted varies around the average value. Even though the average number of photons emitted and detected doesn't change, the actual number is most often larger or smaller than the average. Due to the stochastic nature of photon emission, it is impossible to measure the average value with complete confidence in a finite amount of time. You have no way of knowing whether there were fewer or greater photons than average during the interval of the measurement. This kind of noise is called shot noise and it is inherent to the imaging process.

Shot noise places a fundamental limit on the precision of light measurements. Even if you do everything exactly right, the signal to noise ratio of is

$ \textrm{SNR}=\frac{N}{\sigma_N}=\frac{N}{\sqrt{N}}=\sqrt{N} $

An ideal detector is said to be shot noise limited and achieves an SNR of $ \sqrt{N} $

Gaussian approximation

MATLAB can produce a lovely histogram of the simulation results:

edges = 40:10:160;
centers = ( edges(1:(end-1)) + edges(2:end) ) / 2;
[ counts ] = histcounts( numberOfPhotonsEmitted, edges );
figure
bar( centers, counts )

% add numbers on top of the bars
for ii=1:numel(centers)
    text( centers(ii), counts(ii), num2str( counts(ii) ),...
               'HorizontalAlignment','center',...
               'VerticalAlignment','bottom')
end

title( 'Distribution of Simulated Photon Counts' )
ylabel( 'Number of Results in Range' )
xlabel( 'Number of Photons' )


Pencil.png
  • What percentage of the results fall with one, two, and three standard deviations? Is it reasonable to approximate this situation as a normal distribution?
  • Produce a log-log plot of standard deviation of the number of photons emitted as a function of the average number of photons emitted for probability of photon emission equal to the values: 10-6, 10-5, 10-4, 10-3, and 10-2.

Hints:

    • Use a nested loop.
    • Speed things up by getting rid of the plotting inside the loop and only run 100 simulations for each probability.
  • What is the relationship between the number of photons detected and the noise (standard deviation)?
  • Repeat the simulation for a detector that has only a 25% chance of detecting a photon and a light source that is 4 time brighter.
  • Turn in the code you used to generate the plots



Read noise

Conceptually, detection of light can be decomposed into three steps: 1. conversion of light to free electrical charge-carriers; 2. conversion of current to voltage; 3measurement of voltage.

Imaging noise model

The formula below is a mathematical model of the imaging process, including noise and the camera's gain constant:

$ P_{x,y} =G \left( I_{x,y} + \epsilon_{x,y} \right) $,

where $ x $ and $ y $ are spatial coordinates, $ P_{x,y} $ is the matrix of pixel values returned by the camera , $ G $ is the gain of the camera, $ I_{x,y} $ is the actual light intensity in the plane of the detector, and $ \epsilon_{x,y} $ is a matrix of error terms that represent noise introduced in the imaging process.



Measure camera parameters

In this part of the assignment, you will use linear regression to quantify dark current, read noise, and gain of the Manta G-040B cameras in the lab. A dataset taken on 15 September 2018 is available at this link. You can

Create a synthetic noise movie

The code below uses MATLAB's random number generator to create a 100 synthetic pictures of an unchanging light pattern plus realistic shot, dark current, and read noise — similar to the movie you made in Assignment 1. The functions use MATLAB's built-in function poissrnd.

darkCurrent = 1000;
exposureTime = 1;
readNoiseStandardDeviation = 10;
cameraGain = 0.25;

idealImage = 4095 / cameraGain * rand( 200,200 );
noiseMovie = repmat( idealImage, [ 1 1 100 ] );

for ii = 1:size( noiseMovie, 3 )
    noiseMovie(:,:,ii) = poissrnd( noiseMovie(:,:,ii) );% shot noise
    noiseMovie(:,:,ii) = noiseMovie(:,:,ii) + poissrnd( darkCurrent * exposureTime * ones( size( idealImage )) ); % dark current
    noiseMovie(:,:,ii) = noiseMovie(:,:,ii) + readNoiseStandardDeviation * randn( size( idealImage ) ); % read noise
    noiseMovie(:,:,ii) = noiseMovie(:,:,ii) - darkCurrent * exposureTime; % subtract off the average value of dark current
end

noiseMovie = noiseMovie * cameraGain;

You can make a variance vs. mean plot using the same code you used in Assignment 1:

pixelMean = mean( double( squeeze( noiseMovie) ), 3 );
pixelVariance = var( double( squeeze( noiseMovie) ), 0, 3 );
[counts, binValues, binIndexes ] = histcounts( pixelMean(:), 250 );
binnedVariances = accumarray( binIndexes(:), pixelVariance(:), [], @mean );
binnedMeans = accumarray( binIndexes(:), pixelMean(:), [], @mean );
figure
loglog( pixelMean(:), pixelVariance(:), 'x' );
hold on
loglog( binnedMeans, binnedVariances, 'LineWidth', 3 )
xlabel( 'Mean (ADU)' )
ylabel( 'Variance (ADU^2)')
title( 'Image Noise Versus Intensity' )


Pencil.png
  • On one set of axes, plot the variance vs. mean for exposure times of 10-4, 10-3, 10-2, 10-1, 100 seconds.
  • In what situations is dark current noise a significant problem?


References

  1. Answer: extremely surprised. The probability of this happening is about 10-15


Back to 20.309 Main Page