20.109(S18):Analyze small microarray data and induce protein expression (Day2)

From Course Wiki
Revision as of 14:34, 14 February 2018 by Casper Enghuus (Talk | contribs)

20.109(S18): Laboratory Fundamentals of Biological Engineering



Introduction

In this module, you will build upon the research completed by former 109ers. Using the results of their small molecule microarray (SMM) you will complete secondary assays to confirm the binders identified. Today you will use the data previously collected to identify 'hits', or ligands, that are able to bind FKBP12. Though you may be able to qualitatively visualize spots that appear to emit more fluorescence, it is important to complete quantitative analysis that supports your observations.

In Spring 2017, the class used a microarray scanner to read the fluorescence signals on the surface of the SMM at two excitation wavelengths. As noted previously, the 532 nm wavelength was used to excite fluorescein, which was printed in an 'X' pattern to assist with alignment. The 635 nm wavelength was used to excite the fluorophore-conjugated anti-His antibody, which should only bind FKBP12. A hit denotes a spot on the slide that emits a fluorescence signal significantly higher than the background fluorescence level. In terms of protein binding, a hit denotes that the FKBP12 protein is bound to a compound and the antibody is therefore localized to that position on the slide. You will analyze the fluorescence emission data collected by the microarray scanner using two quantitative approaches: a robust z score and a p-value.

Sp17 20.109 M1D6 spots.png

When the Koehler Lab prepared each printed glass slide, the microarrayer also produced a GAL file, or GenePix Array List, which can be viewed using Excel. The GAL file contains information about where each spot was printed, and what compound was printed there. However, the relationship between the GAL file and the actual contact of the print head is very imprecise. Instead, we will use the fluorescein guide spots to align the array in the GAL file to the true print location for each pin. For this alignment, we’ll use a tool provided by the Chemical Genetics Section at the NCI. This tool searches the scanned image for these guide spots and attempts to rotate, translate, and scale the array to best match the observed fluorescence. Following the alignment, we will compare the fluorescence at 635 nm within the deposition region of each spot (foreground) to the fluorescence immediately outside of this region, where nothing was printed (background). We’ll use these values to calculate a robust z score. From the robust z score, we can determine the associated probability that the observed fluorescence occurred by chance (p-value), and if this probability is sufficiently low, we call the compound a hit.
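To make the link between a robust z score and a p-value concrete, here is a minimal Python sketch. It assumes that the robust z scores of non-binding compounds follow a standard normal distribution and that we only care about unusually high fluorescence (a one-sided, upper-tail test). Both are assumptions made for illustration; the function name is hypothetical and is not part of the SMM analysis tool.

```python
import math


def upper_tail_p_value(z: float) -> float:
    """Probability that a standard normal value is at least z (one-sided).

    Assumption: null (non-binding) robust z scores are standard normal.
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Larger z scores correspond to smaller p-values.
for z in (0.0, 2.0, 3.0, 5.0):
    print(f"z = {z:.1f}  ->  p = {upper_tail_p_value(z):.3g}")
```

Because each slide holds many compounds, some large z scores are expected by chance alone, which is one reason to require a very small p-value before calling a hit.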

(Written with assistance from Rob Wilson, graduate student in the Koehler Laboratory.)

Protocols

Part 1: Induce FKBP12 protein expression

Yesterday afternoon, the teaching faculty inoculated 5 mL of LB media with BL21(DE3)pLysS pRSETb_FKBP12, incubated the culture at 37 °C for 7 hours, and then stored it at 4 °C overnight. Two hours prior to the start of this laboratory session, the stored culture was used to inoculate 50 mL of fresh LB media containing ampicillin and chloramphenicol at a 1:10 dilution.

  1. Measure the OD600 of the diluted culture.
  2. When the OD600 = 0.5-0.8, add IPTG to a final concentration of 1 mM to your E. coli bacterial culture, and return the flask to the shaker.

Your culture will incubate for ~16 hr at 25 °C, then the teaching faculty will collect the cells by centrifugation at 3000 × g for 10 min. The harvested cells will be stored in the -80 °C freezer.
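If you would like to check how much IPTG stock to add, the calculation is a simple dilution. The sketch below assumes the culture volume is still 50 mL at induction and uses the 0.1 M stock listed under Reagents; the function name is hypothetical. It accounts for the small volume added by the stock itself.

```python
def stock_volume_ml(culture_ml: float, stock_mM: float, final_mM: float) -> float:
    """Volume of stock (mL) to add so the mixture ends up at final_mM.

    Solves stock_mM * v = final_mM * (culture_ml + v) for v.
    """
    if not 0 < final_mM < stock_mM:
        raise ValueError("final concentration must be positive and below the stock")
    return culture_ml * final_mM / (stock_mM - final_mM)


# 50 mL culture (assumed), 0.1 M = 100 mM stock, 1 mM final concentration.
volume_ml = stock_volume_ml(culture_ml=50.0, stock_mM=100.0, final_mM=1.0)
print(f"Add about {volume_ml * 1000:.0f} uL of 0.1 M IPTG")
```

In practice this is very close to the familiar C1V1 = C2V2 estimate of 0.5 mL, since the stock is 100-fold more concentrated than the target.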

Part 2: Agarose gel electrophoresis of confirmation digests

Electrophoresis is a technique that separates large molecules by size using an applied electrical field and a sieving matrix. DNA, RNA and proteins are the molecules most often studied with this technique; agarose and acrylamide gels are the two most common sieves. The molecules to be separated enter the matrix through a well at one end and are pulled through the matrix when a current is applied across it. The larger molecules get entwined in the matrix and are stalled; the smaller molecules wind through the matrix more easily and travel farther away from the well. The distance a DNA fragment travels is inversely proportional to the log of its length. Over time fragments of similar length accumulate into “bands” in the gel. Higher concentrations of agarose can be used to resolve smaller DNA fragments.
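The relationship between fragment length and migration distance is what lets you size your bands against the DNA ladder. As a rough illustration, the Python sketch below fits a straight line to log10(fragment length) versus migration distance for the ladder bands, then estimates the length of an unknown band. The semi-log linear model is a common working approximation rather than an exact law, and the ladder distances shown are made-up example values, not measurements from a 20.109 gel.

```python
import math


def fit_semilog(distances_mm, sizes_bp):
    """Least-squares line: log10(size) = intercept + slope * distance."""
    logs = [math.log10(size) for size in sizes_bp]
    n = len(distances_mm)
    mean_d = sum(distances_mm) / n
    mean_l = sum(logs) / n
    sxx = sum((d - mean_d) ** 2 for d in distances_mm)
    sxy = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances_mm, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_d
    return intercept, slope


def estimate_size_bp(distance_mm, intercept, slope):
    """Fragment length predicted by the fitted line at a given distance."""
    return 10 ** (intercept + slope * distance_mm)


# Hypothetical ladder readings (band size in bp, distance from the well in mm).
ladder_sizes_bp = [10000, 3000, 1000, 500]
ladder_distances_mm = [10.0, 20.5, 30.0, 36.0]

intercept, slope = fit_semilog(ladder_distances_mm, ladder_sizes_bp)
unknown_distance_mm = 25.0  # hypothetical band from a digest lane
estimate = estimate_size_bp(unknown_distance_mm, intercept, slope)
print(f"Estimated fragment length: {estimate:.0f} bp")
```

Estimates are most trustworthy for bands that fall between two ladder bands; the fit becomes unreliable for fragments larger or smaller than the ladder range.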

Diagram of gel electrophoresis chamber. Larger sized DNA molecules will remain close to the well where the sample was loaded and smaller DNA molecules will migrate through the gel toward the positive electrode.

DNA and RNA are negatively charged molecules due to their phosphate backbone, and they naturally travel toward the positive electrode at the far end of the gel. Today you will separate DNA fragments using an agarose matrix. Agarose is a polymer that comes from seaweed and if you’ve ever made Jell-O™, then you already have all the skills needed for pouring an agarose gel! To prepare these gels, agarose and 1X TAE buffer (Tris base, acetic acid, and EDTA) are microwaved until the agarose is melted and fully dissolved. The molten agarose is then poured into a horizontal casting tray, and a comb is added. Once the agarose has solidified, the comb is removed, leaving wells into which the DNA samples can be loaded.

You will use a 1% agarose gel (prepared by the teaching faculty) to separate the DNA fragments in your four digested samples as well as a reference lane of molecular weight markers (also called a DNA ladder).

  1. Add 5 μL of 6x loading dye to each of your digests.
    Illustration of proper gel loading technique.
    • Loading dye contains bromophenol blue as a tracking dye to follow the progress of the electrophoresis (so you don’t run the smallest fragments off the end of your gel!) as well as glycerol to help the samples sink into the wells.
  2. Flick the eppendorf tubes to mix the contents, then quick spin them in the microfuge to bring the contents of the tubes to the bottom.
  3. Load 25 μL of each digest into the gel, as well as 10 μL of 1 kb DNA ladder.
    • Be sure to record the order in which you load your samples!
    • To load your samples, draw the volume listed above into the tip of your P200 or P20. Lower the tip below the surface of the buffer and directly over the well. Avoid lowering the tip too far into the well itself so as to not puncture the well. Expel your sample slowly into the well. Do not release the pipet plunger until after you have removed the tip from the gel box (or you'll draw your sample back into the tip!).
  4. Once all the samples have been loaded, attach the gel box to the power supply and electrophorese the gel at 125 V for 45 minutes.

Part 3: Align the array and quantify spot fluorescence

The SMM alignment tool is provided on the 20.109 laboratory computers. If you feel comfortable working with a Python development environment, we can also make the source code available to download. This will require the installation of Python 3.5 and multiple third-party libraries, which are included in Anaconda 4.2.0.

  1. Download the .gal files that correspond to the barcodes your team is assigned on the Class data page.
  2. Download this software package, developed by Rob Wilson and provided courtesy of the NCI Chemical Genetics Section, to the desktop of your 20.109 computer.
  3. Open a Terminal window from Finder → Applications. Type in python ~/Downloads/SMMAnalysisTool.zip and press Enter to execute the code.
    • Note: a recurring bug may prevent the menu bar from responding. If this occurs, click out of the window then click back into the window.
  4. Open the .tiff file for one of your slides by selecting File → Load TIFF.
    • You can change the wavelength channel that is displayed using the drop-down menu labeled 'Display'.
    • You can adjust the background fluorescence signal by moving the top slider rightward.
    • You can saturate the foreground fluorescence signal by moving the bottom slider leftward.
  5. Visually inspect the slide image and note all observations concerning flaws (damaged spots), obvious hits, residual fluorescence, etc.
  6. Open the .gal file corresponding to the barcode on the slide by selecting File → Load GAL.
    • Hover your cursor over the interesting spots observed to see which compounds were printed at these locations.
  7. In the box labeled 'Guide Name' at the left of the window, type "Sentinel" in the field for 532.
    • Leave the field for 635 blank as we do not use guide spots in this channel.
    • You should observe spots in the array turn green in an X pattern (the pattern in which the fluorescein spots were printed).
  8. Click the 'Align All' button at the lower left of the window to align the entire array to the nearest observed guide spots.
    • Confirm that the alignment is reasonable throughout the entire slide.
    • Flaws in the slide may disrupt the alignment and negatively affect the quantification.
  9. Click the 'Align Each' button at the lower left of the window to align each subarray to the nearest guide spots.
    • Confirm that the alignment is reasonable for each block.
    • If the alignment is not reasonable, or can be improved upon, you should drag and drop the spot outlines manually.
    • To return to the original array, click the 'Reset' button at the lower left of the window.
  10. Select File → Save Image to save a picture of your slide.
    • This saves the current channel exactly as it is displayed in PNG format.
  11. Calculate and save the fluorescence of the foregrounds and backgrounds of each spot.
    • Select File → Save Spot Intensities.
    • This will output a CSV file.
  12. Repeat Steps #4-11 for each of your slides.
    • Note: the GAL file for each slide is specific to that slide and you will therefore need a different GAL file for each slide.
Sp17 20.109 M1D6 array window.png
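The CSV file saved in Step #11 reports a signal-to-noise ratio for each spot, defined in Part 4 as SNR = ( μforeground - μbackground ) / σbackground. The Python sketch below shows that calculation for a single spot. The pixel values are invented for illustration, and the use of the sample standard deviation is an assumption; the alignment tool may compute σ differently.

```python
from statistics import mean, stdev


def snr(foreground_pixels, background_pixels):
    """(mean foreground - mean background) / standard deviation of background."""
    sigma_background = stdev(background_pixels)  # assumption: sample std. dev.
    if sigma_background == 0:
        raise ValueError("background has no variation; SNR is undefined")
    return (mean(foreground_pixels) - mean(background_pixels)) / sigma_background


# Hypothetical 635 nm pixel intensities for one spot and its surroundings.
foreground = [820, 790, 845, 810, 805]
background = [200, 215, 190, 205, 195]
print(f"SNR 635 = {snr(foreground, background):.1f}")
```

A spot whose outline is misaligned mixes unprinted pixels into the foreground, which lowers the numerator. This is why a careful alignment matters before you save the spot intensities.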

Part 4: Calculate robust z scores and call hits

The output data (CSV) file you created in Part 3 saves the information for each spot on your SMM. Each spot contains one compound and each compound was printed at multiple spot locations. Ultimately, we are interested in the summary information for each compound that was printed on your slides. To analyze the summary information, we will use a Pivot Table. A Pivot Table is a data summarization tool that is able to sort, count, total, or average the data stored in one table or spreadsheet. This tool allows researchers to quickly highlight and manipulate desired information within more complex spreadsheets. You will use a Pivot Table to calculate the z scores and p-values for your data.
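For those comfortable with Python, the summary a Pivot Table produces can be sketched with the standard library alone. The example below groups spots by the 'Name' column and reports the average and sample standard deviation of 'SNR 635' for each compound. The compound names and values are placeholders, and the function name is hypothetical; this is an illustration of the idea, not a replacement for the Excel directions.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_compound(rows, name_col="Name", value_col="SNR 635"):
    """Return {compound: (average, sample std. dev.)} across replicate spots."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[name_col]].append(float(row[value_col]))
    summary = {}
    for name, values in groups.items():
        # A standard deviation needs at least two replicate spots.
        spread = stdev(values) if len(values) > 1 else float("nan")
        summary[name] = (mean(values), spread)
    return summary


# Placeholder data standing in for the CSV saved by the alignment tool.
example_csv = """Name,SNR 635
compound_A,1.0
compound_A,3.0
compound_B,10.0
compound_B,14.0
"""

summary = summarize_by_compound(csv.DictReader(io.StringIO(example_csv)))
for name, (average, spread) in summary.items():
    print(f"{name}: average = {average:.2f}, std dev = {spread:.2f}")
```

To run this on a real file, pass csv.DictReader(open(path, newline="")) in place of the in-memory example.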

The directions below are written for use with the 20.109 laboratory computers. If you use your own computer, the directions may be slightly different, especially if you use a PC. Please ask if you need assistance!

  1. Open the output data CSV file for one of your slides in Excel.
    • There are several columns, but don't worry; we are only interested in the one labeled 'SNR 635'.
    • This value represents the signal-to-noise ratio calculated in the 635 nm channel and is defined by SNR = ( μforeground - μbackground ) / σbackground , where μ is mean and σ is standard deviation.
      Sp17 20.109 M1D6 PivotTable window v2.png
  2. Select Data → PivotTable to summarize the data and create a PivotTable in a new worksheet.
    • Be sure that a cell within your spreadsheet is highlighted (any cell will work).
    • A new Sheet will be created in your spreadsheet and the 'PivotTable Builder' window should appear.
  3. To populate your PivotTable complete the following:
    • From the 'Field Name' box, select 'Name' and drag it into the 'Row Labels' box.
    • From the 'Field Name' box, select 'SNR 635' and drag it into the 'Values' box.
      • Excel will default to 'Sum' of SNR635. Change to 'Average...' by clicking the i then selecting Average from the 'Summarize by:' options.
    • Again, from the 'Field Name' box, select 'SNR 635' and drag it into the 'Values' box.
      • Change to 'StdDev...' by clicking the i then selecting StdDev from the 'Summarize by:' options.
    • Be sure your 'PivotTable Builder' window looks like the image to the right.
  4. The PivotTable should be populated as you complete the items in Step #3 and appear similar to the image below. When you are done, close the PivotTable Builder window.
    Sp17 20.109 M1D6 PivotTable example.png
  5. Calculate the median absolute deviation, defined as MAD = median ( |xi - median(x)| ).
    • Enter "=MEDIAN(ABS(data-MEDIAN(data)))" where data is the data range in the 'Average of SNR 635' column.
    • Important: After you enter the equation, click within the fx field and use the key stroke 'COMMAND + SHIFT + ENTER' to input the array formula.
  6. Calculate the robust z score for each compound, defined as Z = (xi - median(x)) / (1.486(MAD)).
    • Enter "=(cell-MEDIAN($data))/(1.486*$mad)" where cell is the cell containing the 'Average of SNR 635' for the compound, data is the data range in the 'Average of SNR 635' column, and mad is the fixed value in the cell that contains the calculated MAD.
    • Time-saving tip: After you calculate the z score of the first compound, double-click on the small black box at the bottom right of its cell; it will automatically expand the formula to all available rows!
    • If you have trouble calculating the Z-scores, an alternative approach can be used:
    1. Copy the Pivot table as values to a new sheet
    2. Create a column named "Abs difference to median" to calculate the absolute difference between data points and the median. Type in the formula "=ABS(data point - MEDIAN(data column))" where data point is a single entry in the "Average of SNR 635" column and data column is all the entries in the "Average of SNR 635" column. Use the time-saving tip above to fill all rows for the column.
    3. Calculate the MAD as "=MEDIAN(Abs difference to median column)"
    4. Proceed to calculate the Z-scores as above
  7. Sort your compounds by robust z score.
    • If you chose not to use the Pivot Table, you can sort the z scores by first selecting all columns except the column with the MAD value (make sure this value is in the rightmost column, as sorted columns need to be adjacent to each other). With the columns selected, go to the Data tab and click Sort. Set the column to be the column holding the z scores and the order to be Largest to Smallest. Click OK. Be careful not to include the column holding the MAD value in your selection.
  8. In the next column, you will indicate which compounds are common binders and thus are not necessarily specific to the protein of interest.
    • Download this Excel spreadsheet into the same folder as the other CSV files output by the SMM data analysis software. This spreadsheet categorizes whether certain compounds should be included or excluded in your results. Compounds are excluded if they are known to bind to two or more different proteins.
    • In the next column of your CSV data analysis file, enter "=VLOOKUP(firstcell, '[Specificity2.xlsx]Specificity'!$A$2:$W$50001, 23, FALSE)" where firstcell is the cell address containing the first compound SMILES string.
    • Expand the formula to all the available rows.
      • If the entry states "exclude," the corresponding compound is a common binder and should be excluded from your results.
      • If the entry states "include" or "N/A" (meaning that the compound was not in the spreadsheet of potential binders), then include these compounds in your results.
  9. Email the list of hits identified in your analysis to the teaching faculty.
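The MAD, robust z score, sorting, and exclusion steps above can also be sketched in Python. The example below uses the same 1.486 scale factor as the Excel formula and treats a missing specificity entry like "N/A" (include). The compound names, averages, and specificity labels are placeholders, and the function names are hypothetical. No hit threshold is applied, since the cutoff is a judgment you will make from your own data.

```python
from statistics import median

# Scale factor from the protocol above. (The normal-consistency constant for
# the MAD is often quoted as 1.4826; 1.486 is kept here to match the protocol.)
MAD_SCALE = 1.486


def robust_z_scores(averages):
    """Map {compound: average SNR} to {compound: robust z score}."""
    values = list(averages.values())
    center = median(values)
    mad = median([abs(value - center) for value in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust z scores are undefined")
    return {name: (value - center) / (MAD_SCALE * mad)
            for name, value in averages.items()}


def ranked_candidates(z_scores, specificity):
    """Drop compounds marked 'exclude', then sort from largest to smallest z."""
    kept = [(name, z) for name, z in z_scores.items()
            if specificity.get(name, "N/A") != "exclude"]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Placeholder averages (one value per compound, as in the PivotTable).
averages = {"cmpd_1": 1.0, "cmpd_2": 2.0, "cmpd_3": 3.0,
            "cmpd_4": 4.0, "cmpd_5": 20.0}
specificity = {"cmpd_4": "exclude", "cmpd_5": "include"}

for name, z in ranked_candidates(robust_z_scores(averages), specificity):
    print(f"{name}: robust z = {z:.2f}")
```

Using the median and MAD rather than the mean and standard deviation keeps a few very bright spots from inflating the spread, so true binders stand out more clearly from the bulk of non-binding compounds.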

Reagents

  • LB (Luria-Bertani broth, BD Biosciences)
    • 1% Tryptone
    • 0.5% Yeast Extract
    • 1% NaCl
    • autoclaved for sterility
  • Ampicillin stock: 100 mg/mL, aqueous, sterile-filtered, store at -20 °C
  • Chloramphenicol stock: 34 mg/mL in ethanol, store at -20 °C
  • LB+Amp+Cam
    • LB with 100 μg/mL ampicillin and 34 μg/mL chloramphenicol
  • isopropyl β-D-1-thiogalactopyranoside (IPTG) stock (0.1 M, Sigma Aldrich)

Navigation links

Next day: Purify protein for secondary assays

Previous day: In silico cloning and confirmation digest of protein expression vector