20.109(S18): Analyze RNA-seq data and prepare for quantitative PCR experiment (Day 4)

20.109(S18): Laboratory Fundamentals of Biological Engineering



Introduction

The transcriptome is the full suite of transcripts within an organism and provides the key link between the genetic code and phenotype. Research focused on the transcriptome has provided important insights into how gene expression is altered in different cell / tissue types, in developmental phases, in disease states, and between species. In this module, you will evaluate gene expression differences in the parental DLD-1 cell line compared to the BRCA2-/- mutant cell line. In addition, you will assess the effects of DNA damage and drug treatments on the transcriptome in these cells.

The gene expression data were generated using RNA-seq. In this method, deep sequencing is performed using reagents and equipment from Illumina. With this technology, transcripts are directly sequenced and the resulting reads are mapped to a reference genome. The reads are then counted to provide information on gene expression levels for a particular portion of the genome (i.e. for a particular gene).
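The read-counting step described above can be sketched in a few lines of Python. This is only a minimal illustration, not the pipeline used to generate the course data: the gene names, coordinates, and read positions are all hypothetical, and a read is assigned to a gene simply when its mapped start position falls within the gene boundaries.

```python
from collections import Counter

# Hypothetical annotation: gene name -> (chromosome, start, end), 1-based inclusive.
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 800, 1200),
}

def count_reads(read_positions, genes=GENES):
    """Count mapped reads whose start position falls inside each gene."""
    counts = Counter({name: 0 for name in genes})
    for chrom, pos in read_positions:
        for name, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos <= end:
                counts[name] += 1
    return dict(counts)

# Hypothetical mapped reads as (chromosome, start position) pairs.
reads = [("chr1", 150), ("chr1", 480), ("chr1", 900), ("chr2", 150), ("chr1", 600)]
gene_counts = count_reads(reads)
print(gene_counts)
```

Reads that map outside every annotated gene (the last two in this toy example) are simply not counted. Real tools also have to handle reads that span exon junctions or overlap more than one gene, which this sketch ignores.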

Image from Goodwin et al. (2016) Nature Rev. Genet. 17:333-351.
In RNA-seq, RNA is purified from cells and reverse transcribed into DNA. The DNA molecules are modified with adapters, which are ligated to both ends of the DNA. Sequences complementary to the adapters are attached to the surface of the flow cell channels, where they facilitate binding of the modified DNA molecules and provide a primer for DNA polymerase. Following the initial binding to the flow cell channel, the DNA molecules form bridges that enable bridge amplification and cluster generation (see figure). Through this process, millions of dense clusters containing double-stranded DNA are generated.

To sequence directly from the clusters, a sequencing-by-synthesis approach is used. In this approach, repeated rounds of single-base extension are performed using modified deoxynucleoside triphosphates (dNTPs). These dNTPs act as terminators because the 3'-OH group of the deoxyribose is blocked, which prevents further elongation by the polymerase. Each terminating dNTP (dATP, dTTP, dCTP, and dGTP) is fluorescently labeled (dATP = red, dTTP = green, dCTP = blue, and dGTP = yellow). In each round of sequencing, a mixture containing all four labeled dNTPs is added and a single base is incorporated into each DNA molecule bound to the flow cell channels. The flow cell is then imaged to capture which dNTP was added at each cluster location. Next, the fluorescent label and the 3'-OH blocking group are removed from the incorporated dNTP and another round of sequencing is performed. Repeating this cycle yields the sequence of every DNA cluster bound to the flow cell channels. For example, in the schematic provided below, the sequence of the cluster denoted by the asterisk is GCTGA.

Sp18 20.109 M2D4 illumina sequencing.png
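The base-calling logic described above amounts to a lookup from the color imaged in each cycle to a base. The sketch below uses the color assignments given in the text. The list of colors for the example cluster is an assumption chosen to match the GCTGA read mentioned above, not values taken from the schematic itself.

```python
# Dye colors as assigned in the text: dATP = red, dTTP = green, dCTP = blue, dGTP = yellow.
COLOR_TO_BASE = {"red": "A", "green": "T", "blue": "C", "yellow": "G"}

def call_bases(cycle_colors):
    """Translate the color imaged at one cluster in each cycle into a base sequence."""
    return "".join(COLOR_TO_BASE[color] for color in cycle_colors)

# Assumed colors observed at a single cluster over five sequencing cycles.
cluster_colors = ["yellow", "blue", "green", "yellow", "red"]
read = call_bases(cluster_colors)
print(read)
```

Each element of the list corresponds to one round of incorporation, imaging, and removal of the label and blocking group, so the read length equals the number of cycles performed.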

As with all technologies, there are positives and negatives to RNA-seq. On the plus side, the ability to directly sequence enables researchers to assess gene expression in organisms for which a full genome sequence is not available or not fully annotated. Furthermore, this method allows for the quantification of individual isoforms that result from alternative splicing. On the minus side, the cost of RNA-seq can limit the depth of sequencing achieved, and genes that are not highly expressed may not be captured in a data set.
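To see why sequencing depth matters for lowly expressed genes, consider a rough sketch that assumes reads are sampled at random from the transcript pool, so that the number of reads for a gene is approximately Poisson distributed. The depths and the transcript fraction below are hypothetical values chosen for illustration only.

```python
import math

def prob_gene_missed(total_reads, transcript_fraction):
    """Probability of observing zero reads for a gene, assuming Poisson sampling."""
    expected_reads = total_reads * transcript_fraction
    return math.exp(-expected_reads)

# Hypothetical gene that accounts for 1 in 10 million sequenced fragments.
fraction = 1e-7
for depth in (1_000_000, 10_000_000, 50_000_000):
    expected = depth * fraction
    missed = prob_gene_missed(depth, fraction)
    print(f"{depth:>11,} reads: expected {expected:.1f} reads, P(no reads) = {missed:.3f}")
```

Under this assumption, the chance of missing the gene entirely falls quickly as depth increases, which is the trade-off between cost and coverage noted above.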

Protocols

Part 1: Design primers for qPCR

In Mod 1 you designed primers to amplify the gene that encodes FKBP12. As you may have guessed, quantitative PCR also requires primers for amplification. Because designing qPCR primers is more complicated (you need to carefully consider exons, product length, and primer dimers), you will use a free online program to identify possible primer pairs.

  1. Use the NCBI Primer-BLAST tool to design your qPCR primers.
    • Go to the NCBI Gene Database.
    • Enter your gene of interest (p21) in the search bar at the top of the screen.
    • Review the list of search results and select one that is appropriate for our study (i.e. from which species should the gene you select come?).
  2. Choose a gene sequence by clicking on the 'Name/Gene ID' link.
  3. Review the information provided on the page.
    • What role does this gene product have in physiology?
    • Why is this an interesting target for your research?
    • In what tissues is expression of this gene highest? lowest?
    • On which chromosome is it located?
  4. Scroll through the page to the NCBI Reference Sequences (RefSeq) section.
  5. Click on the link for the mRNA sequence (e.g. NM_000389.4).
    • Record the mRNA sequence identification number in your laboratory notebook.
  6. Again, briefly review the information provided on the page.
    • What is the size of the p21 gene?
    • Is this sequence a variant? If so, how?
  7. Click on the 'Pick Primers' link under the Analyze this sequence header at the right side of the screen.
  8. Confirm that the correct mRNA sequence identification number is in the 'Enter accession, gi, or FASTA sequence' box under the PCR Template header at the top of the screen.
  9. Update the following settings:
    • Under the Primer Parameters header, to the right of “PCR product size” enter a max number of 150.
    • Under the Exon/intron selection header, select "Primer must span an exon-exon junction".
    • Select Advanced parameters and, under the primer parameters, type "120,7 122,3" in the excluded regions box.
  10. Click the 'Get Primers' button at the bottom of the screen.
    • Be patient! It may take up to 2 min for the program to identify possible primer pairs.
  11. Review the primer pairs identified by the program and consider the following guidelines:
    • Primers should have a GC content of 50-60%.
    • Primers ideally end in G or C.
    • Primer melting temperatures should be similar and ~60 °C.
    • Product should be ~100 bp.
  12. Select the primer pair that you think best meets the guidelines and record the sequences in your laboratory notebook.

In addition to your qPCR primers, you will use primers that the teaching faculty previously ordered and tested to target p21. Evaluate the sequences of these primers and compare them to those you selected. Did you choose the same primer pair? If so, consider whether altering the length (or shifting the binding site) would improve the primer pair. If not, what differences exist, and which pair do you propose would be better at probing p21 mRNA sequences? Why?

  • p21_F = CCA GCT GAG GTG TGA GCA G
  • p21_R = GTT CTG ACA TGG CGC CTC C
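When comparing primer pairs against the guidelines in Part 1, it can help to compute the basic properties directly. The sketch below checks length, GC content, and the 3' base for the two faculty primers. The melting temperature uses the Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough rule of thumb and is an assumption of this example; Primer-BLAST uses a different thermodynamic calculation, so its reported values will differ.

```python
def clean(primer):
    """Remove the spaces used for readability and normalize to uppercase."""
    return primer.replace(" ", "").upper()

def gc_content(primer):
    """Percentage of bases in the primer that are G or C."""
    seq = clean(primer)
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)

def ends_in_gc(primer):
    """True if the 3' end of the primer (last base as written 5' to 3') is G or C."""
    return clean(primer)[-1] in "GC"

def wallace_tm(primer):
    """Rough melting temperature in degrees C from the Wallace rule (approximation only)."""
    seq = clean(primer)
    gc = sum(base in "GC" for base in seq)
    return 2 * (len(seq) - gc) + 4 * gc

# Faculty primers as listed above, written 5' to 3'.
PRIMERS = {
    "p21_F": "CCA GCT GAG GTG TGA GCA G",
    "p21_R": "GTT CTG ACA TGG CGC CTC C",
}

for name, primer in PRIMERS.items():
    print(
        f"{name}: length = {len(clean(primer))} nt, "
        f"GC = {gc_content(primer):.1f}%, "
        f"ends in G/C = {ends_in_gc(primer)}, "
        f"rough Tm = {wallace_tm(primer)} C"
    )
```

Adding your own primer sequences to the `PRIMERS` dictionary lets you compare both pairs side by side before deciding which you would propose.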

Email your qPCR primer sequences to the teaching faculty as they need to be ordered as soon as possible to ensure delivery by the next laboratory session.

Part 2: Analyze RNA-seq data

Today you will analyze the RNA-seq data gathered from untreated DLD-1 and BRCA2-/- cells and from etoposide-treated DLD-1 and BRCA2-/- cells. Following RNA purification, the samples were submitted to the BioMicro Center for Illumina sequencing. Illumina sequencing technology, or sequencing by synthesis (SBS), is used for massively parallel sequencing with a proprietary method that detects single bases as they are incorporated into growing DNA strands.

Complete the "Analysis of RNA-seq Data Exercise" developed by Amanda Kedaigle & Prof. Ernest Fraenkel linked here. The Rmd file with the same information can be found here.

The data file ("preprocessed_data.RData") you will need for the analysis exercise is located here.
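As a rough illustration of the kind of comparison made between treated and untreated samples, the sketch below normalizes raw counts to counts per million (CPM) and computes a log2 fold change. It is not the workflow from the linked exercise, which is written in R. The gene names and counts are hypothetical, and the pseudocount of 1 is an assumption used to avoid dividing by zero.

```python
import math

# Hypothetical raw read counts per gene (not course data).
counts = {
    "DLD-1 untreated": {"geneA": 100, "geneB": 900},
    "DLD-1 etoposide": {"geneA": 600, "geneB": 1400},
}

def counts_per_million(sample_counts):
    """Scale raw counts so that each sample sums to one million."""
    total = sum(sample_counts.values())
    return {gene: 1e6 * n / total for gene, n in sample_counts.items()}

def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of treated to control expression for every gene in the control sample."""
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
    }

cpm = {sample: counts_per_million(c) for sample, c in counts.items()}
lfc = log2_fold_change(cpm["DLD-1 etoposide"], cpm["DLD-1 untreated"])
for gene, value in lfc.items():
    print(f"{gene}: log2 fold change = {value:.2f}")
```

Normalizing first matters because the two samples have different total read counts. In this toy example geneB has more raw reads after treatment, yet its share of the library, and therefore its log2 fold change, goes down.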
