Difference between revisions of "20.109(S18):Analyze RNA-seq data and prepare for quantitative PCR experiment (Day 4)"

From Course Wiki
(Protocols)
(Part 2: Analyze RNA-seq data)
 
(19 intermediate revisions by 3 users not shown)
Line 4: Line 4:
  
 
==Introduction==
 
Quantitative polymerase chain reaction (qPCR) allows researchers to monitor the results of PCR as amplification is occurring (this technique is also referred to as real-time polymerase chain reaction or real-time PCR).  During qPCR, data are collected throughout the amplification process using a fluorescent dye.  The fluorescent dye is highly specific for double-stranded DNA, and when it is bound to DNA the fluorescence intensity increases in proportion to the amount of double-stranded product.  In contrast, the data for traditional PCR are simply observed as a band on a gel.
+
The transcriptome is the full suite of transcripts within an organism and provides the key link between the genetic code and phenotype.  Research focused on the transcriptome has provided important insights into how gene expression is altered in different cell / tissue types, in developmental phases, in disease states, and between species.  In this module, you will evaluate gene expression differences in the parental DLD-1 cell line compared to the BRCA2-/- mutant cell line.  In addition, you will assess the effects of DNA damage and drug treatments on the transcriptome in these cells.
  
[[Image:Screen Shot 2015-01-27 at 4.04.38 PM.png|thumb|550px|right|To eliminate clutter, the base pairs between the DNA strands were omitted.  An animation of this process is linked [http://www.sigmaaldrich.com/life-science/molecular-biology/pcr/quantitative-pcr/sybr-green-based-qpcr/syber-green-animation.html here].]] As depicted in the image to the right, the fluorescent dye binds to double-stranded DNA during the cycles of PCR.  At the annealing temperature the primer (blue arrow) binds to the template (black line). During an incubation at the extension temperature the new copy of DNA (orange dashed arrow) is synthesized by the polymerase enzyme. The inactive fluorescent dye molecules present in the reaction (grey stars) bind to the newly generated double-stranded DNA and become activated (green stars).
+
The gene expression data were generated using RNA-seq.  In this method deep sequencing is completed using reagents and equipment from Illumina.  With this technology, transcripts are directly sequenced and mapped to a reference genome.  Then the reads are counted to provide information on gene expression levels for a particular portion of the genome (i.e. for a particular gene).
  
[[Image:S14 M3D5 WF 18S-noiseband.jpg|thumb|400px|right|These qPCR amplification curve data were generated by Sp14 20.109ers for another experimental module!]]  
+
[[Image:Sp18 20.109 M2D4 bridge, cluster.png|thumb|450px|right| Image from Goodwin et al. (2016) ''Nature Rev. Genet.'' 17:333-351.]]In RNA-seq, RNA is purified from cells and reverse transcribed into DNA. The DNA molecules are modified with adapters, which are ligated to both ends of the DNA.  Sequences complementary to the adapters are attached to the surface of flow cell channels; they facilitate binding of the modified DNA molecules and provide a primer for DNA polymerase.  Following the initial binding to the flow cell channel, the DNA molecules form bridges that enable bridge amplification and cluster generation (see figure to the right).  Through this process millions of dense clusters containing double-stranded DNA are generated.
[[Image:S14 M3D5 WF 18S-melt.jpg|thumb|400px|right|These qPCR melt curve data were collected by Sp14 20.109ers for another experimental module!]]
+
  
To assess gene transcript levels, you will examine the C<sub>T</sub> values from your qPCR assay. The C<sub>T</sub> values are displayed as an amplification curve following qPCR (these values are also given numerically).  The initial cycles measure very little fluorescence due to low amounts of double-stranded DNA and are used to establish the inherent background fluorescence.  As double-stranded product is produced, fluorescence is measured and the curve appears linear.  This linear portion of the curve represents the exponential phase of PCR.  Throughout the exponential phase, the curve should be smooth.  Sharp points may be due to errors in reaction preparation or failures in the machine used to measure fluorescence.  As mentioned previously, the first cycle in which the fluorescence measurement is above background is the C<sub>T</sub>.  During the later cycles the curve shows minimal increases in fluorescence due to the depletion of reagents.
+
To sequence directly from the clusters, a sequencing by synthesis approach is used. In this approach, repeated rounds of single-base extension are performed using modified deoxynucleoside triphosphates (dNTPs).  These dNTPs are reversible terminators: the 3'-OH group of the deoxyribose is blocked, which prevents further elongation by the polymerase.  Each terminating base (dATP, dTTP, dCTP, and dGTP) is fluorescently labeled (dATP = red, dTTP = green, dCTP = blue, and dGTP = yellow).  For each round of sequencing a mixture containing all four labeled dNTPs is added and a single base is incorporated into each DNA molecule bound to the flow cell channels.  The flow cell is then imaged to capture which dNTP was added at each cluster location.  Then the fluorescent label and the 3'-OH blocking group are removed from the incorporated dNTP and another round of sequencing is performed.  Repeating this cycle reads out, one base at a time, the sequence of every DNA molecule bound to the flow cell channels.  Therefore, the sequence of the cluster denoted by the asterisk is GCTGA in the schematic provided below.
  
Following the qPCR amplification measurements, a melt curve is completed. Melt curves assess the dissociation of double-stranded DNA while the sample is heated. As the temperature is increased, double-stranded DNA ‘melts’ as the strands dissociate.  As discussed above, the fluorescent dye used in qPCR associates with double-stranded DNA and fluorescence measurements will decrease as the temperature increases.  In qPCR, the melt curve is used to confirm that a single amplification product was generated during the reaction.  If additional products were present, the melt curve would presumably show additional peaks.  Why might this be true?  Can you think of a scenario where two different products would produce a single peak in a melt curve?
+
[[File:Sp18 20.109 M2D4 illumina sequencing.png|thumb|600px|center]]
 +
 
 +
As with all technologies, there are positives and negatives to RNA-seq.  On the plus side, the ability to directly sequence enables researchers to assess gene expression in organisms for which a full genome sequence is not available or not fully annotated.  Furthermore, this method allows for the quantification of individual isoforms that result from alternative splicing.  On the minus side, the cost of RNA-seq can limit the depth of sequencing achieved, and genes that are not highly expressed may not be captured in a data set.
  
 
==Protocols==
 
  
===Part 1: Analyze RNA-seq data===
+
===Part 1: Design primers for qPCR===
Today you will analyze the RNA-seq data gathered from untreated DLD-1 and BRCA2- cells and etoposide treated DLD-1 and BRCA2- cells.  Following RNA purification, the samples were submitted to the BioMicro Center for Illumina sequencing.  Illumina sequencing technology,  or sequencing by synthesis (SBS), is used for massively parallel sequencing with a proprietary method that detects single bases as they are incorporated into growing DNA strands.
+
In Mod 1 you designed primers to amplify the gene that encodes FKBP12.  As you may have guessed, quantitative PCR also requires primers for amplification.  Because designing qPCR primers is more complicated (you need to carefully consider exons, product length, and primer dimers), you will use a free online program to identify possible primer pairs.
 +
#Use the NCBI Primer-BLAST tool to design your qPCR primers.
 +
#*Go to the [https://www.ncbi.nlm.nih.gov/gene NCBI Gene Database].
 +
#*Enter your gene of interest (p21) in the search bar at the top of the screen.
 +
#*Review the list of search results, and select one that is appropriate for our study (''i.e.'' from which species should the gene you select come?).
 +
#Choose a gene sequence by clicking on the 'Name/Gene ID' link.
 +
#Review the information provided on the page.
 +
#*What role does this gene product have in physiology?
 +
#*Why is this an interesting target for your research?
 +
#*In what tissues is expression of this gene highest? lowest?
 +
#*On which chromosome is it located?
 +
#Scroll through the page to the NCBI Reference Sequences (RefSeq) section.
 +
#Click on the link for the mRNA sequence (''e.g.'' NM_000389.4).
 +
#*Record the mRNA sequence identification number in your laboratory notebook.
 +
#Again, briefly review the information provided on the page.
 +
#*What is the size of the p21 gene?
 +
#*Is this sequence a variant?  If so, how?
 +
#Click on the 'Pick Primers' link under the Analyze this sequence header at the right side of the screen.
 +
#Confirm that the correct mRNA sequence identification number is in the 'Enter accession, gi, or FASTA sequence' box under the PCR Template header at the top of the screen.
 +
#Update the following settings:
 +
#*Under the Primer Parameters header, to the right of “PCR product size” enter a max number of 150.
 +
#*Under the Exon/intron selection header, select "Primer must span an exon-exon junction".
 +
#*Select Advanced parameters and, under the primer parameters, type "120,7 122,3" in the excluded regions box.
 +
#Click the 'Get Primers' button at the bottom of the screen.
 +
#*Be patient!  It may take up to 2 min for the program to identify possible primer pairs.
 +
#Review the primer pairs identified by the program and consider the following guidelines:
 +
#*Primers should have a GC content of 50-60%.
 +
#*Primers ideally end in G or C.
 +
#*Primer melting temperatures should be similar and ~60 &deg;C.
 +
#*Product should be ~100 bp.
 +
#Select the primer pair that you think best meets the guidelines and record the sequences in your laboratory notebook.
 +
 
 +
In addition to your qPCR primers, you will use primers that the teaching faculty previously ordered and tested to target p21.  Evaluate the sequences of these primers and compare them to those you selected.  Did you choose the same primer pair?  If so, consider if altering the length (or shifting the binding site) will improve the primer pair.  If not, what differences exist and which do you propose would be better at probing p21 mRNA sequences?  Why?<br>
 +
 
 +
*p21_F = CCA GCT GAG GTG TGA GCA G<br>
 +
*p21_R = GTT CTG ACA TGG CGC CTC C<br>
 +
 
 +
Email your qPCR primer sequences to the teaching faculty as they need to be ordered as soon as possible to ensure delivery by the next laboratory session.
 +
 
 +
===Part 2: Analyze RNA-seq data===
 +
Today you will analyze the RNA-seq data gathered from untreated DLD-1 and BRCA2-/- cells and etoposide treated DLD-1 and BRCA2-/- cells.  Following RNA purification, the samples were submitted to the BioMicro Center for Illumina sequencing.  Illumina sequencing technology,  or sequencing by synthesis (SBS), is used for massively parallel sequencing with a proprietary method that detects single bases as they are incorporated into growing DNA strands.
  
[[ Analysis of RNA-Seq data (Day 4)| Analysis of RNA-Seq data]] by Amanda Kedaigle, Prof. Ernest Fraenkel & Prof. Leona Samson.<br>
+
Complete the "Analysis of RNA-seq Data Exercise" developed by Amanda Kedaigle & Prof. Ernest Fraenkel linked [[Media:20.109 RNAseq Analysis.pdf| here]]. The Rmd file with the same information can be found [[Media:20.109_RNAseq_Analysis.Rmd| here]].
  
===Part 2: Design primers for qPCR===
+
The data file ("preprocessed_data.RData") you will need for the analysis exercise is located [[Media:preprocessed_data.RData|here]].
  
 
==Navigation links==
 
 
Next day: [[20.109(S18):Investigate RNA-seq data using public databases (Day5) | Investigate RNA-seq data using public databases]]
 
 
Previous day: [[20.109(S18):Purify RNA and practice RNA-seq data analysis methods (Day3)| Purify RNA and practice RNA-seq data analysis methods]]
 

Latest revision as of 19:38, 21 March 2018

20.109(S18): Laboratory Fundamentals of Biological Engineering



Introduction

The transcriptome is the full suite of transcripts within an organism and provides the key link between the genetic code and phenotype. Research focused on the transcriptome has provided important insights into how gene expression is altered in different cell / tissue types, in developmental phases, in disease states, and between species. In this module, you will evaluate gene expression differences in the parental DLD-1 cell line compared to the BRCA2-/- mutant cell line. In addition, you will assess the effects of DNA damage and drug treatments on the transcriptome in these cells.

The gene expression data were generated using RNA-seq. In this method deep sequencing is completed using reagents and equipment from Illumina. With this technology, transcripts are directly sequenced and mapped to a reference genome. Then the reads are counted to provide information on gene expression levels for a particular portion of the genome (i.e. for a particular gene).
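
As a rough illustration of how read counts become comparable expression values, the sketch below scales raw counts to counts per million (CPM). This is a minimal example, not the method used in the course exercise; the gene names and counts are made up, and the function name counts_per_million is hypothetical.

```python
def counts_per_million(counts):
    """Scale raw read counts so that the sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Made-up counts for illustration only; these are NOT course data.
untreated = {"geneA": 500, "geneB": 1500, "geneC": 8000}

cpm = counts_per_million(untreated)
for gene, value in cpm.items():
    print(gene, value)
```

Scaling by the total number of reads lets samples sequenced to different depths be compared, although real analyses typically apply further normalization.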

Image from Goodwin et al. (2016) Nature Rev. Genet. 17:333-351.
In RNA-seq, RNA is purified from cells and reverse transcribed into DNA. The DNA molecules are modified with adapters, which are ligated to both ends of the DNA. Sequences complementary to the adapters are attached to the surface of flow cell channels; they facilitate binding of the modified DNA molecules and provide a primer for DNA polymerase. Following the initial binding to the flow cell channel, the DNA molecules form bridges that enable bridge amplification and cluster generation (see figure to the right). Through this process millions of dense clusters containing double-stranded DNA are generated.

To sequence directly from the clusters, a sequencing by synthesis approach is used. In this approach, repeated rounds of single-base extension are performed using modified deoxynucleoside triphosphates (dNTPs). These dNTPs are reversible terminators: the 3'-OH group of the deoxyribose is blocked, which prevents further elongation by the polymerase. Each terminating base (dATP, dTTP, dCTP, and dGTP) is fluorescently labeled (dATP = red, dTTP = green, dCTP = blue, and dGTP = yellow). For each round of sequencing a mixture containing all four labeled dNTPs is added and a single base is incorporated into each DNA molecule bound to the flow cell channels. The flow cell is then imaged to capture which dNTP was added at each cluster location. Then the fluorescent label and the 3'-OH blocking group are removed from the incorporated dNTP and another round of sequencing is performed. Repeating this cycle reads out, one base at a time, the sequence of every DNA molecule bound to the flow cell channels. Therefore, the sequence of the cluster denoted by the asterisk is GCTGA in the schematic provided below.
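
The base-calling logic described above can be sketched in a few lines of Python. This is only an illustration: the dye-to-base assignment comes from the text, but the list of colors for the example cluster is inferred from the stated read (GCTGA) rather than taken from instrument output, and the names DYE_TO_BASE and call_bases are hypothetical.

```python
# Dye colors as assigned in the text above.
DYE_TO_BASE = {"red": "A", "green": "T", "blue": "C", "yellow": "G"}


def call_bases(colors):
    """Translate the color imaged at one cluster in each round into a read."""
    return "".join(DYE_TO_BASE[color] for color in colors)


# One color per sequencing round for a single cluster (assumed for illustration).
cluster_signal = ["yellow", "blue", "green", "yellow", "red"]

read = call_bases(cluster_signal)
print(read)  # GCTGA
```

Each round contributes exactly one color per cluster because the blocked 3'-OH group allows only a single base to be incorporated before imaging.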

Figure: schematic of Illumina sequencing by synthesis (Sp18 20.109 M2D4 illumina sequencing.png)

As with all technologies, there are positives and negatives to RNA-seq. On the plus side, the ability to directly sequence enables researchers to assess gene expression in organisms for which a full genome sequence is not available or not fully annotated. Furthermore, this method allows for the quantification of individual isoforms that result from alternative splicing. On the minus side, the cost of RNA-seq can limit the depth of sequencing achieved, and genes that are not highly expressed may not be captured in a data set.

Protocols

Part 1: Design primers for qPCR

In Mod 1 you designed primers to amplify the gene that encodes FKBP12. As you may have guessed, quantitative PCR also requires primers for amplification. Because designing qPCR primers is more complicated (you need to carefully consider exons, product length, and primer dimers), you will use a free online program to identify possible primer pairs.

  1. Use the NCBI Primer-BLAST tool to design your qPCR primers.
    • Go to the NCBI Gene Database.
    • Enter your gene of interest (p21) in the search bar at the top of the screen.
    • Review the list of search results, and select one that is appropriate for our study (i.e. from which species should the gene you select come?).
  2. Choose a gene sequence by clicking on the 'Name/Gene ID' link.
  3. Review the information provided on the page.
    • What role does this gene product have in physiology?
    • Why is this an interesting target for your research?
    • In what tissues is expression of this gene highest? lowest?
    • On which chromosome is it located?
  4. Scroll through the page to the NCBI Reference Sequences (RefSeq) section.
  5. Click on the link for the mRNA sequence (e.g. NM_000389.4).
    • Record the mRNA sequence identification number in your laboratory notebook.
  6. Again, briefly review the information provided on the page.
    • What is the size of the p21 gene?
    • Is this sequence a variant? If so, how?
  7. Click on the 'Pick Primers' link under the Analyze this sequence header at the right side of the screen.
  8. Confirm that the correct mRNA sequence identification number is in the 'Enter accession, gi, or FASTA sequence' box under the PCR Template header at the top of the screen.
  9. Update the following settings:
    • Under the Primer Parameters header, to the right of “PCR product size” enter a max number of 150.
    • Under the Exon/intron selection header, select "Primer must span an exon-exon junction".
    • Select Advanced parameters and, under the primer parameters, type "120,7 122,3" in the excluded regions box.
  10. Click the 'Get Primers' button at the bottom of the screen.
    • Be patient! It may take up to 2 min for the program to identify possible primer pairs.
  11. Review the primer pairs identified by the program and consider the following guidelines:
    • Primers should have a GC content of 50-60%.
    • Primers ideally end in G or C.
    • Primer melting temperatures should be similar and ~60 °C.
    • Product should be ~100 bp.
  12. Select the primer pair that you think best meets the guidelines and record the sequences in your laboratory notebook.

In addition to your qPCR primers, you will use primers that the teaching faculty previously ordered and tested to target p21. Evaluate the sequences of these primers and compare them to those you selected. Did you choose the same primer pair? If so, consider if altering the length (or shifting the binding site) will improve the primer pair. If not, what differences exist and which do you propose would be better at probing p21 mRNA sequences? Why?

  • p21_F = CCA GCT GAG GTG TGA GCA G
  • p21_R = GTT CTG ACA TGG CGC CTC C
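
To make the comparison concrete, two of the guidelines from step 11 (GC content and a G or C at the 3' end) can be checked with a short script. This is a minimal sketch; the function names are hypothetical, and melting temperature is deliberately left out because it depends on the thermodynamic model and settings used by Primer-BLAST.

```python
def clean(primer):
    """Remove the spaces used for readability and normalize to upper case."""
    return primer.replace(" ", "").upper()


def gc_percent(primer):
    """Percentage of bases in the primer that are G or C."""
    seq = clean(primer)
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def ends_in_gc(primer):
    """True if the 3' (last) base of the primer is G or C."""
    return clean(primer)[-1] in "GC"


# Primer sequences as listed above, written 5' to 3'.
primers = {
    "p21_F": "CCA GCT GAG GTG TGA GCA G",
    "p21_R": "GTT CTG ACA TGG CGC CTC C",
}

for name, seq in primers.items():
    print(name, len(clean(seq)), "nt",
          f"GC = {gc_percent(seq):.1f}%",
          "ends in G/C:", ends_in_gc(seq))
```

The same functions can be applied to the primer pair you selected, which gives a like-for-like basis for the comparison asked for above.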

Email your qPCR primer sequences to the teaching faculty as they need to be ordered as soon as possible to ensure delivery by the next laboratory session.

Part 2: Analyze RNA-seq data

Today you will analyze the RNA-seq data gathered from untreated DLD-1 and BRCA2-/- cells and etoposide treated DLD-1 and BRCA2-/- cells. Following RNA purification, the samples were submitted to the BioMicro Center for Illumina sequencing. Illumina sequencing technology, or sequencing by synthesis (SBS), is used for massively parallel sequencing with a proprietary method that detects single bases as they are incorporated into growing DNA strands.

Complete the "Analysis of RNA-seq Data Exercise" developed by Amanda Kedaigle & Prof. Ernest Fraenkel, linked here (20.109 RNAseq Analysis.pdf). The Rmd file with the same information can be found here (20.109_RNAseq_Analysis.Rmd).

The data file ("preprocessed_data.RData") you will need for the analysis exercise is located here.

Navigation links

Next day: Investigate RNA-seq data using public databases

Previous day: Purify RNA and practice RNA-seq data analysis methods