<noautolink>
---+ Analyzing RNA-Seq data for differential gene expression

---++ Materials and Software

   * Data files
      * RNA-Seq data in FASTQ format (Ex: dataset1.fastq, dataset2.fastq)
   * Genome sequence files
      * Genome sequence in FASTA format (Ex: REL606.fna)
      * Genome sequence gene annotations in GFF3 format (Ex: REL606.gff3)
   * Adaptor filtering software
      * FAR [[http://sourceforge.net/apps/mediawiki/theflexibleadap/][Manual]] | [[http://sourceforge.net/projects/theflexibleadap/][Download]]
   * Read mapper software
      * Bowtie2 (first choice)
         * [[http://bowtie-bio.sourceforge.net][Download Bowtie]]
         * To build, just type "make" in the source code directory.
         * Add this directory to your $PATH, or move the bowtie2 and bowtie2-* executables to a directory in your $PATH.
      * BWA (alternate choice)
         * [[http://bio-bwa.sourceforge.net/][Download BWA]]
         * To build, just type "make" in the source code directory.
         * To install, move the executable "bwa" to somewhere in your $PATH, like $HOME/local/bin.
         * For usage see the [[http://bio-bwa.sourceforge.net/bwa.shtml][BWA manual]].
   * [[http://barricklab.org/breseq][breseq]]
   * samtools (for converting alignments to BAM)
   * R statistics package
      * [[http://cran.r-project.org/][Download and install R]]
      * Bioconductor R modules
         * library(edgeR)
         * library(DESeq)

---++ Commands

---+++ Remove adaptor sequences from reads

You will need a FASTA file of adaptor sequences.
   * [[https://wikis.utexas.edu/display/GSAF/Illumina+-+all+flavors][Illumina Adapter Sequences (GSAF)]] | [[%ATTACHURL%/gsaf_illumina_adapters.fasta][Download FASTA]]

For each input file, run this command (single-end data): %BR%
<code>$ far --source datasetX.fastq --target datasetX.noadaptor --adaptive-overlap --trim-end any --adapters gsaf_illumina_adapters.fasta --format fastq-sanger</code>

To process paired-end data, supply the second read file with <code>--source2</code>: %BR%
<code>$ far --source datasetX_R1.fastq --source2 datasetX_R2.fastq --target datasetX.noadaptor --adaptive-overlap --trim-end any --adapters gsaf_illumina_adapters.fasta --format fastq-sanger</code>

---++++ Compile and install FAR on MacOSX

Unfortunately, FAR comes only with Windows and Linux binaries. To build FAR 2.0 for MacOSX:
   1 Install Apple Developer Tools.
   1 Install cmake (here with MacPorts): %BR% <code>$ sudo port install cmake</code>
   1 Check out the code: %BR% <code>$ svn co https://theflexibleadap.svn.sourceforge.net/svnroot/theflexibleadap theflexibleada</code>
   1 Configure and compile the code: %BR% <code>$ cd theflexibleada</code> %BR% <code>$ cmake CMakeLists.txt</code> %BR% <code>$ make</code>
   1 Copy the executable and library: %BR% <code>$ cp lib/libtbb.dylib ~/local/lib</code> %BR% <code>$ cp build/* ~/local/bin</code>
   1 Add these locations to your paths with lines in ~/.profile: %BR% <code>export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:$HOME/local/lib"</code> %BR% <code>export PATH="$PATH:$HOME/local/bin"</code>

---+++ Align reads to reference genome

---++++ Using bowtie2

First, index your genome so bowtie2 can map reads to it: %BR%
<code>$ bowtie2-build REL606.fna REL606</code>

Then, align each data set, writing one SAM file per data set: %BR%
<code>$ bowtie2 -x REL606 -U datasetX.fastq --phred33 -S datasetX.sam</code>

Optionally, add the <code>--local</code> flag if your reads do not map end-to-end.
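Both the FAR command (<code>--format fastq-sanger</code>) and the bowtie2 command (<code>--phred33</code>) assume Phred+33 quality encoding. If you are unsure how a FASTQ file is encoded, a quick heuristic check can help. The Python sketch here is an illustration only, not part of FAR or bowtie2: it assumes plain four-line FASTQ records (no wrapped lines, no compression), and its cutoffs are a common rule of thumb rather than a guarantee.

```python
def guess_quality_offset(fastq_lines):
    """Heuristically guess the FASTQ quality offset: 33, 64, or None.

    Assumes the plain four-line-per-record FASTQ layout (no wrapped lines).
    """
    lowest, highest = 255, 0
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # only the 4th line of each record holds qualities
            continue
        qual = line.rstrip("\r\n")
        if qual:
            codes = [ord(ch) for ch in qual]
            lowest = min(lowest, min(codes))
            highest = max(highest, max(codes))
    if lowest < 59:
        return 33  # characters below ';' never occur in +64 encodings
    if lowest >= 64 and highest > 75:
        return 64  # all scores at/above '@' and some above 'K' (unusual for +33)
    return None  # not enough information to decide


if __name__ == "__main__":
    import sys

    # usage (hypothetical script name): python guess_offset.py datasetX.fastq
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            print(guess_quality_offset(handle))
```

If the sketch reports 64 for a data set, bowtie2's <code>--phred64</code> option is the counterpart of <code>--phred33</code>; a result of None means the file should be checked by hand.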
---++++ Using BWA

First, index your genome so BWA can map reads to it: %BR%
<code>$ bwa index REL606.fna</code>

Then, align each data set: %BR%
<code>$ bwa aln REL606.fna datasetX.fastq > datasetX.sai</code>

And convert to SAM format (assumes single-end data): %BR%
<code>$ bwa samse REL606.fna datasetX.sai datasetX.fastq > datasetX.sam</code>

---++ Count reads mapping to genes

For each SAM file, use breseq to tabulate the reads mapping to each gene: %BR%
<code>$ breseq RNASEQ -f REL606.fna -r REL606.gbk -o datasetX.count.tab datasetX.sam</code>

---++ Convert alignments to BAM

Convert each SAM file to a sorted, indexed BAM file (samtools 0.1.x syntax): %BR%
<code>$ samtools faidx REL606.fna</code> %BR%
<code>$ samtools import REL606.fna.fai datasetX.sam datasetX.unsorted.bam</code> %BR%
<code>$ samtools sort datasetX.unsorted.bam datasetX</code> %BR%
<code>$ samtools index datasetX.bam</code>

With samtools 1.x, where <code>import</code> and the old <code>sort</code> syntax behave differently, use instead: %BR%
<code>$ samtools sort -o datasetX.bam datasetX.sam</code> %BR%
<code>$ samtools index datasetX.bam</code>

You can now view the sorted, indexed BAM files in IGV.

---++ Analyze differential gene expression

---+++ Using DESeq

   * [[http://bioconductor.org/packages/devel/bioc/html/DESeq.html][Manual and Instructions]]
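Before testing for differential expression, DESeq normalizes for sequencing depth by estimating one size factor per sample (the <code>estimateSizeFactors</code> step in the DESeq instructions). Each factor is the median, over genes, of the sample's count divided by that gene's geometric mean across all samples. The Python sketch here reproduces that median-of-ratios calculation on a toy table so the idea is easy to check by hand. It is an illustration, not DESeq's code, and the input format (a dict mapping gene name to one raw count per sample, for example merged from the per-sample count tables) is an assumption.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: dict mapping gene name -> list of raw read counts,
    one entry per sample, in the same sample order for every gene.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) <= 0:
            # geometric mean would be zero; such genes are excluded
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # size factor = median ratio of a sample's counts to the geometric means
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, factors):
    """Divide each count by the size factor of its sample."""
    return {gene: [c / f for c, f in zip(row, factors)]
            for gene, row in counts.items()}


if __name__ == "__main__":
    # hypothetical toy table: sample 2 was sequenced about twice as deeply
    toy = {
        "geneA": [10, 20],
        "geneB": [30, 60],
        "geneC": [5, 10],
        "geneD": [0, 4],  # skipped when estimating size factors
    }
    sf = size_factors(toy)
    print([round(f, 3) for f in sf])  # [0.707, 1.414]
    print(normalize(toy, sf)["geneA"])
```

Using the median rather than total read counts keeps a few very highly expressed or strongly changing genes from dominating the depth estimate, which is why the two samples here come out with equal normalized counts for every gene that was used.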
Topic attachments:

| *Attachment* | *Version* | *Size* | *Date* | *Who* |
| gsaf_illumina_adapters.fasta | r1 | 0.2 K | 2012-01-30 - 16:31 | JeffreyBarrick |
Contributors to this topic: JeffreyBarrick
Topic revision: r6 - 2012-02-11 - 17:41:54 - Main.JeffreyBarrick
Copyright ©2024 Barrick Lab contributing authors.