<noautolink>
---+ Analyzing RNA-Seq data for differential gene expression

---++ Materials and Software

   * Data files
      * RNA-Seq data in FASTQ format (Ex: dataset1.fastq, dataset2.fastq)
   * Genome sequence files
      * Genome sequence in FASTA format (Ex: REL606.fna)
      * Genome sequence gene annotations in GFF3 format (Ex: REL606.gff3)
   * Adaptor filtering software
      * FAR [[http://sourceforge.net/apps/mediawiki/theflexibleadap/][Manual]] | [[http://sourceforge.net/projects/theflexibleadap/][Download]]
   * Read mapper software
      * Bowtie2 (first choice)
         * [[http://bowtie-bio.sourceforge.net][Download Bowtie]]
         * To build, just type "make" in the source code directory.
         * Add this directory to your $PATH, or move the bowtie2 and bowtie2-* executables to a directory in your $PATH.
      * BWA (alternate choice)
         * [[http://bio-bwa.sourceforge.net/][Download BWA]]
         * To build, just type "make" in the source code directory.
         * To install, move the executable "bwa" to somewhere in your $PATH, like $HOME/local/bin.
         * For usage, see the [[http://bio-bwa.sourceforge.net/bwa.shtml][BWA manual]].
   * [[http://barricklab.org/breseq][breseq]]
   * R statistics package
      * [[http://cran.r-project.org/][Download and install R]]
   * Bioconductor R modules
      * library(edgeR)
      * library(DESeq)

---++ Commands

---+++ Remove adaptor sequences from reads

You will need a FASTA file of adaptor sequences.
   * [[https://wikis.utexas.edu/display/GSAF/Illumina+-+all+flavors][Illumina Adapter Sequences (GSAF)]] | [[%ATTACHURL%/gsaf_illumina_adapters.fasta][Download FASTA]]

For each input file, run this command (single-end data): %BR%
<code>$ far --source datasetX.fastq --target datasetX.noadaptor --adaptive-overlap --trim-end any --adapters gsaf_illumina_adapters.fasta --format fastq-sanger</code>

Paired-end data can be processed like this: %BR%
<code>$ far --source datasetX_R1.fastq --source2 datasetX_R2.fastq --target datasetX.noadaptor --adaptive-overlap --trim-end any --adapters gsaf_illumina_adapters.fasta --format fastq-sanger</code>

---++++ Compile and install FAR on MacOSX

Unfortunately, FAR comes only with Windows and Linux binaries. To build FAR (2.0) for MacOSX:

   1 Install Apple Developer Tools
   1 Install cmake: <code>$ sudo port install cmake</code>
   1 Check out code: %BR% <code>$ svn co https://theflexibleadap.svn.sourceforge.net/svnroot/theflexibleadap theflexibleada</code>
   1 Compile code: %BR% <code>$ cd theflexibleada</code> %BR% <code>$ cmake CMakeLists.txt</code> %BR% <code>$ make</code>
   1 Copy executable and library: %BR% <code>$ cp lib/libtbb.dylib ~/local/lib</code> %BR% <code>$ cp build/* ~/local/bin</code>
   1 Add these locations to your path with lines in ~/.profile: %BR% <code>export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:$HOME/local/lib" %BR% export PATH="$PATH:$HOME/local/bin"</code>

---+++ Align reads to reference genome

---++++ Using bowtie2

First, index your genome so bowtie2 can map reads to it: %BR%
<code>$ bowtie2-build REL606.fna REL606</code> %BR%
Then, align each data set, writing one SAM file per data set: %BR%
<code>$ bowtie2 -x REL606 -U datasetX.fastq --phred33 -S datasetX.sam</code> %BR%
Optionally, add the <code>--local</code> flag if your reads do not map end-to-end.
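
When there are several data sets, the bowtie2 alignment step can be wrapped in a small loop. The following is a minimal sketch rather than part of the protocol itself. It assumes input files named <code>NAME.fastq</code>, an index prefix of <code>REL606</code>, and one output file <code>NAME.sam</code> per data set. <code>run</code>, <code>align_all</code>, and the <code>DRY_RUN</code> switch (on by default, so commands are only printed) are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch: run the bowtie2 alignment step once per data set.
# ASSUMPTIONS (not from the protocol): inputs are named NAME.fastq, the index
# prefix is REL606, and DRY_RUN is a hypothetical switch for this sketch.
INDEX="${INDEX:-REL606}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Align every data set named on the command line; stop at the first failure.
align_all() {
    for name in "$@"; do
        run bowtie2 -x "$INDEX" -U "$name.fastq" --phred33 -S "$name.sam" || return 1
    done
}

align_all dataset1 dataset2
```

Check the printed commands first, then set <code>DRY_RUN=0</code> to run the alignments.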
---++++ Using BWA

First, index your genome so BWA can map reads to it: %BR%
<code>$ bwa index REL606.fna</code> %BR%
Then, align each data set: %BR%
<code>$ bwa aln REL606.fna datasetX.fastq > datasetX.sai</code> %BR%
And convert to SAM format (assumes single-end data): %BR%
<code>$ bwa samse REL606.fna datasetX.sai datasetX.fastq > datasetX.sam</code> %BR%

---+++ Count reads mapping to genes

<code>$ breseq RNASEQ -f REL606.fna -r REL606.gbk -o datasetX.count.tab datasetX.sam</code> %BR%

---+++ Convert alignments to BAM

Convert each SAM file to BAM format (assumes single-end data), then sort and index it: %BR%
<code>$ samtools faidx REL606.fna</code> %BR%
<code>$ samtools import REL606.fna datasetX.sam datasetX.unsorted.bam</code> %BR%
<code>$ samtools sort datasetX.unsorted.bam datasetX</code> %BR%
<code>$ samtools index datasetX.bam</code> %BR%
Now you can use IGV to view the alignments.

---+++ Analyze differential gene expression

library(DESeq)
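
DESeq and edgeR both start from a single table of read counts with one row per gene and one column per sample. A minimal sketch for assembling such a table from the per-dataset count files follows. It assumes that each count file has two tab-separated columns (gene ID, then count) and no header line. The actual layout of the breseq output is not described on this page and may differ, so adjust the column handling to match your files. <code>merge_counts</code> is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: merge per-dataset count files into one gene-by-sample table.
# ASSUMPTION: every input file has two tab-separated columns (gene ID, count)
# and no header line. The real breseq output format may differ.
merge_counts() {
    awk -F '\t' '
        # A new input file starts: remember its name as the column label.
        FNR == 1 { nfiles++; names[nfiles] = FILENAME }
        {
            # Keep genes in the order in which they are first seen.
            if (!($1 in seen)) { seen[$1] = 1; order[++ngenes] = $1 }
            count[$1, nfiles] = $2
        }
        END {
            printf "gene"
            for (f = 1; f <= nfiles; f++) printf "\t%s", names[f]
            printf "\n"
            for (g = 1; g <= ngenes; g++) {
                printf "%s", order[g]
                for (f = 1; f <= nfiles; f++) {
                    # Genes missing from a file are reported as 0.
                    value = 0
                    if ((order[g], f) in count) value = count[order[g], f]
                    printf "\t%s", value
                }
                printf "\n"
            }
        }
    ' "$@"
}
```

For example, <code>merge_counts dataset1.count.tab dataset2.count.tab > counts.tab</code> would write a table with one column per input file, which could then be loaded into R with <code>read.delim</code>.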
Topic revision: r5 - 2012-01-30 - JeffreyBarrick