Analyzing RNA-Seq data for differential gene expression

Materials and Software

  • Data files
    • RNA-Seq data in FASTQ format (Ex: dataset1.fastq, dataset2.fastq)
  • Genome sequence files
    • Genome sequence in FASTA format (Ex: REL606.fna)
    • Genome sequence gene annotations in GFF3 format (Ex: REL606.gff3)
  • BWA read mapper software
    • Download BWA
    • To build, just type "make" in the source code directory.
    • To install, move the executable "bwa" to somewhere in your $PATH, like $HOME/local/bin.
    • For usage see the BWA manual.
  • SAMtools
  • R statistics package
  • Bioconductor R modules
    • library(edgeR)
    • library(DESeq)
    • library(Rsamtools)
  • HTSeq (Python package)
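
Before starting, it can help to confirm that the command-line tools listed above are on your $PATH. The following is a minimal sketch, not part of the original protocol; the function name check_tools is made up for illustration.

```shell
# Report any of the named tools that cannot be found on $PATH.
# Prints one "missing: <tool>" line per absent tool and returns
# non-zero if at least one tool is missing.
# (check_tools is a hypothetical helper name, not part of any package.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, "check_tools bwa samtools R" prints nothing if all three are installed. If HTSeq is installed, its htseq-count script can be added to the list, assuming your installation puts that script on $PATH.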


Align reads to reference genome

First, index your genome so BWA can map reads to it:
$bwa index REL606.fna

Then, align each data set (replace X with the data set number):
$bwa aln REL606.fna datasetX.fastq > datasetX.sai

And convert to SAM format (assumes single-end data):
$bwa samse REL606.fna datasetX.sai datasetX.fastq > datasetX.sam
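
With several data sets, the index, aln and samse steps can be generated in a loop rather than typed by hand. The sketch below only prints the commands so they can be reviewed first. The function name print_align_cmds is hypothetical, and it assumes each data set is a single-end file named <name>.fastq with no spaces in the file name.

```shell
# Print the BWA commands for one reference and any number of data sets.
# Usage: print_align_cmds <reference.fna> <name> [<name> ...]
# Nothing is executed here; the output is plain text, one command per line.
# (print_align_cmds is a hypothetical helper name.)
print_align_cmds() {
    ref=$1
    shift
    # The reference only needs to be indexed once.
    echo "bwa index $ref"
    for name in "$@"; do
        # Align the reads, then convert the alignment to SAM (single-end).
        echo "bwa aln $ref $name.fastq > $name.sai"
        echo "bwa samse $ref $name.sai $name.fastq > $name.sam"
    done
}
```

For example, "print_align_cmds REL606.fna dataset1 dataset2" lists five commands. Once they look right, the same output can be piped to sh to run them.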

Then convert to sorted, indexed BAM format:
$samtools faidx REL606.fna
$samtools import REL606.fna.fai datasetX.sam datasetX.unsorted.bam
$samtools sort datasetX.unsorted.bam datasetX
$samtools index datasetX.bam
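
The commands above use the older SAMtools 0.1.x syntax. Assuming a SAMtools 1.x install, where sort expects "-o <out.bam>" instead of an output prefix and SAM-to-BAM conversion is done with "view -b", the same steps might look like the sketch below. The function name print_bam_cmds is hypothetical. The sketch assumes the SAM file already carries @SQ header lines (bwa samse writes them), so no reference list is passed. As with any generated commands, check them against your installed version before running.

```shell
# Print SAMtools 1.x style commands that turn each <name>.sam into a
# sorted, indexed <name>.bam. Nothing is executed here.
# (print_bam_cmds is a hypothetical helper name.)
print_bam_cmds() {
    for name in "$@"; do
        # SAM -> unsorted BAM; relies on @SQ lines in the SAM header.
        echo "samtools view -b -o $name.unsorted.bam $name.sam"
        # Coordinate-sort, writing the final BAM file.
        echo "samtools sort -o $name.bam $name.unsorted.bam"
        # Build the .bai index next to the sorted BAM.
        echo "samtools index $name.bam"
    done
}
```

For example, "print_bam_cmds dataset1 dataset2" lists six commands, which can be piped to sh after review.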

Analyze differential gene expression



Contributors to this topic: JeffreyBarrick
Topic revision: r2 - 28 Jan 2012 - 23:07:24 - Main.JeffreyBarrick
Copyright ©2020 Barrick Lab contributing authors.