Analyzing RNA-Seq data for differential gene expression
Materials and Software
- Data files
- RNA-Seq data in FASTQ format (Ex: dataset1.fastq, dataset2.fastq)
- Genome sequence files
- Genome sequence in FASTA format (Ex: REL606.fna)
- Genome sequence gene annotations in GFF3 format (Ex: REL606.gff3)
- BWA read mapper software
- Download BWA
- To build, just type "make" in the source code directory.
- To install, move the executable "bwa" to somewhere in your $PATH, like $HOME/local/bin.
- For usage see the BWA manual.
- SAMtools
- R statistics package
- Bioconductor R modules
- library(edgeR)
- library(DESeq)
- library(Rsamtools)
- HTSeq (Python package)
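The BWA install step above (move the built executable to a directory in your $PATH) can be sketched as a small shell function. This is only an illustration: the function name install_local is hypothetical, and $HOME/local/bin is simply the example directory mentioned above. The PATH change lasts only for the current shell session.

```shell
# Hypothetical helper (not part of BWA): copy a built executable into a
# personal bin directory and make sure that directory is on PATH.
install_local() {
    src="$1"                        # path to the built executable, e.g. ./bwa
    dest="${2:-$HOME/local/bin}"    # target directory; default is the example from the text
    mkdir -p "$dest" || return 1
    cp "$src" "$dest/" || return 1
    chmod +x "$dest/$(basename "$src")" || return 1
    # Prepend the directory to PATH for this session if it is not already listed.
    case ":$PATH:" in
        *":$dest:"*) ;;
        *) PATH="$dest:$PATH"; export PATH ;;
    esac
}
```

For example, after running "make" in the BWA source directory, "install_local ./bwa" would copy the executable to $HOME/local/bin. To keep the PATH change across sessions, add the directory to your shell startup file.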
Commands
Align reads to reference genome
First, index your genome so BWA can map reads to it:
$bwa index REL606.fna
Then, align each data set (replace datasetX with the name of each data set, e.g. dataset1):
$bwa aln REL606.fna datasetX.fastq > datasetX.sai
And convert to SAM format (assumes single-end data):
$bwa samse REL606.fna datasetX.sai datasetX.fastq > datasetX.sam
Then index the reference and convert the SAM file to a sorted, indexed BAM file:
$samtools faidx REL606.fna
$samtools import REL606.fna datasetX.sam datasetX.unsorted.bam
$samtools sort datasetX.unsorted.bam datasetX
$samtools index datasetX.bam
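When there are several data sets, the per-data-set commands above can be generated in a loop instead of typed by hand. The following is a minimal sketch, assuming single-end data and the file naming used above (NAME.fastq in, NAME.bam out). The function name align_commands is hypothetical. It only prints the commands, using the same BWA and SAMtools syntax as this section, so they can be reviewed before anything runs. Newer SAMtools releases may expect different arguments.

```shell
# Hypothetical helper: print the per-data-set commands from this section,
# one per line, without running them.
# Usage: align_commands REFERENCE NAME...
align_commands() {
    ref="$1"; shift
    for name in "$@"; do
        echo "bwa aln $ref $name.fastq > $name.sai"
        echo "bwa samse $ref $name.sai $name.fastq > $name.sam"
        echo "samtools import $ref $name.sam $name.unsorted.bam"
        echo "samtools sort $name.unsorted.bam $name"
        echo "samtools index $name.bam"
    done
}
```

For example, "align_commands REL606.fna dataset1 dataset2" prints ten commands. Once the reference has been indexed with "bwa index" and "samtools faidx", the output can be piped to "sh -e" to run the commands in order and stop at the first failure.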
Analyze differential gene expression
library(DESeq)