Analyzing RNA-Seq data for differential gene expression
Materials and Software
- Data files
- RNA-Seq data in FASTQ format (Ex: dataset1.fastq, dataset2.fastq)
- Genome sequence files
- Genome sequence in FASTA format (Ex: REL606.fna)
- Genome sequence gene annotations in GFF3 format (Ex: REL606.gff3)
- Adaptor filtering software
- Read mapping software
- Bowtie2
- Download Bowtie2
- Download an executable for your platform
- Add this directory to your $PATH or move bowtie, and bowtie-* star executable to your $PATH
- breseq and included gdtools
- htseq python module and scripts
- R statistics package
- Bioconductor R modules
- library(edgeR)
- library(deseq2) *
brnaseq
script
- Download from Github.
- Note: You only need this one script and don't need to download the entire
barricklab
code repository.
Commands
Create genomediff
metadata files
Remove adaptor sequences from reads
You will need a FASTA file of adaptor sequences.
For each input file you will need to run this command (single-end data):
$far --source datasetX.fastq --target datasetX.noadaptor --adaptive-overlap --trim-end any --adapters gsaf_illumina_adapters.fasta --format fastq-sanger
There is an option to process paired-end data like this:
$far --source datasetX_R1.fastq --source2 datasetX_R2.fastq --target datasetX.noadaptor --adaptive-overlap --trim-end any --adapters gsaf_illumina_adapters.fasta --format fastq-sanger
Align reads to reference genome
Using bowtie2
First, index your genome so bowtie2 can map read to it:
$bowtie2-build REL606.fna REL606
Then, align each data set:
$bowtie2 -x REL606 -U datasetX.fastq --phred33 -S REL606.sam
Optionally, add the
--local
flag if your reads do not map end-to-end.
Using BWA
First, index your genome so BWA can map read to it:
$bwa index REL606.fna
Then, align each data set:
$bwa aln REL606.fna dataset1.fastq > datasetX.sai
And convert to SAM format (assumes single-end data):
$bwa samse REL606.fna datasetX.sai datasetX.fastq > datasetX.sam
Count reads mapping to genes
breseq RNASEQ -f REL606.fna -r REL606.gbk -o datasetX.count.tab datasetX.sam
Analyze differential gene expression
Using DESeq
Optional: View reads in IGV
And convert to BAM format (assumes single-end data):
$samtools faidx REL606.fna
$samtools import REL606.fna datasetX.sam datasetX.unsorted.bam
$samtools sort datasetX.unsorted.bam datasetX
$samtools index datasetX.bam
Now you can use IGV to view them.