Analyzing RNA-Seq data for differential gene expression

Materials and Software

  • Data files
    • RNA-Seq data in FASTQ format (Ex: dataset1.fastq, dataset2.fastq)
  • Genome sequence files
    • Genome sequence in FASTA format (Ex: REL606.fna)
    • Genome sequence gene annotations in GFF3 format (Ex: REL606.gff3)
  • Adaptor filtering software
  • Read mapping software
    • Bowtie2
      • Download Bowtie2
      • Download an executable for your platform
      • Add this directory to your $PATH, or move the bowtie2 and bowtie2-* executables to a directory that is already in your $PATH
  • breseq and included gdtools
  • htseq python module and scripts
    • Install using pip
  • R statistics package
  • Bioconductor R modules
    • library(edgeR)
    • library(DESeq2)
  • brnaseq script
    • Download from Github.
    • Note: You only need this one script and don't need to download the entire barricklab code repository.
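
Before running the commands below, it can help to confirm that the command-line tools are reachable through $PATH. This is a minimal sketch: the function name missing_tools is made up for this example, and the list of tool names is an assumption based on the commands used on this page.

```shell
# Print the name of every tool that cannot be found on $PATH.
# missing_tools is a hypothetical helper, not part of any package listed above.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tool names assumed from the commands used in this protocol.
missing_tools far bowtie2 bowtie2-build bwa samtools breseq R
```

No output means every tool was found; any name printed still needs to be installed or added to $PATH.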

Commands

Create genomediff metadata files

Remove adaptor sequences from reads

You will need a FASTA file of adaptor sequences.

For each input file you will need to run this command (single-end data):
$far --source datasetX.fastq --target datasetX.noadaptor --adaptive-overlap --trim-end any --adapters gsaf_illumina_adapters.fasta --format fastq-sanger

For paired-end data, pass both read files:
$far --source datasetX_R1.fastq --source2 datasetX_R2.fastq --target datasetX.noadaptor --adaptive-overlap --trim-end any --adapters gsaf_illumina_adapters.fasta --format fastq-sanger
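
Because the command has to be repeated for each input file, a small shell loop can generate it. The sketch below only prints the single-end commands (a dry run) so they can be inspected first. The function name far_commands is hypothetical, and it assumes the reads are in files named <dataset>.fastq in the current directory.

```shell
# Print one far command per dataset name given after the adaptor file.
# far_commands is a hypothetical helper; it only echoes the single-end command shown above.
far_commands() {
    adapters=$1
    shift
    for name in "$@"; do
        printf 'far --source %s.fastq --target %s.noadaptor --adaptive-overlap --trim-end any --adapters %s --format fastq-sanger\n' \
            "$name" "$name" "$adapters"
    done
}

# Dry run: review the output, then pipe it to sh to execute.
far_commands gsaf_illumina_adapters.fasta dataset1 dataset2
```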

Align reads to reference genome

Using bowtie2

First, index your genome so bowtie2 can map reads to it:
$bowtie2-build REL606.fna REL606

Then, align each data set:
$bowtie2 -x REL606 -U datasetX.fastq --phred33 -S datasetX.sam

Optionally, add the --local flag if your reads do not map end-to-end.
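
One way to judge whether --local is worth trying is to check how many reads aligned. The sketch below counts mapped reads in a SAM file from the FLAG column, where bit 0x4 marks an unmapped read. The function name mapped_fraction is hypothetical, and because it counts every alignment line it is only a rough guide that assumes single-end data with one line per read.

```shell
# Report mapped / total alignment lines in a SAM file.
# mapped_fraction is a hypothetical helper written for this example.
# Usage: mapped_fraction datasetX.sam
mapped_fraction() {
    awk -F '\t' '
        /^@/ { next }                          # skip SAM header lines
        { total++
          # FLAG is column 2; bit 0x4 set means the read is unmapped
          if (int($2 / 4) % 2 == 0) mapped++ }
        END { printf "%d of %d reads mapped\n", mapped, total }
    ' "$1"
}
```

bowtie2 also prints its own alignment summary to standard error, which can be compared against this count.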

Using BWA

First, index your genome so BWA can map reads to it:
$bwa index REL606.fna

Then, align each data set:
$bwa aln REL606.fna datasetX.fastq > datasetX.sai

Then convert to SAM format (assumes single-end data):
$bwa samse REL606.fna datasetX.sai datasetX.fastq > datasetX.sam
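
The two BWA steps can be generated for every FASTQ file in one pass. This sketch prints the commands rather than running them. The name bwa_commands is a hypothetical helper, and it assumes single-end reads in files ending in .fastq.

```shell
# Print the bwa aln and bwa samse commands for each FASTQ file given.
# bwa_commands is a hypothetical helper; the reference is the first argument.
bwa_commands() {
    ref=$1
    shift
    for fq in "$@"; do
        base=${fq%.fastq}    # strip the .fastq suffix to name the outputs
        printf 'bwa aln %s %s > %s.sai\n' "$ref" "$fq" "$base"
        printf 'bwa samse %s %s.sai %s > %s.sam\n' "$ref" "$base" "$fq" "$base"
    done
}

# Dry run over the example data files; pipe the output to sh to execute.
bwa_commands REL606.fna dataset1.fastq dataset2.fastq
```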

Count reads mapping to genes

$breseq RNASEQ -f REL606.fna -r REL606.gbk -o datasetX.count.tab datasetX.sam

Analyze differential gene expression

Using DESeq

Optional: View reads in IGV

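First, convert the SAM file to a sorted, indexed BAM file (assumes single-end data). The commands below use samtools 1.x syntax; the older 0.1.x releases used samtools import and samtools sort <in.bam> <out.prefix> instead:
$samtools faidx REL606.fna
$samtools view -b -T REL606.fna -o datasetX.unsorted.bam datasetX.sam
$samtools sort -o datasetX.bam datasetX.unsorted.bam
$samtools index datasetX.bam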

Now you can use IGV to view them.

Topic attachments
  • gsaf_illumina_adapters.fasta: r1, 0.2 K, 2012-01-30 16:31, JeffreyBarrick

Topic revision: r7 - 2020-04-08 - 11:28:42 - Main.JeffreyBarrick
Copyright ©2024 Barrick Lab contributing authors.