Using the Flexbar program to remove adapter sequences from NGS reads

Installing Flexbar (notes are specific to TACC but can be adapted for other systems)

Go to the Flexbar home page and select the newest version (2.31 as of 1-16-13).
Right-click *_linux64.tgz and select 'copy link location'.
Log onto TACC
cd $WORK/src
wget "paste link location"
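- wget -O flexbar_v2.31_linux64.tgz http://sourceforge.net/projects/flexbar/files/2.31/flexbar_v2.31_linux64.tgz/download
- (-O sets the name of the saved file; without it, wget may save the file as "download" and the tar step will not find a .tgz file)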
tar xvzf flexbar*.tgz
cd "new folder"
- cd flexbar_v2.31_linux64
cp flexbar $HOME/local/bin
vi $HOME/.profile_user
- Add the following if not already present:
  1. export PATH=$HOME/local/bin:$PATH
  2. export LD_LIBRARY_PATH=$WORK/src/flexbar_v2.31_linux64:$LD_LIBRARY_PATH
    - Optionally, move flexbar to any directory in your PATH and libtbb.so.2 to any directory in your LD_LIBRARY_PATH.
logout
Log back onto TACC
flexbar -h
If the help text appears, Flexbar is ready to use. If you get an error message instead, see the note below and check that $PATH includes the directory containing flexbar and that $LD_LIBRARY_PATH includes the directory containing libtbb.so.2.
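
The profile edit and the path check in the steps above can also be scripted. This is a minimal sketch rather than part of the original protocol: the helper names add_line_if_missing and find_in_dirs are hypothetical, and the commented usage lines assume the TACC profile file and install directory named above.

```shell
# Append a line to a file only if that exact line is not already present.
add_line_if_missing() {
    alim_line=$1
    alim_file=$2
    touch "$alim_file"
    grep -qxF -e "$alim_line" "$alim_file" || printf '%s\n' "$alim_line" >> "$alim_file"
}

# Print the first directory in a colon-separated list that contains the
# named file; return non-zero if no directory contains it.
find_in_dirs() {
    fid_name=$1
    fid_list=$2
    fid_old_ifs=$IFS
    IFS=:
    for fid_dir in $fid_list; do
        if [ -e "$fid_dir/$fid_name" ]; then
            IFS=$fid_old_ifs
            printf '%s\n' "$fid_dir"
            return 0
        fi
    done
    IFS=$fid_old_ifs
    return 1
}

# Usage (uncomment to edit your profile):
# add_line_if_missing 'export PATH=$HOME/local/bin:$PATH' "$HOME/.profile_user"
# add_line_if_missing 'export LD_LIBRARY_PATH=$WORK/src/flexbar_v2.31_linux64:$LD_LIBRARY_PATH' "$HOME/.profile_user"

# Report where (or whether) the two files flexbar needs are found.
find_in_dirs flexbar "${PATH:-}" || echo "flexbar not found in PATH"
find_in_dirs libtbb.so.2 "${LD_LIBRARY_PATH:-}" || echo "libtbb.so.2 not found in LD_LIBRARY_PATH"
```

Running add_line_if_missing twice with the same line leaves a single copy in the file, so it is safe to rerun after a failed install.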

If you install from source instead, you may need to switch from the Intel compiler to gcc (module swap intel gcc).

Command line usage for removal of adapter sequences

Generic command for maximal (aggressive) trimming. Replace each quoted placeholder with the appropriate name and delete the quotation marks:

flexbar -t "New_file_name" -r "read_1_file_name" -p "read_2_file_name" -f fastq -a "fasta_file_of_adapter_sequences" -ao 1

Example command (this one also passes -u 101, which allows up to 101 Ns per read):

flexbar -t DED81 -r 02_Downloads/Sample_DED81_L004_R1.cat.fastq -p 02_Downloads/Sample_DED81_L004_R2.cat.fastq -f fastq -a 02_trimmed_Downloads/adapter_seq.fasta -u 101 -ao 1

For a less aggressive command, remove -ao 1.
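
When trimming many samples, it can help to generate the command from the sample names instead of editing it by hand. The sketch below only prints the command so it can be inspected before running. The function name flexbar_command and the file names in the two calls are hypothetical; the flags are the ones used in the commands above.

```shell
# Print (rather than run) a paired-end flexbar trimming command.
# Arguments: output name, read 1 file, read 2 file, adapter FASTA file,
# and an optional fifth argument "aggressive" that appends -ao 1.
# Assumes the file names contain no spaces.
flexbar_command() {
    fc_cmd="flexbar -t $1 -r $2 -p $3 -f fastq -a $4"
    if [ "${5:-}" = "aggressive" ]; then
        fc_cmd="$fc_cmd -ao 1"
    fi
    printf '%s\n' "$fc_cmd"
}

# Aggressive and less aggressive variants for the same sample.
flexbar_command DED81 Sample_R1.fastq Sample_R2.fastq adapter_seq.fasta aggressive
flexbar_command DED81 Sample_R1.fastq Sample_R2.fastq adapter_seq.fasta
```

The first call prints the command with -ao 1 at the end; the second prints the same command without it.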

Choice of adapter sequence file

For most data sets analyzed in the lab, the Illumina TruSeq adapter file is the correct one to use (attached file: illumina_truseq.fasta).
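
Before passing an adapter file to -a, a quick check that it really is a nucleotide FASTA file can catch a wrong or damaged download. This is a rough sketch: the helper names are hypothetical, and the demo file uses placeholder sequences, not the contents of illumina_truseq.fasta.

```shell
# Count FASTA records (lines that start with '>') in a file.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Succeed only if every sequence line consists of IUPAC nucleotide codes
# (degenerate bases such as N, R and Y are accepted).
has_only_nucleotide_codes() {
    ! grep -v '^>' "$1" | grep -q '[^ACGTURYSWKMBDHVNacgturyswkmbdhvn]'
}

# Hypothetical two-record adapter file; the sequences are placeholders,
# not real Illumina adapters.
demo_fasta=$(mktemp)
printf '>adapter_1\nACGTACGTNN\n>adapter_2\nTTGGCCAARY\n' > "$demo_fasta"

count_fasta_records "$demo_fasta"
has_only_nucleotide_codes "$demo_fasta" && echo "sequence lines look valid"
rm -f "$demo_fasta"
```

A file saved with Windows line endings fails the second check because of the trailing carriage returns, which is usually worth knowing before trimming.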

Flag explanations

Flag	Text to follow	What flag means	Reason
-t	New_file_name	Name (prefix) of the output files.	Sets what your output is named. Use a name different from the input to avoid overwriting the untrimmed reads.
-r	R1_source_file_name	Name of Read1 sequencing file.	File to remove adapters from.
-p	R2_source_file_name	Name of Read2 sequencing file. (Optional: each file can be trimmed separately.)	File to remove adapters from.
-f	Format	Format of reads.	Most commonly will be fasta or fastq.
-a	Adapter_sequence_file.fasta	Fasta file with full adapter sequences, degenerate bases allowed.	What sequence is to be removed.
-ao	Number	Minimum number of bases of overlap between read and adapter.	This number equals the minimum number of bp to be removed.
-u	Number	Number of Ns allowed in the final sequence.	By default 0 Ns are allowed. Breseq handles Ns, so reads containing them can be retained.
-at	Number	Number of mismatches and indels allowed per 10 bp of adapter sequence.	This accounts for sequencing/PCR errors changing the adapter sequence. Default = 3. Increasing this number increases the false positive rate and decreases the false negative rate.
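
To see why -ao 1 is aggressive, it helps to work out how many errors -at tolerates at a given overlap length. The sketch below assumes the allowance scales linearly with the overlap (threshold x overlap / 10) and rounds down. That scaling is an interpretation of the "per 10 bp" wording in the table, not something taken from the Flexbar documentation, and allowed_errors is a hypothetical helper.

```shell
# Errors tolerated for a given -at threshold and overlap length (bp),
# assuming threshold * overlap / 10, rounded down by integer arithmetic.
allowed_errors() {
    echo $(( $1 * $2 / 10 ))
}

allowed_errors 3 1    # 1 bp overlap
allowed_errors 3 10   # 10 bp overlap
allowed_errors 3 33   # 33 bp overlap
```

Under this assumption the three calls print 0, 3 and 9. With a 1 bp overlap no errors are tolerated, so -ao 1 removes a base whenever the end of a read matches the first adapter base exactly.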

Additional Information

For additional help and options, type flexbar -h

Attachments

Attachment	Size	Date	Who	Comment
gsaf_illumina_adapters.fasta	0.4 K	2013-03-11 - 17:29	JeffreyBarrick
illumina_nextera.fasta	0.2 K	2014-04-25 - 03:45	JeffreyBarrick
illumina_truseq.fasta	0.1 K	2014-04-18 - 17:22	JeffreyBarrick
new_adaptors.fasta	0.3 K	2013-05-21 - 16:47	DanielDeatherage	fasta_file_of_adapter_sequences 8bp barcode sequences

Contributors to this topic

DanielDeatherage, JeffreyBarrick

Topic revision: r10 - 2014-04-25 - 03:45:32 - Main.JeffreyBarrick