Usage

breseq

Usage:

breseq -r reference.gbk reads1.fastq [reads2.fastq, reads3.fastq...]

Run the breseq pipeline for mutation prediction from genome re-sequencing data.

Required options:

-r <file_path>, --reference <file_path>

Input reference genome sequence files in GenBank format. If there are multiple reference sequences stored in separate GenBank files (e.g., a bacterial genome and a plasmid), this option can be supplied multiple times.

reads1.fastq [reads2.fastq, reads3.fastq...]

The remaining arguments at the command line are the FASTQ input files of reads. The FASTQ base quality scores must be in SANGER format. If you get an error and need to convert your quality scores, see the fastq_utils command. breseq re-calibrates the error rates for each FASTQ file separately, so data sets that were generated independently should be stored in different input files.
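As a rough illustration of how these arguments fit together, the Python sketch below assembles a breseq argument list with one -r per GenBank file followed by the FASTQ files. The helper name build_breseq_command and the file names are hypothetical; only the -r option and the positional read files come from the usage above.

```python
from typing import List, Sequence


def build_breseq_command(references: Sequence[str], reads: Sequence[str]) -> List[str]:
    """Assemble an argument list for breseq (hypothetical helper, not part of breseq)."""
    if not references:
        raise ValueError("at least one reference file is required")
    if not reads:
        raise ValueError("at least one FASTQ file is required")
    command = ["breseq"]
    for reference in references:
        # -r may be supplied once per GenBank file.
        command += ["-r", reference]
    # The remaining arguments are the FASTQ read files.
    command += list(reads)
    return command


# Example file names are placeholders.
command = build_breseq_command(["genome.gbk", "plasmid.gbk"], ["run1.fastq", "run2.fastq"])
print(" ".join(command))
```

The resulting list could be passed to subprocess.run on a system where breseq is installed.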

Expert options:

--base-quality-cutoff=<int>

Ignore bases with a quality score lower than this value when calling mutations. This accommodates Illumina formats that use quality scores of 2 to flag bad data. These bases are still used for aligning to the reference genome and are shown highlighted in yellow when drawing alignments, but they do not contribute to read alignment evidence. Default: 3
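To make the cutoff concrete, here is a minimal Python sketch of which bases in a SANGER-format quality line would pass a given cutoff. It assumes the standard Phred+33 encoding; the function name usable_base_mask is hypothetical and this is not breseq's own code.

```python
from typing import List


def usable_base_mask(quality_line: str, cutoff: int = 3, offset: int = 33) -> List[bool]:
    """Return True for each base whose quality score is at least the cutoff.

    Assumes Phred+33 (SANGER) encoding: score = ord(character) - 33.
    """
    scores = [ord(ch) - offset for ch in quality_line]
    # Bases scoring below the cutoff are ignored when calling mutations.
    return [score >= cutoff for score in scores]


# '!' = 0, '#' = 2, '$' = 3, '5' = 20 in Phred+33.
print(usable_base_mask("!#$5"))
```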

--predict-polymorphisms

Identify and predict the frequencies of SNPs and small indels that are polymorphic (appear in only a subpopulation of reads). See Polymorphism Prediction for additional options. Note that this option is still experimental.

bam2aln

Usage:

bam2aln [-b input.bam] [-f input.fasta] [-o output/path] region1 [region2 region3 ...]

Creates HTML pileup files displaying reads aligned to each specified region.

Options:

-b <file_path>, --bam=<file_path>

BAM database file of read alignments. Defaults: reference.bam, data/reference.bam.

-f <file_path>, --fasta=<file_path>

FASTA file of reference sequences. Defaults: reference.fasta, data/reference.fasta.

-o <path>, --output=<path>

Output path. If there are multiple regions, this must be a directory path, and the output files are written there with names region1.html, region2.html, ... If there is just one region, the output file is given this name, unless it is the name of an existing directory. Default: current path.

-n <int>, --max-reads=<int>

Maximum number of reads that will be aligned to a region. If there are more than this many reads, then the reads displayed are randomly chosen and a warning is added to the output. Default: 1000.
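The behaviour described here can be pictured with a short Python sketch: if a region has more reads than the limit, a random subset is kept and a flag records that a warning is needed. The function name limit_reads and the returned flag are assumptions for illustration; how bam2aln actually samples reads is not specified here.

```python
import random
from typing import List, Optional, Sequence, Tuple


def limit_reads(reads: Sequence, max_reads: int = 1000,
                rng: Optional[random.Random] = None) -> Tuple[List, bool]:
    """Return at most max_reads reads and whether the set was truncated."""
    rng = rng or random.Random()
    if len(reads) <= max_reads:
        return list(reads), False
    # Sample without replacement so no read is shown twice.
    return rng.sample(list(reads), max_reads), True


chosen, truncated = limit_reads(list(range(5000)), 1000, random.Random(0))
print(len(chosen), truncated)
```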

region1 [region2, region3, ...]

Regions to create output for must be provided in the format FRAGMENT:START-END, where FRAGMENT is a valid identifier for one of the sequences in the FASTA file, and START and END are 1-indexed coordinates of the beginning and end positions. Any read overlapping these positions will be shown. A separate output file is created for each region.
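The region format can be sketched in Python as follows. This is only an illustration of the FRAGMENT:START-END convention with 1-indexed, inclusive coordinates; the names Region, parse_region and overlaps are hypothetical, and the sketch assumes START is not greater than END.

```python
from typing import NamedTuple


class Region(NamedTuple):
    fragment: str
    start: int  # 1-indexed, inclusive
    end: int    # 1-indexed, inclusive


def parse_region(text: str) -> Region:
    """Parse a FRAGMENT:START-END string into a Region."""
    # Split on the last colon so that fragment identifiers may contain colons.
    fragment, colon, span = text.rpartition(":")
    if not colon or not fragment:
        raise ValueError(f"expected FRAGMENT:START-END, got {text!r}")
    start_text, dash, end_text = span.partition("-")
    if not dash:
        raise ValueError(f"expected START-END, got {span!r}")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {text!r}")
    return Region(fragment, start, end)


def overlaps(region: Region, read_start: int, read_end: int) -> bool:
    """True if a read spanning read_start..read_end touches the region."""
    return read_start <= region.end and read_end >= region.start


region = parse_region("REL606:1000-2000")
print(region, overlaps(region, 950, 1000))
```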

bam2cov

Usage:

bam2cov [-b input.bam] [-f input.fasta] [-o output/path] region1 [region2 region3 ...]

Creates a coverage plot or table for the specified region.

Options:

-b <file_path>, --bam <file_path>

BAM database file of read alignments. Defaults: reference.bam, data/reference.bam

-f <file_path>, --fasta <file_path>

FASTA file of reference sequences. Defaults: reference.fasta, data/reference.fasta

-o <path>, --output <path>

Output path. If there are multiple regions, this must be a directory path, and the output files are written there with names region1, region2, ... If there is just one region, the output file is given this name, unless it is the name of an existing directory. Default: current path.

region1 [region2, region3, ...]

Regions to create output for must be provided in the format FRAGMENT:START-END, where FRAGMENT is a valid identifier for one of the sequences in the FASTA file, and START and END are 1-indexed coordinates of the beginning and end of the region. A separate output file is created for each region.

--pdf

In plot mode, create output plot in PDF format rather than PNG format.

-r <int>, --resolution <int>

In plot mode, maximum number of reference positions to plot coverage for within the region. Default: 600.

-1, --total_only

In plot mode, only output the total coverage of unique or repeat read mappings. (Does not break these down into the coverage on each strand of the reference sequence.)

-t, --table

Table mode. Rather than a plot, output a tab-delimited table of the coverage in the specified region to the output file. Also outputs the mean and standard error of the unique coverage within each region to STDOUT.
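For readers who want to reproduce the summary statistics from a coverage table, the Python sketch below computes a mean and a standard error from a list of per-position coverage values. It assumes the standard error is the sample standard deviation divided by the square root of the number of positions; the exact formula bam2cov uses is not stated here, and the function name is hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_and_standard_error(coverage: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, standard error of the mean) for per-position coverage."""
    n = len(coverage)
    if n < 2:
        raise ValueError("need at least two positions to estimate a standard error")
    mean = statistics.mean(coverage)
    # Assumed definition: sample standard deviation / sqrt(n).
    standard_error = statistics.stdev(coverage) / math.sqrt(n)
    return mean, standard_error


mean, standard_error = mean_and_standard_error([10, 12, 14, 16])
print(mean, standard_error)
```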

fastq_utils

Usage:

fastq_utils COMMAND [arguments]

Performs various functions on FASTQ formatted files. Options depend on the COMMAND supplied. There are several different FASTQ styles with different base quality score formats.

Command: FORMAT

Usage:

fastq_utils FORMAT [-n 1000|ALL] input.fastq

Examine reads in a FASTQ file to predict its base quality score format.

-n <int>, -n ALL, --num=<int>, --num=ALL

Number of reads to examine when predicting the format. The keyword ‘ALL’ means examine every read in the input file.

input.fastq

FASTQ file to examine.
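One common way to guess a quality score format is to look at the lowest quality character seen in the examined reads. The Python sketch below uses that heuristic with the usual ASCII ranges for each format; it is not necessarily how the FORMAT command decides, the function names are hypothetical, and it assumes four-line FASTQ records. Data containing only high scores can be misclassified, which is one reason to examine more reads.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def iter_quality_lines(fastq_lines: Iterable[str],
                       max_reads: Optional[int] = None) -> Iterator[str]:
    """Yield the quality line of each read, up to max_reads (None means all)."""
    # Assumes four lines per record; the fourth is the quality line.
    qualities = (line.rstrip("\n") for i, line in enumerate(fastq_lines) if i % 4 == 3)
    return islice(qualities, max_reads)


def guess_quality_format(quality_lines: Iterable[str]) -> str:
    """Guess the format from the lowest quality character observed."""
    lowest = min((ord(ch) for line in quality_lines for ch in line), default=None)
    if lowest is None:
        raise ValueError("no quality characters to examine")
    if lowest < 59:   # below ';' only occurs with a Phred+33 offset
        return "SANGER"
    if lowest < 64:   # ';' to '?' corresponds to Solexa scores -5 to -1
        return "SOLEXA"
    if lowest < 66:   # '@' or 'A' corresponds to Phred+64 scores 0 or 1
        return "ILLUMINA_1.3+"
    return "ILLUMINA_1.5+"  # lowest character is 'B' (score 2) or higher


records = ["@read1\n", "ACGT\n", "+\n", "hhhB\n"]
print(guess_quality_format(iter_quality_lines(records, max_reads=1000)))
```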

Command: SANGER

Usage:

fastq_utils SANGER -f from_format [-l] input.fastq output.fastq

Convert a FASTQ file to SANGER format.

-f <format>, --format=<format>

Base quality score format of the input FASTQ file. Valid formats are: SANGER, SOLEXA, ILLUMINA_1.3+, ILLUMINA_1.5+. If you are unsure of the format, use the FORMAT command.

-l, --list-format

In the input FASTQ file, quality score lines are white space separated numbers, rather than character strings.

input.fastq

Input FASTQ file in specified format.

output.fastq

Output FASTQ file in SANGER format.
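The simplest case of this conversion can be sketched in Python. For the Phred+64 encodings (ILLUMINA_1.3+ and ILLUMINA_1.5+), each quality character is shifted down to the Phred+33 offset used by SANGER. The second function handles quality lines given as white space separated numbers and assumes those numbers are already Phred scores. Both function names are hypothetical, and the SOLEXA format, which needs a different score transformation, is not covered.

```python
def phred64_to_sanger(quality_line: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (SANGER)."""
    converted = []
    for ch in quality_line:
        score = ord(ch) - 64
        if score < 0:
            raise ValueError(f"character {ch!r} is not valid for a Phred+64 encoding")
        converted.append(chr(score + 33))
    return "".join(converted)


def list_to_sanger(score_line: str) -> str:
    """Encode white space separated Phred scores as a SANGER quality string."""
    # Assumption: the listed numbers are Phred scores, not Solexa scores.
    return "".join(chr(int(score) + 33) for score in score_line.split())


print(phred64_to_sanger("hhB"), list_to_sanger("40 40 2"))
```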

genomediff

Usage:

genomediff COMMAND [arguments]

Performs various functions on genomediff formatted files. Options depend on the COMMAND supplied.

Command: COMPARE

Usage:

genomediff COMPARE -r reference.gbk input1.gd [input2.gd ...]

Create a table comparing mutations from different samples.

-r <file_path>, --reference=<file_path>

GenBank files for reference sequences. This option may be entered multiple times.

-o <file_path>, --output=<file_path>

Output HTML file containing the comparison table. Default: “compare.html”.

input1.gd [input2.gd ...]

Input genomediff files, one for each sample.
