Usage:
breseq -r reference.gbk reads1.fastq [reads2.fastq reads3.fastq ...]
Run the breseq pipeline for mutation prediction from genome re-sequencing data.
Required options:
Input reference genome sequence files in GenBank format. If there are multiple reference sequences stored in separate GenBank files (e.g., a bacterial genome and a plasmid), this option can be supplied multiple times.
The remaining arguments at the command line are the FASTQ input files of reads. The FASTQ base quality scores must be in SANGER format. If you get an error and need to convert your quality scores, see the fastq_utils command. breseq re-calibrates the error rates for each FASTQ file separately, so data sets that were generated independently should be stored in different input files.
Expert options:
Ignore bases with a quality score lower than this value when calling mutations. This accommodates Illumina formats that use quality scores of 2 to flag bad data. These bases are still used for aligning to the reference genome and are shown highlighted in yellow when drawing alignments, but they do not contribute to read alignment evidence. Default: 3
Identify and predict the frequencies of SNPs and small indels that are polymorphic (appear in only a subpopulation of reads). See Polymorphism Prediction for additional options and note that this option is still experimental.
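The base quality cutoff described under Expert options can be illustrated with a short Python sketch. It is not breseq's implementation. It assumes only that SANGER-format FASTQ encodes each Phred score Q as the character with ASCII code Q + 33; the function names are hypothetical.

```python
SANGER_OFFSET = 33  # SANGER FASTQ stores Phred quality Q as chr(Q + 33)


def decode_sanger(quality_line):
    """Return the Phred scores encoded in a SANGER-format quality string."""
    return [ord(ch) - SANGER_OFFSET for ch in quality_line]


def usable_for_calling(quality_line, cutoff=3):
    """Flag each base: True if its score is at least the cutoff.

    Bases flagged False would still be aligned, but (per the option text)
    would not count as evidence when calling mutations.
    """
    return [q >= cutoff for q in decode_sanger(quality_line)]


if __name__ == "__main__":
    line = "I#5$!"  # scores 40, 2, 20, 3, 0
    print(decode_sanger(line))
    print(usable_for_calling(line))
```

With the default cutoff of 3, a base with score 2 (the Illumina "bad data" flag) is ignored while a base with score 3 is kept.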
Usage:
bam2aln [-b input.bam] [-f input.fasta] [-o output/path] region1 [region2 region3 ...]
Creates HTML pileup files displaying reads aligned to each specified region.
Options:
BAM database file of read alignments. Defaults: reference.bam, data/reference.bam.
FASTA file of reference sequences. Defaults: reference.fasta, data/reference.fasta.
Output path. If there are multiple regions, this must be a directory path; output files are written there with names region1.html, region2.html, ... If there is just one region, the output file is given this name unless it is the name of an existing directory. Default: current path.
Maximum number of reads that will be aligned to a region. If there are more than this many reads, then the reads displayed are randomly chosen and a warning is added to the output. Default: 1000.
Regions to create output for must be provided in the format FRAGMENT:START-END, where FRAGMENT is a valid identifier for one of the sequences in the FASTA file, and START and END are 1-indexed coordinates of the beginning and end positions. Any read overlapping these positions will be shown. A separate output file is created for each region.
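The region format can be made concrete with a small parser. This is a sketch under stated assumptions rather than the tool's own code: the names `Region`, `parse_region`, and `read_overlaps` are hypothetical, the fragment identifier `REL606` is only an example, and rejecting regions whose END precedes START is an assumption, not documented behavior.

```python
from typing import NamedTuple


class Region(NamedTuple):
    fragment: str
    start: int  # 1-indexed, inclusive
    end: int    # 1-indexed, inclusive


def parse_region(text):
    """Split 'FRAGMENT:START-END' into its three parts."""
    # rpartition splits on the last ':' so a ':' inside the fragment
    # identifier itself is tolerated.
    fragment, sep, coords = text.rpartition(":")
    start_text, dash, end_text = coords.partition("-")
    if not sep or not dash or not fragment:
        raise ValueError(f"expected FRAGMENT:START-END, got {text!r}")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:  # assumption: reversed regions are invalid
        raise ValueError(f"invalid coordinates in {text!r}")
    return Region(fragment, start, end)


def read_overlaps(region, read_start, read_end):
    """True if a read covering [read_start, read_end] touches the region."""
    return read_start <= region.end and read_end >= region.start


if __name__ == "__main__":
    region = parse_region("REL606:1000-1200")
    print(region)
    print(read_overlaps(region, 990, 1000))   # read ends on the first base
    print(read_overlaps(region, 1201, 1250))  # read starts just past the end
```

Because both coordinates are inclusive, a read that ends exactly at START or begins exactly at END still counts as overlapping.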
Usage:
bam2cov [-b input.bam] [-f input.fasta] [-o output/path] region1 [region2 region3 ...]
Creates a coverage plot or table for each specified region.
Options:
BAM database file of read alignments. Defaults: reference.bam, data/reference.bam.
FASTA file of reference sequences. Defaults: reference.fasta, data/reference.fasta.
Output path. If there are multiple regions, this must be a directory path; output files are written there with names region1, region2, ... If there is one region, the output file is given this name unless it is the name of an existing directory. Default: current path.
Regions to create output for must be provided in the format FRAGMENT:START-END, where FRAGMENT is a valid identifier for one of the sequences in the FASTA file, and START and END are 1-indexed coordinates of the beginning and end of the region. A separate output file is created for each region.
In plot mode, create output plot in PDF format rather than PNG format.
In plot mode, maximum number of reference positions to plot coverage for within the region. Default: 600.
In plot mode, only output the total coverage of unique or repeat read mappings. (Does not break these down into the coverage on each strand of the reference sequence.)
Table mode. Rather than a plot, output a tab-delimited table of the coverage in the specified region to the output file. Also outputs the mean and standard error of the unique coverage within each region to STDOUT.
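The summary printed in table mode can be reproduced from a list of per-position unique coverage values. The sketch below assumes that "standard error" means the standard error of the mean (sample standard deviation divided by the square root of the number of positions); the source does not spell out the formula, and the function name is hypothetical.

```python
import math
import statistics


def mean_and_standard_error(coverage):
    """Return (mean, standard error of the mean) for per-position coverage."""
    mean = statistics.fmean(coverage)
    if len(coverage) < 2:
        # The sample standard deviation is undefined for a single value.
        return mean, 0.0
    standard_error = statistics.stdev(coverage) / math.sqrt(len(coverage))
    return mean, standard_error


if __name__ == "__main__":
    unique_coverage = [10, 12, 14, 16]  # made-up values for illustration
    mean, se = mean_and_standard_error(unique_coverage)
    print(f"mean={mean:.2f}\tstandard_error={se:.2f}")
```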
Usage:
fastq_utils COMMAND [arguments]
Performs various functions on FASTQ formatted files. Options depend on the COMMAND supplied. There are several different FASTQ styles with different base quality score formats.
Command: FORMAT
Usage:
fastq_utils FORMAT [-n 1000|ALL] input.fastq
Examine reads in a FASTQ file to predict its base quality score format.
Number of reads to examine when predicting the format. The keyword ‘ALL’ means examine every read in the input file.
FASTQ file to examine.
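One common way to predict a quality score format is to look at the lowest ASCII code seen in the quality lines. The following is a heuristic sketch, not the algorithm fastq_utils uses. It assumes four-line FASTQ records and the conventional ranges: SANGER starts at ASCII 33, SOLEXA at 59, ILLUMINA_1.3+ at 64, and ILLUMINA_1.5+ at 66 ('B'). A file containing only high scores is ambiguous and may be misclassified. The function names are hypothetical.

```python
from itertools import islice


def quality_lines(fastq_lines, limit=1000):
    """Yield the quality line (4th of every 4 lines) for up to `limit` reads.

    limit=None examines every read, mirroring the ALL keyword.
    """
    stop = None if limit is None else limit * 4
    for index, line in enumerate(islice(fastq_lines, stop)):
        if index % 4 == 3:
            yield line.rstrip("\n")


def guess_format(fastq_lines, limit=1000):
    """Guess the quality format from the lowest quality character seen."""
    lowest = min(
        (ord(ch) for line in quality_lines(fastq_lines, limit) for ch in line),
        default=None,
    )
    if lowest is None:
        raise ValueError("no quality scores found")
    if lowest < 59:
        return "SANGER"
    if lowest < 64:
        return "SOLEXA"
    if lowest < 66:
        return "ILLUMINA_1.3+"
    return "ILLUMINA_1.5+"


if __name__ == "__main__":
    records = ["@read1\n", "ACGT\n", "+\n", "II#I\n"]
    print(guess_format(records))
```

Examining more reads makes it more likely that a low-quality base reveals the true lower bound of the encoding, which is why scanning the whole file can change the answer.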
Command: SANGER
Usage:
fastq_utils SANGER -f from_format [-l] input.fastq output.fastq
Convert a FASTQ file to SANGER format.
Base quality score format of the input FASTQ file. Valid formats are: SANGER, SOLEXA, ILLUMINA_1.3+, ILLUMINA_1.5+. If you are unsure of the format, use the FORMAT command.
In the input FASTQ file, quality score lines are white space separated numbers, rather than character strings.
Input FASTQ file in specified format.
Output FASTQ file in SANGER format.
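The conversion of a single quality line can be sketched as follows. This is an illustration, not the fastq_utils implementation. It assumes that SOLEXA and both ILLUMINA formats use an ASCII offset of 64, that SOLEXA scores map to Phred scores by Q = 10 * log10(10^(Qsolexa/10) + 1), and that whitespace-separated numeric lines hold raw scores with no offset applied. The function and parameter names are hypothetical.

```python
import math

# ASCII offset used by each input format (SANGER output always uses 33).
OFFSETS = {
    "SANGER": 33,
    "SOLEXA": 64,
    "ILLUMINA_1.3+": 64,
    "ILLUMINA_1.5+": 64,
}


def solexa_to_phred(score):
    """Convert a Solexa-scaled score to the nearest integer Phred score."""
    return round(10 * math.log10(10 ** (score / 10) + 1))


def to_sanger_quality(quality_line, from_format, numeric=False):
    """Re-encode one quality line as a SANGER (Phred+33) string."""
    if numeric:
        # Assumption: numeric lines list raw scores, e.g. "40 40 2 40".
        scores = [int(token) for token in quality_line.split()]
    else:
        offset = OFFSETS[from_format]
        scores = [ord(ch) - offset for ch in quality_line]
    if from_format == "SOLEXA":
        scores = [solexa_to_phred(q) for q in scores]
    return "".join(chr(q + 33) for q in scores)


if __name__ == "__main__":
    print(to_sanger_quality("hhBh", "ILLUMINA_1.5+"))
    print(to_sanger_quality("40 40 2 40", "ILLUMINA_1.3+", numeric=True))
```

In this sketch the ILLUMINA_1.5+ character 'B' (score 2) becomes '#' in SANGER encoding, so it still falls below a base quality cutoff of 3.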
Usage:
genomediff COMMAND [arguments]
Performs various functions on genomediff formatted files. Options depend on the COMMAND supplied.
Command: COMPARE
Usage:
genomediff COMPARE -r reference.gbk input1.gd [input2.gd ...]
Create a table comparing mutations from different samples.
GenBank files for reference sequences. This option may be entered multiple times.
Output HTML file containing the comparison table. Default: “compare.html”.
Input genomediff files, one for each sample.