Running breseq on TACC

Installing breseq on stampede (Mac instructions)

Open a new terminal window and use the following commands:

  1. ssh into stampede and set up folder structures and modules:
    • ssh <username>@stampede.tacc.utexas.edu
    • mkdir local
    • mkdir $WORK/src
    • module load R
    • module load perl
    • module load bowtie/2.2.5
    • module load launcher
    • module load autotools
    • module save
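  2. Download the breseq repository from github and install it on stampede (the build commands are the same ones listed in the update section below):
    • cd $WORK/src
    • git clone https://github.com/barricklab/breseq.git
    • cd breseq
    • ./bootstrap.sh
    • ./configure --prefix=$HOME/local
    • make
    • make install
    • make test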
  3. Use the terminal's "find" function (Command-F) to search for the word "check", working from the bottom of the window toward the top. You should see multiple instances of "passed check" and a few unrelated uses of "check". Stop searching once you reach the left-aligned "checking" lines printed earlier in the installation. If you see no "failed check" messages, you have successfully installed breseq.
  4. Add breseq to your path
    • cdh
    • If you do not have a .bashrc file, first run the following command:
      • /usr/local/startup_scripts/install_default_scripts
    • nano .bashrc
      • If you see a blank file, go back to the previous step and run the command to generate a .bashrc file
      • In section 2 of the file, add the following line exactly as written so that your path includes breseq's installation location, then save and exit the text editor:
        • export PATH=$HOME/local/bin:$PATH
    • logout
    • ssh <username>@stampede.tacc.utexas.edu
    • which breseq
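If all has gone well, the response should be the path to breseq in the local/bin folder of your home directory (that is, ~/local/bin/breseq). If it is not, check that the export line was saved in .bashrc and that you logged out and back in.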
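
If "which breseq" prints nothing, one quick thing to check is whether $HOME/local/bin actually made it onto your path. The following is a minimal sketch of that check; the function name path_has_dir is made up for illustration and is not part of breseq or TACC.

```shell
# Return 0 if directory $1 is an entry of the colon-separated list $2
# (defaults to the current $PATH), otherwise return 1.
# path_has_dir is a hypothetical helper name used only for this example.
path_has_dir() {
    case ":${2-$PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the install location used in this protocol.
if path_has_dir "$HOME/local/bin"; then
    echo "$HOME/local/bin is on PATH"
else
    echo "$HOME/local/bin is missing from PATH; re-check the export line in .bashrc"
fi
```

If the directory is on your path but breseq is still not found, the install step itself is the more likely culprit.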

Updating breseq on stampede

Do this anytime you know there is a new version of breseq, anytime breseq is not working, or anytime you have not used breseq in a while.
  1. cd $WORK/src/breseq
  2. make clean
  3. git pull
  • If the output states that anything has changed, do the following; otherwise skip to the next section.
    1. ./bootstrap.sh
    2. ./configure --prefix=$HOME/local
    3. make
    4. make install
    5. make test
    6. Use the terminal's "find" function (Command-F) to search for the word "check", working from the bottom of the window toward the top. You should see multiple instances of "passed check" and a few unrelated uses of "check". Stop searching once you reach the left-aligned "checking" lines printed by the earlier configure command. If you see no "failed check" messages, you have successfully installed the updated breseq.
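
As an alternative to scrolling back through the terminal, the test output can be saved to a file and counted with grep. This is only a sketch: the log file name make_test.log and the function name summarize_test_log are invented for the example, and the "passed check" and "failed check" strings are taken from the description above rather than from the breseq source.

```shell
# Summarise a saved "make test" log.
# Assumes the log was captured with something like:
#   make test 2>&1 | tee make_test.log
# (make_test.log is a hypothetical file name).
# Prints the two counts; succeeds only if nothing failed and something passed.
summarize_test_log() {
    log=$1
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # -i: exact capitalisation is not assumed; -c: count matching lines
    passed=$(grep -ci 'passed check' "$log" || true)
    failed=$(grep -ci 'failed check' "$log" || true)
    echo "passed: $passed  failed: $failed"
    [ "$failed" -eq 0 ] && [ "$passed" -gt 0 ]
}
```

Running summarize_test_log make_test.log after the tests finish prints one line of counts, and grep -i 'failed check' make_test.log shows exactly which lines failed.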

Running breseq

This section covers running breseq from a variety of different starting points. It assumes you have next-generation sequencing files in .fastq format and a well-annotated reference file.

Using .gd files

See this page for information on generating .gd files.

Without using .gd files

  1. Move to your scratch directory so that the run can complete without hitting space limits.
    • cd $SCRATCH
  2. Make a new project directory and change into that directory for better organization.
    • mkdir New_Project
    • cd New_Project
    • mkdir Log_Files
  3. Copy all sequencing files and reference files to this New_Project directory.
    • On a Mac, the scp command is useful for transferring files (i.e. fastq files and reference files) from your local computer to stampede. For Windows users, WinSCP is one possible solution.
  4. Generate a new "commands" file with one line for each sample you wish to analyze.
    • nano commands
  5. Each line should look like the following, with any additional options added in the ___ space:
    • breseq ___ -r Reference_file.gbk -o sample_output/Sample_Name read_1.fastq read_2.fastq >& Log_Files/Sample_Name.log.txt
      • Reference_file.gbk should be replaced with the name of your reference file
      • Sample_Name should be replaced with a short name for the sample that makes sense to you and MUST NOT CONTAIN SPACES. It should be replaced both after the "/" in sample_output and after the "/" in Log_Files (the directory created in step 2). The .log.txt should be kept at the very end.
      • read_1.fastq should be replaced with the name of the first fastq file you received from the core (likely contains R1 as part of the name).
      • read_2.fastq should be replaced with the name of the second fastq file you received from the core if you did paired end sequencing (likely contains R2 as part of the name).
  6. The ___ space can be filled with the following optional settings:
    • "-j 16" can be added to make breseq run faster by using 16 processor cores for each sample
    • "-p" should be added if your samples are mixed populations
    • other options can be listed by typing breseq -h on the command line
  7. Write and exit the nano editor:
    • ctrl-o, enter, ctrl-x
  8. Copy the launcher.slurm template from the TACC launcher directory to the current directory:
    • cp $TACC_LAUNCHER_DIR/launcher.slurm .
  9. Edit the launcher.slurm file with nano as follows:
    • nano launcher.slurm
    • change the line that reads "#SBATCH -n 16" so that 16 is replaced by the number of samples you have
    • change the line that reads "#SBATCH -N 1" as follows: if you are using the "-j 16" option, set it to the number of samples you have; otherwise, set it to the number of samples divided by 16, rounded up to the next whole number
      • examples if you are NOT using the -j 16 option
        • if you have 15 samples, use -N 1 and -n 15
        • if you have 16 samples, use -N 1 and -n 16
        • if you have 17 samples, use -N 2 and -n 17
    • change the line that reads "#SBATCH -t 01:00:00" to use 24:00:00 unless you have a better estimate of how long your job will take to run (if you are using -j 16, 12 hours is a better starting guess than 24)
    • below the time line, add 2 lines that look like the following:
      • #SBATCH --mail-user=your_email_address@email.com
      • #SBATCH --mail-type=all
    • change the line that reads "setenv CONTROL_FILE paramlist" to read "setenv CONTROL_FILE commands"
  10. Write and exit the nano editor:
    • ctrl-o, enter, ctrl-x
  11. Copy your .fastq files and reference files to your current directory:
    • cp location/of/fastq/file/read_name.fastq .
    • cp location/of/reference/file/reference.gbk .
  12. Be sure to repeat the above step for each fastq and reference file you wish to use, and remember that you can use the "*" wildcard to copy multiple files from the same location at once.
  13. Submit your job to be run:
    • sbatch launcher.slurm
  14. You should see a series of messages confirming that your job has been submitted. Once your job starts running, you will get emails when it starts, finishes, or fails.
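
With more than a handful of samples, typing the commands file by hand and working out the -N and -n values is easy to get wrong. The sketch below shows one way the two steps could be scripted. It rests on assumptions that are not part of this protocol: the read files are taken to be named <sample>_R1.fastq and <sample>_R2.fastq, and the function names make_commands and slurm_sizes are made up for illustration. Only the breseq options already shown above are used.

```shell
# Write one breseq command line per sample to the file "commands".
# ASSUMPTION: paired reads are named <sample>_R1.fastq / <sample>_R2.fastq;
# change the patterns to match the names your sequencing core used.
# $1 = reference file, $2 = optional extra breseq options, e.g. "-j 16"
make_commands() {
    ref=$1
    opts=$2
    : > commands                          # start from an empty file
    for r1 in *_R1.fastq; do
        [ -e "$r1" ] || continue          # nothing matched the pattern
        sample=${r1%_R1.fastq}
        r2=${sample}_R2.fastq
        if [ ! -e "$r2" ]; then
            echo "no $r2 found, skipping $sample" >&2
            continue
        fi
        printf '%s\n' "breseq ${opts:+$opts }-r $ref -o sample_output/$sample $r1 $r2 >& Log_Files/$sample.log.txt" >> commands
    done
}

# Print the "#SBATCH -N" and "#SBATCH -n" values described in step 9.
# $1 = number of samples, $2 = "yes" if every command line uses "-j 16"
slurm_sizes() {
    if [ "$2" = "yes" ]; then
        nodes=$1                          # one node per sample
    else
        nodes=$(( ($1 + 15) / 16 ))       # samples / 16, rounded up
    fi
    echo "-N $nodes -n $1"
}
```

For example, running make_commands Reference_file.gbk "-j 16" inside the project directory and then slurm_sizes "$(wc -l < commands)" yes gives the values to enter in launcher.slurm. Look over the generated commands file before submitting, since the file-name pattern is only a guess.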

Visualizing breseq Output

This should be completed after you receive an email stating that your run completed successfully.

  1. cd $HOME/local/bin
  2. cp /corral-repl/utexas/BioITeam/bin/batch_run.pl .
  3. cp /corral-repl/utexas/BioITeam/bin/export-breseq.sh .
  4. cd $SCRATCH/New_Project/sample_output
    • replace New_Project with the name of the actual directory you created before the breseq run
  5. export-breseq.sh
    • this will take a few minutes to finish
  6. cd $SCRATCH/New_Project/sample_output
    • replace New_Project with the name of the actual directory you created before the breseq run
  7. ls will now show you a new folder called "05_Output_Export" and a new file called "05_Output_Export.tar.gz"
  8. pwd will now give you your current location
  9. From a new terminal window, navigate to the folder where you want to store the breseq output, then use the scp command:
    • scp <username>@stampede.tacc.utexas.edu:/location/from/other/window/05_Output_Export.tar.gz .
      • make sure the above command has ":/" after ".edu" (leaving it out is the most common mistake with the scp command)
  10. Navigate to the downloaded file in Finder and extract it by double-clicking it.
  11. Open the "index.html" file
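
If you would rather stay in the terminal, or are not on a Mac, the archive can also be unpacked with tar. The following is a sketch only: the function name unpack_export and the destination folder are invented for the example, and since the internal layout of the archive is not described here, the function simply searches for index.html files after extracting.

```shell
# Unpack a breseq export archive and list the index.html pages inside it.
# unpack_export is a hypothetical helper name used only for this example.
# $1 = path to the archive (for example 05_Output_Export.tar.gz)
# $2 = directory to unpack into (created if it does not exist)
unpack_export() {
    archive=$1
    dest=$2
    mkdir -p "$dest" || return 1
    tar -xzf "$archive" -C "$dest" || return 1
    # The layout inside the archive is not assumed; just search for the pages
    find "$dest" -name index.html | sort
}
```

For example, unpack_export 05_Output_Export.tar.gz breseq_results prints the path of every index.html it finds, and any of them can then be opened in a web browser.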


Topic revision: r6 - 2016-07-05 - DanielDeatherage