Barrick Lab :: ProtocolsRunningBreseqOnTACC

---+ Running _breseq_ on TACC
---++ Installing _breseq_ on Stampede (for Mac users)

Open a new terminal window and use the following commands:

   1 SSH into Stampede and set up the folder structure and modules:
      * =ssh &lt;username&gt;@stampede.tacc.utexas.edu=
      * =mkdir local=
      * =mkdir $WORK/src=
      * =module load R=
      * =module load perl=
      * =module load bowtie/2.2.5=
      * =module load launcher=
      * =module load autotools=
      * =module save=
   1 Download the breseq repository from GitHub and install it on Stampede:
      * =cd $WORK/src=
      * =git clone https://github.com/barricklab/breseq.git=
      * =cd breseq=
      * =./bootstrap.sh=
      * =./configure --prefix=$HOME/local=
      * =make=
      * =make install=
      * =make test=
   1 Use the terminal's "find" function (Command-F) to search for the word "check", working from the bottom of the window toward the top. You should see multiple instances of "passed check", along with a few unrelated occurrences of "check". Stop searching once you reach the left-aligned "checking" lines produced by the earlier =./configure= command. If you see no "failed check" messages, you have successfully installed breseq.
   1 Add breseq to your path
      * =cdh=
      * *IF YOU DO NOT HAVE A .bashrc file, you must first run the following command:*
         * =/usr/local/startup_scripts/install_default_scripts=
      * =nano .bashrc=
         * If you see a blank file, go back to the previous step and run the command to generate a .bashrc file
         * In section 2, add the following line exactly as written so that your path includes breseq's installation location, then save and exit the text editor:
            * =export PATH=$HOME/local/bin:$PATH=            
      * =logout=
      * =ssh &lt;username&gt;@stampede.tacc.utexas.edu=
      * =which breseq=
If all has gone well, the response should be "~/local/bin/breseq". If it is not, check that the export line in .bashrc matches the one above exactly and that =make install= finished without errors.
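
To help with troubleshooting, here is a minimal sketch (not part of the original protocol) that checks whether the install directory appears on your path. The function name =path_contains= is made up for this example; it only inspects the text of =$PATH= and does not confirm that breseq itself was built.

```shell
# Hypothetical helper (not part of breseq or TACC tooling):
# succeeds if directory $1 is an entry in the colon-separated list $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report whether breseq's install location is on the current PATH.
if path_contains "$HOME/local/bin" "$PATH"; then
    echo "$HOME/local/bin is on your PATH"
else
    echo "$HOME/local/bin is missing from your PATH; re-check the export line in .bashrc"
fi
```

If the directory is reported as missing right after editing .bashrc, remember that the change only takes effect in a new login session.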

---++ Updating _breseq_ on stampede 
Do this anytime you know there is a new version of _breseq_, anytime _breseq_ is not working, or anytime you have not used _breseq_ in a while.
   1 =cd $WORK/src/breseq=
   1 =make clean=
   1 =git pull=
   * *If the output states that anything has changed, do the following; otherwise skip to the next section.*
      1 =./bootstrap.sh=
      1 =./configure --prefix=$HOME/local=
      1 =make=
      1 =make install=
      1 =make test=
      1 Use the terminal's "find" function (Command-F) to search for the word "check", working from the bottom of the window toward the top. You should see multiple instances of "passed check", along with a few unrelated occurrences of "check". Stop searching once you reach the left-aligned "checking" lines produced by the earlier =./configure= command. If you see no "failed check" messages, you have successfully updated breseq.
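
As an alternative to searching the terminal window by hand, the test output can be saved to a file and scanned with grep. This is only a sketch: the function name =summarize_checks= and the log file name =make_test.log= are assumptions, and it relies on the "passed check" and "failed check" wording described above.

```shell
# Hypothetical helper: count "passed check" and "failed check" lines in a saved log.
# Returns success only if the log is readable and has no "failed check" lines.
summarize_checks() {
    passed=$(grep -c "passed check" "$1" || true)
    failed=$(grep -c "failed check" "$1" || true)
    echo "passed=$passed failed=$failed"
    [ "$failed" -eq 0 ]
}
```

For example, run =make test 2>&1 | tee make_test.log= and then =summarize_checks make_test.log=.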

---++ Running _breseq_ 
This section assumes you have next-generation sequencing reads in *.fastq* format and a well-annotated reference file. It describes how to run _breseq_ from a couple of different starting points.

---+++ Using .gd files
See [[http://barricklab.org/twiki/bin/view/Lab/ProtocolsGdGenerationl][this page]] for information on generating .gd files.

---+++ Without using .gd files
   1 Move to your scratch directory so the run can complete without hitting space limits.
      * =cd $SCRATCH=
   1 Make a new project directory and change into that directory for better organization.
      * =mkdir New_Project=
      * =cd New_Project=
      * =mkdir Log_files=
   1 Copy all sequencing files and reference files to this New_Project directory.
      * On a Mac, the scp command is useful for transferring files (i.e. fastq files and reference files) from your local computer to Stampede. For Windows users, WinSCP is one possible solution.
   1 Generate a new "commands" file with 1 line for each sample you wish to analyze.
      * =nano commands=
   1 Each line should look like the following with possible additional information added into the ____ space:
      * =breseq _______ -r Reference_file.gbk -o sample_output/Sample_Name read_1.fastq read_2.fastq >& Log_files/Sample_Name.log.txt=
         * Reference_file.gbk should be replaced with the name of your reference file
         * Sample_Name should be replaced with a short name of the sample that makes sense to you and *MUST NOT CONTAIN SPACES*. It should be replaced both after the "/" in sample_output and after the "/" in Log_files. The .log.txt should be kept at the very end.
         * read_1.fastq should be replaced with the name of the first fastq file you received from the core (likely contains _R1_ as part of the name).
         * read_2.fastq should be replaced with the name of the second fastq file you received from the core if you did paired end sequencing (likely contains _R2_ as part of the name).
   1 The ____ space can be filled with the following optional items:
      * "-j 16" can be added to make breseq run faster by using 16 processor cores
      * "-p" should be added if your samples are mixed populations
      * other options can be viewed by typing =breseq -h= at the command line
   1 Write and exit the nano editor:
      * =ctrl-o enter ctrl-x=
   1 Copy a launcher.slurm file from the TACC location to the current directory:
      * =cp $TACC_LAUNCHER_DIR/launcher.slurm .=
   1 Edit the launcher.slurm file with nano as follows:
      * =nano launcher.slurm=
      * Change the line that reads "#SBATCH -n 16" so that 16 is replaced by the number of samples you have.
      * If you are using the "-j 16" option, change the line that reads "#SBATCH -N 1" so that 1 is replaced by the number of samples you have. Otherwise, replace 1 with the number of samples divided by 16, rounded up to the next whole number.
         * Examples if you are NOT using the "-j 16" option:
            * if you have 15 samples, use -N 1 and -n 15
            * if you have 16 samples, use -N 1 and -n 16
            * if you have 17 samples, use -N 2 and -n 17
      * Change the time in the line that reads "#SBATCH -t 01:00:00" to 24:00:00 unless you have a better estimate of how long your job will take to run (if using "-j 16", 12 hours is a better starting guess than 24).
      * Below the time line, add 2 lines that look like the following:
         * #SBATCH --mail-user=your_email_address@email.com
         * #SBATCH --mail-type=all
      * Change the line that reads "setenv CONTROL_FILE   paramlist" to read "setenv CONTROL_FILE   commands".
   1 Write and exit the nano editor:
      * =ctrl-o enter ctrl-x=
   1 Copy your .fastq files and reference files to your current directory (if they are not already there):
      * =cp location/of/fastq/file/read_name.fastq .=
      * =cp location/of/reference/file/reference.gbk .=
   1 Be sure to repeat the above step for each fastq and reference file you wish to use, and remember that you can use the "*" wildcard to copy multiple files from the same location at once.
   1 Submit your job to be run:
      * =sbatch launcher.slurm=
   1 You should see a series of messages confirming that your job has been submitted. Once your job starts running, you will receive emails when it starts, finishes, or fails.
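
When there are many samples, the "commands" file can be generated rather than typed by hand. The following is a rough sketch, not part of the original protocol: the function name =make_commands= is hypothetical, and it assumes that read files are named with =_R1_= and =_R2_=, that paired files differ only in that part of the name, and that the text before =_R1_= makes a suitable sample name.

```shell
# Hypothetical helper: print one breseq command line per *_R1_*.fastq file
# in the current directory. Assumes paired files differ only by _R1_ vs _R2_
# and that the text before _R1_ is a suitable sample name (no spaces).
#   $1 = reference file name, $2 = optional extra breseq options (may be empty)
make_commands() {
    for r1 in *_R1_*.fastq; do
        [ -e "$r1" ] || continue                    # nothing matched the pattern
        sample="${r1%%_R1_*}"                       # text before the first _R1_
        r2="${sample}_R2_${r1#*_R1_}"               # expected name of the mate file
        reads="$r1"
        if [ -e "$r2" ]; then reads="$r1 $r2"; fi   # include the mate only if present
        echo "breseq ${2:+$2 }-r $1 -o sample_output/$sample $reads >& Log_files/$sample.log.txt"
    done
}
```

For example, =make_commands Reference_file.gbk "-j 16" > commands= would write the file from inside the project directory. Check the result with =cat commands= before submitting the job.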

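The node count for launcher.slurm described in this section is simple arithmetic, sketched below. The function name =nodes_needed= is hypothetical, and the value of 16 cores per node is an assumption taken from the examples in this protocol.

```shell
# Hypothetical helper: print the "-N" (node) value for launcher.slurm.
#   $1 = number of samples, $2 = "yes" if every command uses "-j 16"
# Assumes 16 cores per node, as implied by the examples in this protocol.
nodes_needed() {
    if [ "$2" = "yes" ]; then
        echo "$1"                      # one whole node per sample
    else
        echo $(( ($1 + 15) / 16 ))     # samples / 16, rounded up
    fi
}

# Reproduce the worked examples for runs that do not use "-j 16".
for n in 15 16 17; do
    echo "$n samples without -j 16: -N $(nodes_needed "$n" no) -n $n"
done
```

In both cases the "-n" value is simply the number of samples.
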
---++ Visualizing _breseq_ Output
Complete this section after you receive an email stating that your run finished successfully.
   
   1 =cd $HOME/local/bin=
   1 =cp /corral-repl/utexas/BioITeam/bin/batch_run.pl .=
   1 =cp /corral-repl/utexas/BioITeam/bin/export-breseq.sh .=
   1 =cd $SCRATCH/New_Project/sample_output=
      * replace New_Project with the name of the actual directory you created before the breseq run
   1 =export-breseq.sh=
      * this will take a few minutes to finish
   1 =cd $SCRATCH/New_Project/sample_output=
      * replace New_Project with the name of the actual directory you created before the breseq run
   1 =ls= will now show you a new folder called "05_Output_Export" and a new file called "05_Output_Export.tar.gz"
   1 =pwd= will now give you your current location
   1 From a new terminal window, navigate to the folder you want to store the breseq output before using the scp command:
      * =scp username@stampede.tacc.utexas.edu:/location/from/other/window/05_Output_Export.tar.gz .=
         * Make sure the above command has ":/" after ".edu" (leaving it out is the most common mistake with the scp command).
   1 Navigate to the downloaded file's location in the Finder and extract the files by double-clicking the archive.
   1 Open the "index.html" file.
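
The extraction can also be done from the terminal instead of the Finder. The sketch below is an assumption-laden example: the function name =extract_and_find= is hypothetical, and because the layout inside the archive is not described here, it searches for any "index.html" files rather than assuming a fixed path.

```shell
# Hypothetical helper: unpack a .tar.gz archive into a directory and list
# every index.html found inside it.
#   $1 = archive (e.g. 05_Output_Export.tar.gz), $2 = destination directory
extract_and_find() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2" && find "$2" -name index.html | sort
}
```

For example, =extract_and_find 05_Output_Export.tar.gz breseq_results= prints the path of each "index.html" file; on a Mac, any of them can then be opened in the default browser with the =open= command.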
Contributors to this topic
DanielDeatherage, JeffreyBarrick
Topic revision: r6 - 2016-07-05 - 13:50:10 - Main.DanielDeatherage