Genome Diff file Generation

Overview

This protocol is a series of commands that automatically generates .gd files based on the naming scheme of your .fastq files. It will typically be the first step once you have received sequencing files in .fastq format from the sequencing core. The script is part of the barricklab repository on GitHub.

Protocol

"" marks are required, text within <> should be replaced.

Clone or update script from github repository:

- Clone (for first time users):
  1. cd <location_for_repository>
  2. git clone https://github.com/barricklab/barricklab.git
  3. Do 1 of the following:
    - add <location_for_repository/barricklab> to your path (don't forget to put it in your profile)
    - run the command using the absolute path
    - copy reads_to_gd_files.py to a place within your path. This is the worst choice. NOTE: you will need to recopy the script each time you update from the repository
- Update repository scripts (if you know there are changes or if something isn't working how you expect):
  1. cd <location_for_repository/barricklab>
  2. git pull
  3. Remember to copy reads_to_gd_files.py to somewhere in your path if you went with that option originally, and consider deleting it from that location and doing one of the other options.
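
As a rough sketch of the "add to your path" option above, the repository directory can be appended to PATH in a Bourne-style shell. The REPO_PARENT variable and its default of $HOME/src are placeholders standing in for <location_for_repository>; they are assumptions for illustration, not something the script requires.

```shell
# REPO_PARENT is a hypothetical stand-in for <location_for_repository>.
REPO_PARENT="${REPO_PARENT:-$HOME/src}"

# Append the cloned repository to PATH for the current shell session.
export PATH="$PATH:$REPO_PARENT/barricklab"

# Report whether the script can now be found by name.
command -v reads_to_gd_files.py || echo "reads_to_gd_files.py not found on PATH yet"
```

To make the change persistent, put the export line in your shell profile; which file that is (for example ~/.bash_profile or ~/.zshrc) depends on your shell.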

Generate meta data file

Make a new .tsv file using your favorite text editor that contains your sample name (which should match the beginning of the .fastq file names) and any of the following: 'population', 'time' (meaning generation), 'treatment', 'clone'. Be sure the first row contains the column names you want included in the .gd files.
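
For illustration, a metadata file of this shape could also be written from the command line. This is only a sketch: the header "sample" for the sample-name column, the sample names, and all values are made up, and the exact header the script expects for that column is not stated here.

```shell
# META is a hypothetical output file name.
META="${META:-metadata.tsv}"

# Header row: "sample" is an assumed name for the sample-name column;
# the other four column names come from the protocol text.
printf 'sample\tpopulation\ttime\ttreatment\tclone\n' > "$META"

# One row per sample; the first field should match the beginning of the
# corresponding .fastq file names. These rows are invented examples.
printf 'SampleA\t1\t500\tcontrol\tA\n' >> "$META"
printf 'SampleB\t2\t500\tcontrol\tB\n' >> "$META"

cat "$META"
```

Using printf with \t guarantees real tab characters, which some text editors silently replace with spaces.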

Generate .gd files

1. cd <location_of_fastq_files>
2. Run the following command and applicable options:
  - reads_to_gd_files.py -a <your_name> -f $PWD -r <location_of_references> -m <location_and_name_of_meta_data_tsv_file>
  - for <your_name> make sure you do not use spaces
  - for <location_of_references> make sure it is specified from root (i.e. it should start with "/")
  - If there are index files in the directory (i.e. if your sequencing data comes from a MiSeq run) add -i to the above command line
  - If the data is stored on TACC's corral system and is maintained by the Barrick lab, add -b to the above command line
  - If the data is stored on NCBI's SRA archive, add -s to the above command line
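
Because the sample names in the metadata file must match the beginning of the .fastq file names, it can help to check this before running the command. The function below is a hypothetical helper, not part of reads_to_gd_files.py; it assumes the sample name is the first tab-separated column, that the first row is a header, and that read files are named <sample>*.fastq* (which also covers .fastq.gz).

```shell
# Hypothetical pre-flight check: returns 0 if every sample in the metadata
# file has at least one matching .fastq file, 1 otherwise.
check_samples() {
    meta="$1"
    fastq_dir="$2"
    missing=0
    tab="$(printf '\t')"
    # The here-document keeps the loop in the current shell, so "missing"
    # is still set after the loop (a pipe would run it in a subshell).
    while IFS="$tab" read -r sample _rest; do
        [ -n "$sample" ] || continue
        if ! ls "$fastq_dir/$sample"*.fastq* >/dev/null 2>&1; then
            echo "no .fastq file starts with: $sample" >&2
            missing=1
        fi
    done <<EOF
$(tail -n +2 "$meta")
EOF
    return "$missing"
}
```

From the directory containing the .fastq files, this could be called as check_samples <location_and_name_of_meta_data_tsv_file> "$PWD"; any sample it reports should be fixed in the metadata file before generating .gd files.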

Add .gd files to DCAMP

1. Check out the newest version of DCAMP from GitHub
  - cd <location_of_dcamp>
  - git pull
2. If the fastq files are part of a new project/publication, make a new RS_#### directory within DCAMP's src/data directory; otherwise, determine which existing directory the .gd files belong in.
3. cd <location_of_fastq_files>
4. mv -i new_gd_files/*.gd <location_from_root_of_dcamp>/dcamp/src/data/<RS_####_directory>
5. rm -r new_gd_files
6. cd <location_from_root_of_dcamp>/dcamp
7. git add *
8. git commit -m "<Brief description of what you are adding>"
9. git push

Version History

This section contains information on versions of scripts, dates scripts were used, and archived versions. Care will be taken that new scripts preserve the same command line execution and only add new functionality. Direct suggestions for improvements to Dan.

-- Main.DanielDeatherage - 29 Mar 2016

Topic revision: r2 - 2016-03-29 - 22:24:07 - Main.DanielDeatherage