Difference: ProtocolsGdGenerationl (2 vs. 3)

Revision 3 2016-03-30 - DanielDeatherage

 

Genome Diff file Generation

Overview

This is a series of commands to automatically generate .gd files based on the naming system present in .fastq files. This will typically be the first step once you have sequencing files in .fastq format from the sequencing core. The script exists as part of the barricklab repository on GitHub.

Protocol

Quotation marks ("") are required where shown; text within <> should be replaced with your own values.

Clone or update script from github repository:

    • Clone (for first time users):
      1. cd <location_for_repository>
      2. git clone https://github.com/barricklab/barricklab.git
      3. Do 1 of the following:
        • add <location_for_repository/barricklab> to your path (don't forget to put it in your profile)
        • run the command using the absolute path
Changed:
<
<
        • copy reads_to_gd_files.py to place within your path. This is the worst choice. NOTE: you will need to recopy the script each time you update from the repository
>
>
        • move reads_to_gd_files.py to place within your path. This is the worst choice. NOTE: you will need to move the script each time you update from the repository
 
    • update repository scripts (if you know there are changes or if something isn't working how you expect)
      1. cd <location_for_repository/barricklab>
      2. git pull
Changed:
<
<
      1. Remember to copy reads_to_gd_files.py to somewhere in your path if you went with that option originally, and consider deleting it from that location and doing one of the other options.
>
>
      1. Remember to move reads_to_gd_files.py to somewhere in your path if you went with that option originally, and consider deleting it from that location and doing one of the other options.
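The "add to your path" option can be sketched as below. This is only an illustration: the location $HOME/src and the helper add_to_path are assumptions made for the example and are not part of the barricklab repository. The helper avoids adding the same directory twice if your profile is sourced more than once.

```shell
# Hypothetical location; substitute your own <location_for_repository>.
repo_parent="$HOME/src"

# Hypothetical helper: prepend a directory to PATH only if it is not already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

add_to_path "$repo_parent/barricklab"
export PATH
```

To make the change permanent, the same lines would go in your shell profile (which file that is depends on your shell).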
Added:
>
>
 

Generate meta data file

Changed:
<
<
Make a new .tsv file using your favorite text editor which conatains your sample name (which should match the begining of the .fastq files) and any of the following: population', 'time' (meaning generation), 'treatment', 'clone'. Be sure the first row contains the names you want included in the .gd files.
>
>
  • Make a new .tsv file using your favorite text editor which contains your sample name (which should match the beginning of the .fastq files) and any of the following: 'population', 'time' (meaning generation), 'treatment', 'clone'.
Added:
>
>
  • Be sure the first row contains the names you want included in the .gd files. (i.e. that the file has a header row)
  • Be sure 'sample' is the first column
  • Be sure file is tab separated
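As a rough illustration of the three checks above, the following sketch writes a small meta data file and verifies it. The sample names, values, and the helper check_metadata_tsv are made up for the example; only the column names come from this protocol.

```shell
# Hypothetical sample names and values, for illustration only.
meta_file="$(mktemp -d)/metadata.tsv"
printf 'sample\tpopulation\ttime\tclone\n' >  "$meta_file"
printf 'SampleA\t1\t500\tA\n'              >> "$meta_file"
printf 'SampleB\t2\t500\tB\n'              >> "$meta_file"

# Hypothetical helper: succeeds only if the first header field is 'sample'
# and every row has the same number of tab-separated fields as the header.
check_metadata_tsv() {
    awk -F'\t' '
        BEGIN   { bad = 0 }
        NR == 1 { if ($1 != "sample") bad = 1; n = NF }
        NF != n { bad = 1 }
        END     { exit (bad || NR == 0) }
    ' "$1"
}

if check_metadata_tsv "$meta_file"; then
    echo "meta data file looks OK"
else
    echo "meta data file needs fixing" >&2
fi
```

A file saved with spaces instead of tabs would have a single field per row whose header is not exactly 'sample', so it fails the first check.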
 

Generate .gd files

    1. cd <location_of_fastq_files>
    2. Run the following command and applicable options:
      • reads_to_gd_files.py -a <your_name> -f $PWD -r <location_of_references> -m <location_and_name_of_meta_data_tsv_file>
Changed:
<
<
      • for <your_name> make sure you do not use spaces
      • for <location_of_reference_file> make sure it is specified from root (i.e. should start with "/")
>
>
      • for <your_name> make sure you do not use spaces
      • for <location_of_reference_file> AND <location_and_name_of_meta_data_tsv_file> make sure it is specified from root (i.e. should start with "/")
 
      • If there are index files in the directory (i.e. if your sequencing data comes from a MiSeq run) add -i to the above command line
      • If the data is stored on TACC's corral system and is maintained by the Barrick lab, add -b to the above command line
      • If the data is stored on NCBI's SRA archive, add -s to the above command line
Added:
>
>
    1. Inspect contents of new_gd_files, and make sure both the files created and the contents of those files look correct.
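The two most common mistakes with the command above (a name containing spaces, or a path not given from root) can be caught before running it. The sketch below is one way to do that; the values and the helpers is_absolute and has_no_spaces are hypothetical, and the command is only printed for review, not executed.

```shell
# Hypothetical values; replace with your own.
author="Jane_Doe"
references="/path/to/references"
metadata="/path/to/metadata.tsv"

# Hypothetical helpers for the two rules stated in this protocol.
is_absolute()   { case "$1" in /*) return 0 ;; *) return 1 ;; esac; }
has_no_spaces() { case "$1" in *" "*) return 1 ;; *) return 0 ;; esac; }

if has_no_spaces "$author" && is_absolute "$references" && is_absolute "$metadata"; then
    # Print the command so it can be checked before it is actually run.
    echo reads_to_gd_files.py -a "$author" -f "$PWD" -r "$references" -m "$metadata"
else
    echo "Fix the name or paths before running reads_to_gd_files.py" >&2
fi
```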
 

Add .gd files to DCAMP

Changed:
<
<
    1. checkout newest version of DCAMP from github
>
>
    1. checkout newest version of DCAMP from repository. Alternatively, if you have not previously cloned the dcamp repository, see how to clone dcamp
 
      • cd <location_of_dcamp>
Changed:
<
<
      • git pull
>
>
      • hg pull
Added:
>
>
      • hg update
 
    1. If fastq files are part of a new project/publication, make a new RS_#### directory within DCAMP's src/data directory, otherwise determine what directory .gd files should belong to.
    2. cd <location_of_fastq_files>
    3. mv -i new_gd_files/*.gd <location_from_root_of_dcamp>/dcamp/src/data/<RS_####_directory>
Added:
>
>
    1. verify that the contents of new_gd_files are empty
      • ls new_gd_files
 
    1. rm -r new_gd_files
    2. cd <location_from_root_of_dcamp>/dcamp
Changed:
<
<
    1. git add *
    2. git commit -m "<Brief description of what you are adding>"
    3. git push
>
>
    1. hg add .
    2. hg commit -m "<Brief description of what you are adding>"
    3. hg push
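The "verify, then remove" steps for new_gd_files can also be combined so that nothing is deleted by accident. This sketch swaps rm -r for rmdir, which refuses to remove a directory that still has files in it; the helper dir_is_empty is an assumption made for the example.

```shell
# Hypothetical helper: true only if the directory exists and contains nothing
# (hidden files included).
dir_is_empty() {
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

if dir_is_empty new_gd_files; then
    rmdir new_gd_files
else
    echo "new_gd_files is missing or still contains files; not removing it" >&2
fi
```

If the message appears, check whether the earlier mv step skipped any .gd files before removing the directory by hand.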
 

Version History.

Section contains information on versions of scripts, dates scripts were used, and archived versions. Care will be taken that new scripts preserve the same command line execution and only add new functionality. Direct suggestions for improvements to Dan.
Added:
>
>
  • 3-30-16 GSAF Core seems to have changed the fastq file output naming system. No longer is the barcode added prior to the lane ID. Instead an unknown S# is added in front of it. S# varies per sample and does not appear to be related to the barcode. Script updated to account for new naming, but should retain old functionality as well
 

-- Main.DanielDeatherage - 29 Mar 2016

 