Difference: ProtocolsGdGenerationl (2 vs. 3)

Revision 3 2016-03-30 - DanielDeatherage

  META TOPICPARENT 
 name="ComputationList" 

 Genome Diff file Generation 
 Overview 

This is a series of commands to automatically generate .gd files based on the naming system used in the .fastq files. This will typically be the first step once you have sequencing files in .fastq format from the sequencing core. The script is part of the barricklab repository on GitHub.

 Protocol 
"" marks are required, text within <> should be replaced.
 Clone or update script from github repository: 
 
 
 
 Clone (for first time users): 
 cd <location_for_repository>
  git clone https://github.com/barricklab/barricklab.git
  Do 1 of the following:  
 add <location_for_repository/barricklab> to your path (don't forget to put it in your profile)
  run the command using the absolute path
-<
<
+ copy reads_to_gd_files.py to place within your path. This is the worst choice. NOTE: you will need to recopy the script each time you update from the repository
->
>
+ move reads_to_gd_files.py to place within your path. This is the worst choice. NOTE: you will need to move the script each time you update from the repository
  update repository scripts (if you know there are changes or if something isn't working as you expect)
 cd <location_for_repository/barricklab>
  git pull
-<
<
+ Remember to copy reads_to_gd_files.py to somewhere in your path if you went with that option originally, and consider deleting it from that location and doing one of the other options.
->
>
+ Remember to move reads_to_gd_files.py to somewhere in your path if you went with that option originally, and consider deleting it from that location and doing one of the other options.
->
>
  Generate meta data file
-<
<
+Make a new .tsv file using your favorite text editor which conatains your sample name (which should match the begining of the .fastq files) and any of the following: population', 'time' (meaning generation), 'treatment', 'clone'. Be sure the first row contains the names you want included in the .gd files.
->
>
+ Make a new .tsv file using your favorite text editor which contains your sample name (which should match the beginning of the .fastq files) and any of the following: 'population', 'time' (meaning generation), 'treatment', 'clone'.
->
>
+ Be sure the first row contains the names you want included in the .gd files. (i.e. that the file has a header row)
  Be sure 'sample' is the first column
  Be sure file is tab separated
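The three checks above (header row, 'sample' as the first column, tab separation) can be automated before running the script. The following is only a minimal sketch: the function name check_metadata and the example sample name are hypothetical, and treating any column outside the four listed names as a problem is an assumption, not a documented rule of reads_to_gd_files.py.

```python
import csv
import io

# Column names listed in the protocol; 'sample' must come first.
# Assumption: any other column name is reported as unexpected.
OPTIONAL_COLUMNS = {"population", "time", "treatment", "clone"}


def check_metadata(tsv_text):
    """Return a list of problems found in the metadata text (empty means OK)."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows or not rows[0]:
        return ["missing header row"]

    header = rows[0]
    problems = []
    if header[0] != "sample":
        problems.append("first column must be 'sample'")
    for name in header[1:]:
        if name not in OPTIONAL_COLUMNS:
            problems.append("unexpected column: " + name)
    # A row whose field count differs from the header usually means
    # spaces or commas were used instead of tabs.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append("line %d has %d fields, expected %d"
                            % (number, len(row), len(header)))
    return problems


if __name__ == "__main__":
    # Hypothetical two-line metadata file.
    example = "sample\tpopulation\ttime\nA1_500\tA1\t500\n"
    print(check_metadata(example))
```

To check a real file, read it with open(path, newline="") and pass the text to check_metadata; an empty list means none of these particular problems were found.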
  Generate .gd files 
 
 
 
 cd <location_of_fastq_files>
  Run the following command and applicable options: 
 reads_to_gd_files.py -a <your_name> -f $PWD -r <location_of_references> -m <location_and_name_of_meta_data_tsv_file>
-<
<
+ for <your_name> make sure you do not use spaces
  for <location_of_reference_file> make sure it is specified from root (i.e. should start with "/")
->
>
+ for <your_name> make sure you do not use spaces
  for <location_of_references> AND <location_and_name_of_meta_data_tsv_file> make sure each is specified from root (i.e. each should start with "/")
  If there are index files in the directory (i.e. if your sequencing data comes from a MiSeq run) add -i to the above command line
  If the data is stored on TACC's corral system and is maintained by the Barrick lab, add -b to the above command line
  If the data is stored on NCBI's SRA archive, add -s to the above command line
->
>
+ Inspect contents of new_gd_files, and make sure both the files created and the contents of those files look correct.
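The rules for the command line above (no spaces in the name, reference and metadata locations given from root, -i only when index files are present) can be checked before anything is run. This is a sketch under stated assumptions: build_command is a hypothetical helper, the example paths are made up, and only the -a, -f, -r, -m and -i flags named in this protocol are used.

```python
import shlex


def build_command(author, fastq_dir, reference_dir, metadata_tsv, index_files=False):
    """Assemble the reads_to_gd_files.py argument list, checking the protocol's rules."""
    if any(ch.isspace() for ch in author):
        raise ValueError("author name must not contain spaces")
    for label, path in (("reference location", reference_dir),
                        ("metadata file", metadata_tsv)):
        # The protocol asks for paths that start with "/".
        if not path.startswith("/"):
            raise ValueError(label + " must be specified from root: " + path)
    command = ["reads_to_gd_files.py", "-a", author, "-f", fastq_dir,
               "-r", reference_dir, "-m", metadata_tsv]
    if index_files:
        command.append("-i")  # index files present, e.g. a MiSeq run
    return command


if __name__ == "__main__":
    # Hypothetical values; replace with your own name and locations.
    parts = build_command("Jane_Doe", "/data/fastq", "/data/refs",
                          "/data/meta.tsv", index_files=True)
    print(" ".join(shlex.quote(part) for part in parts))
```

The returned list can be handed to subprocess.run once the printed command looks right; building a list rather than a single string avoids quoting problems with unusual path names.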
  Add .gd files to DCAMP
-<
<
+ checkout newest version of DCAMP from github
->
>
+ checkout newest version of DCAMP from repository. Alternatively, if you have not previously cloned the dcamp repository, see how to clone dcamp
  cd <location_of_dcamp>
-<
<
+ git pull
->
>
+ hg pull
->
>
+ hg update
  If fastq files are part of a new project/publication, make a new RS_#### directory within DCAMP's src/data directory, otherwise determine what directory .gd files should belong to.
  cd <location_of_fastq_files>
  mv -i new_gd_files/*.gd <location_from_root_of_dcamp>/dcamp/src/data/<RS_####_directory>
->
>
+ verify that the contents of new_gd_files are empty  
 ls new_gd_files
  rm -r new_gd_files
  cd <location_from_root_of_dcamp>/dcamp
-<
<
+ git add *
  git commit -m "<Brief description of what you are adding>"
  git push
->
>
+ hg add .
  hg commit -m "<Brief description of what you are adding>"
  hg push
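The "verify that new_gd_files is empty, then remove it" step in the list above can be made safer by refusing to delete a directory that still holds files. A minimal sketch, assuming a hypothetical helper named remove_if_empty; it relies on os.rmdir, which only removes empty directories, instead of a recursive delete.

```python
import os
import tempfile


def remove_if_empty(directory):
    """Remove directory only when it has no entries; return True if removed."""
    leftovers = sorted(os.listdir(directory))
    if leftovers:
        print("not removing, still contains:", ", ".join(leftovers))
        return False
    os.rmdir(directory)  # rmdir itself refuses to delete a non-empty directory
    return True


if __name__ == "__main__":
    # Demonstration on a throwaway directory rather than a real new_gd_files.
    scratch = tempfile.mkdtemp()
    print(remove_if_empty(scratch))
```

If the function reports leftover .gd files, the earlier mv step did not move everything and should be rechecked before deleting anything.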
  Version History. 
   Section contains information on versions of scripts, dates scripts were used, and archived versions. Care will be taken that new scripts preserve the same command line execution and only add new functionality. Direct suggestions for improvements to Dan.
->
>
+ 3-30-16 GSAF Core seems to have changed the fastq file output naming system. The barcode is no longer added prior to the lane ID. Instead an unknown S# is added in front of it. S# varies per sample and does not appear to be related to the barcode. Script updated to account for new naming, but should retain old functionality as well.
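To illustrate the two naming schemes described in this entry, the sketch below pulls the sample name out of a file name whether a barcode or an S# sits in front of the lane ID. It is not the logic used by reads_to_gd_files.py: the regular expression, the sample_name helper, and the example file names are all assumptions based on the common <sample>_<barcode or S#>_L<lane>_R<read>_<number>.fastq layout.

```python
import re

# Assumed layout: <sample>_<barcode or S#>_L<lane>_R<read>_<number>.fastq[.gz]
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+?)_(?:(?P<barcode>[ACGTN]+(?:-[ACGTN]+)?)|S(?P<snum>\d+))"
    r"_L(?P<lane>\d+)_R(?P<read>[12])_\d+\.fastq(?:\.gz)?$"
)


def sample_name(filename):
    """Return the sample part of a fastq file name, or None if it does not fit."""
    match = FASTQ_NAME.match(filename)
    return match.group("sample") if match else None


if __name__ == "__main__":
    # Hypothetical file names in the newer (S#) and older (barcode) styles.
    for name in ("A1_500_S12_L001_R1_001.fastq.gz",
                 "A1_500_ACGTGA_L001_R2_001.fastq"):
        print(name, "->", sample_name(name))
```

The sample name recovered this way is what should match the first column of the metadata file.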
 -- Main.DanielDeatherage - 29 Mar 2016
