Publicly Archiving Data
The public repositories listed here provide accession numbers for data that cannot easily be included as supplementary information in a research report. An advantage of submitting to these databases is that your data will be archived in standard formats that others can use more easily.
Submitting Sequences to GenBank
The easiest way to submit a few sequences is to use the BankIt web submission tool.
The Geneious submission tool does not properly format GenBank submissions as of v6.06 (@JEB).
Submitting Sequencing Reads to the SRA
SRA main page
SRA login page
NCBI online SRA manual
Examples of completed metadata spreadsheets:
Notes:
- You cannot change aliases in the normal upload area, so be careful to enter them correctly the first time!
- It is easiest to upload uncompressed FASTQ files.
- Illumina uploads must be in Illumina 1.5+ FASTQ format, not converted to Sanger FASTQ format.
- The flow cell number and lane are encoded in the name of every read in the FASTQ.
- Paired-end or mate-paired FASTQ files must be interleaved (one file alternating corresponding first and second reads), rather than with all of the first reads in one file and all of the second reads in another file. The script interleave_paired_fastq.pl can construct the interleaved file.
- The script estimate_insert_length.sh can be used to estimate the fragment size in a paired library to complete those fields.
- Use the md5sum command to calculate the MD5 checksum for FASTQ files.
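The interleaving and checksum steps in the notes above can be sketched in Python for cases where the lab scripts are not at hand. This is only an illustration, not a reimplementation of interleave_paired_fastq.pl. The function names (`read_fastq_records`, `interleave_fastq`, `md5_of_file`) are hypothetical. It assumes uncompressed FASTQ files with exactly four lines per record and with the first and second reads listed in the same order in both files.

```python
import hashlib
from itertools import islice


def read_fastq_records(handle):
    """Yield FASTQ records as lists of four newline-terminated lines.

    Assumes unwrapped records (header, sequence, '+', qualities).
    """
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        # Normalize line endings so a missing final newline cannot fuse records.
        yield [line.rstrip("\r\n") + "\n" for line in record]


def interleave_fastq(first_path, second_path, out_path):
    """Write alternating first/second reads to out_path; return the pair count."""
    pairs = 0
    with open(first_path) as r1, open(second_path) as r2, open(out_path, "w") as out:
        second_reads = read_fastq_records(r2)
        for first_read in read_fastq_records(r1):
            second_read = next(second_reads, None)
            if second_read is None:
                raise ValueError("second file has fewer reads than the first")
            out.writelines(first_read)
            out.writelines(second_read)
            pairs += 1
        if next(second_reads, None) is not None:
            raise ValueError("second file has more reads than the first")
    return pairs


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `interleave_fastq("reads_1.fastq", "reads_2.fastq", "reads_interleaved.fastq")` and then `md5_of_file("reads_interleaved.fastq")` (file names are placeholders) should give the same hex digest that the md5sum command reports for the interleaved file. The sketch does not check that read names match between the two files, so confirm the pairing separately if the inputs have been filtered or sorted.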
Submitting Transcriptomics Data (Differential Gene Expression)
NCBI online GEO manual and submission link
- Do not create an SRA entry for the FASTQ files. This is handled within GEO!
Dryad
Dryad is especially good for submitting large data tables and analysis scripts (e.g., in R).
Topic revision: r6 - 2020-07-03 - 01:56:28 - Main.JeffreyBarrick
Lab.PubliclyArchivingData moved from Lab.ProtocolsUploadingDataToSRA on 2013-02-14 - 14:28 by Main.JeffreyBarrick -