Question: NCBI WGS/NT/ENV-NT Databases
1
8.3 years ago by
Lythimus200
Lythimus200 wrote:

I am currently BLASTing against NCBI's NT database, but I am considering also using WGS and ENV-NT. I was given the impression that WGS was populated by pulling sequences from ENV-NT once they were definitively classified to a specific organism, but after looking at the file sizes it seems to be the reverse. Could someone clearly explain the differences between NT, ENV-NT and WGS, and maybe give an example of when I would, and possibly wouldn't, want to use specific databases or sets of databases?

Just assume whatever domain you are most familiar with and use it in your examples, please.

ncbi nucleotide database blast • 7.3k views
modified 4.0 years ago by Biostar • written 8.3 years ago by Lythimus
7
8.3 years ago by
Alex1.4k
Theodosius Dobzhansky Center for Genome Bioinformatics
Alex1.4k wrote:

A useful page describing the NCBI databases:

http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#nucleotide_databases

nt contains all GenBank + RefSeq Nucleotides + EMBL + DDBJ + PDB sequences (excluding HTGS0,1,2, EST, GSS, STS, PAT, WGS). No longer "non-redundant".

wgs is a collection of partially assembled sequences from the genome centers. These are contigs assembled directly from whole genome shotgun sequencing.

env_nt contains DNA sequenced directly from the environment (from all organisms mixed together, e.g. the Sargasso Sea and Mine Drainage projects).
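
To search several of these databases in one run, BLAST+ accepts a space-separated list of names in `-db`. Here is a minimal Python sketch, assuming BLAST+ is installed and that local copies of the databases exist under the names `nt`, `env_nt` and `wgs`. The file names `query.fasta` and `hits.tsv` and the helper `build_blastn_command` are placeholders made up for illustration. The sketch only builds the command; the line that would run it is left commented out.

```python
import shlex
import subprocess


def build_blastn_command(query, databases, out, evalue=1e-5):
    """Return a blastn argument list that searches all given databases."""
    if not databases:
        raise ValueError("at least one database is required")
    return [
        "blastn",
        "-query", query,
        # BLAST+ takes multiple databases as one space-separated argument.
        "-db", " ".join(databases),
        "-evalue", str(evalue),
        "-outfmt", "6",  # tabular output
        "-out", out,
    ]


# Hypothetical file and database names; adjust to the local setup.
cmd = build_blastn_command("query.fasta", ["nt", "env_nt", "wgs"], "hits.tsv")
print(shlex.join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually run the search
```

Passing the command as a list avoids shell quoting problems with the space inside the `-db` value.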

Additionally, check:

est contains sequence data and other information on "single-pass" cDNA sequences, or "Expressed Sequence Tags". More here http://www.ncbi.nlm.nih.gov/pubmed/8401577?dopt=Abstract

htgs - unfinished High Throughput Genomic Sequences: phases 0, 1 and 2 (finished, phase 3 HTG sequences are in nt, the collection that NCBI's web BLAST labels "nr/nt"). About phases here: http://www.ncbi.nlm.nih.gov/HTGS/ .

gss - Genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences. More here: http://www.ncbi.nlm.nih.gov/dbGSS/index.html

sts - contains sequence data for short genomic landmark sequences or Sequence Tagged Sites. More here http://www.ncbi.nlm.nih.gov/pubmed/2781285?dopt=Abstract.
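
Since nt excludes the divisions listed above, a search that should cover them has to name their databases separately. Here is a small Python sketch of that check. The set is copied from the exclusion list in the nt description in this answer; the function name `separate_searches` is made up for illustration. env_nt is not in that list, so the sketch says nothing about it.

```python
# Divisions the nt description above lists as excluded from nt.
# Note: for HTGS only phases 0, 1 and 2 are excluded.
EXCLUDED_FROM_NT = {"htgs", "est", "gss", "sts", "pat", "wgs"}


def separate_searches(wanted):
    """Return, sorted, the wanted divisions that nt does not cover."""
    return sorted(d.lower() for d in wanted if d.lower() in EXCLUDED_FROM_NT)


print(separate_searches(["WGS", "EST", "RefSeq"]))  # ['est', 'wgs']
```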

written 8.3 years ago by Alex

This may be a silly question. Are sequences from WGS and ENV_NT taxonomically classified (as in NT), rather than being associated solely with the environment from which they were harvested?

written 8.3 years ago by Lythimus

All WGS sequences: yes (each comes from a genome project). ENV_NT sequences: usually not.

written 8.3 years ago by Alex