Difference between NCBI non-redundant and refseq database
7.4 years ago
hdy ▴ 160

What is the difference between nr and RefSeq? According to NCBI's own definition, "RefSeq database is a non-redundant set of reference standards derived from the INSDC databases that includes chromosomes, complete genomic molecules (organelle genomes, viruses, plasmids), intermediate assembled genomic contigs, curated genomic regions, mRNAs, RNAs, and proteins", so RefSeq is also non-redundant. Yet when you run a BLAST search, you can select either nr/nt or RefSeq as the database, so I assume there is a difference.

refseq nr • 27k views
7.4 years ago

The nr database combines sequences from both non-curated and curated databases, whereas RefSeq is only one of the curated sources:

Non-curated databases (low quality):

  • GenBank/GenPept - unreviewed sequences submitted by individual laboratories and large-scale sequencing projects. Because these records are owned by the original submitters and cannot be altered by anyone else, GenBank may contain many low-quality sequences.
  • TrEMBL - the unreviewed section of UniProt. It is a computer-annotated supplement to SwissProt containing translations of all EMBL nucleotide sequence entries not yet integrated into SwissProt.

Curated databases (high quality):

  1. RefSeq - sequences derived from GenBank and curated by NCBI, through a combination of manual review by NCBI staff and automated pipelines. RefSeq records are owned by NCBI and can be updated as needed to maintain current annotation or to incorporate additional information.
  2. SwissProt - manually annotated and reviewed protein sequences
  3. PIR - non-redundant annotated protein sequence database
  4. PDB - experimentally-determined structures of proteins, nucleic acids, and complex assemblies
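
In practice, one way to see which hits from an nr search come from RefSeq is to look at the accession format: RefSeq accessions carry a two-letter prefix followed by an underscore (for example NP_, XP_, YP_ and WP_ for proteins, NM_ and NC_ for nucleotide records), while GenBank, UniProt and PDB identifiers do not. Below is a minimal Python sketch of that check. The function names `is_refseq_accession` and `split_by_source` are made up for illustration, and the prefix list covers the common RefSeq prefixes rather than being guaranteed complete.

```python
import re

# RefSeq accessions: a two-letter prefix, an underscore, an identifier,
# and an optional ".version" suffix (e.g. NP_000509.1).
# The prefix list is an assumption covering the commonly documented ones.
REFSEQ_PATTERN = re.compile(
    r"^(AC|NC|NG|NT|NW|NZ|NM|NR|XM|XR|AP|NP|YP|XP|WP)_[A-Z0-9]+(\.\d+)?$"
)


def is_refseq_accession(accession: str) -> bool:
    """Return True if the accession has a RefSeq-style prefix."""
    return REFSEQ_PATTERN.match(accession.strip()) is not None


def split_by_source(accessions):
    """Split accessions into (RefSeq, everything else), keeping input order."""
    refseq = [a for a in accessions if is_refseq_accession(a)]
    other = [a for a in accessions if not is_refseq_accession(a)]
    return refseq, other


if __name__ == "__main__":
    hits = ["NP_000509.1", "AAA16334.1", "P68871", "XP_011541469.1", "1HHO_A"]
    refseq_hits, other_hits = split_by_source(hits)
    print("RefSeq:", refseq_hits)
    print("Other: ", other_hits)
```

This only inspects the identifier string and says nothing about the quality of an individual record; it is meant as a quick filter on a list of BLAST hit accessions.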
