Question: Difference between NCBI non-redundant and refseq database
3.2 years ago by
United States
hdy100 wrote:

What is the difference between nr and RefSeq? By NCBI's own definition, "RefSeq database is a non-redundant set of reference standards derived from the INSDC databases that includes chromosomes, complete genomic molecules (organelle genomes, viruses, plasmids), intermediate assembled genomic contigs, curated genomic regions, mRNAs, RNAs, and proteins.", so RefSeq is also non-redundant. But when you run a BLAST search, you can select either nr/nt or RefSeq, so I assume there is a difference.

refseq nr
3.2 years ago by
a.zielezinski (8.5k) wrote:

The nr database encompasses sequences from both non-curated and curated databases:

Non-curated databases (low quality):

  • GenBank/GenPept - unreviewed sequences submitted by individual laboratories and large-scale sequencing projects. Since these sequence records are owned by the original submitters and cannot be altered, GenBank may contain many low-quality sequences.
  • TrEMBL - the unreviewed section of UniProt. It is a computer-annotated supplement to SwissProt containing all translations of EMBL nucleotide sequence entries not yet integrated into SwissProt.

Curated databases (high quality):

  1. RefSeq - GenBank sequences that are manually curated by the NCBI staff. RefSeq records are owned by NCBI and can be updated as needed to maintain current annotation or to incorporate additional information.
  2. SwissProt - manually annotated and reviewed protein sequences
  3. PIR - non-redundant annotated protein sequence database
  4. PDB - experimentally-determined structures of proteins, nucleic acids, and complex assemblies
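
In practice, one rough way to see which hits in an nr BLAST result come from RefSeq is to look at the accession format: RefSeq accessions use a two-letter prefix followed by an underscore (NP_, XP_, WP_, NM_, NC_ and so on), while GenBank, SwissProt and PDB identifiers do not. The sketch below assumes that convention and nothing more; the helper names `is_refseq_accession` and `split_hits` are hypothetical, and the accessions in the usage example are only illustrations of each format.

```python
import re

# Assumption: a RefSeq accession is two uppercase letters, an underscore,
# an alphanumeric body, and an optional ".version" suffix (e.g. NP_000509.1).
REFSEQ_PATTERN = re.compile(r"^[A-Z]{2}_[A-Z0-9]+(\.\d+)?$")


def is_refseq_accession(accession: str) -> bool:
    """Return True if the accession matches the RefSeq prefix convention."""
    return bool(REFSEQ_PATTERN.match(accession.strip()))


def split_hits(accessions):
    """Split a list of accessions into (refseq, other), preserving order."""
    refseq = [a for a in accessions if is_refseq_accession(a)]
    other = [a for a in accessions if not is_refseq_accession(a)]
    return refseq, other


if __name__ == "__main__":
    # Illustrative accessions in RefSeq, GenBank, SwissProt and PDB style.
    hits = ["NP_000509.1", "AAA12345.1", "P69905.2", "1ABC_A", "WP_012345678.1"]
    refseq, other = split_hits(hits)
    print("RefSeq:", refseq)
    print("Other :", other)
```

This is only a filter on identifier syntax; it says nothing about sequence quality, and selecting the RefSeq database directly in BLAST remains the simpler option when only RefSeq records are wanted.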
