Question

Non-redundant protein sequence database

5.5 years ago

Gina

Hi,

I'm having a hard time understanding what a non-redundant protein sequence database means. Can anyone help?

score 2 · Answer 1 · 2018-10-10

Assuming you are asking about the nr BLAST database:

nr.*tar.gz  | Non-redundant protein sequences from GenPept, Swissprot, PIR, PRF, PDB, and NCBI RefSeq

Non-redundant defline syntax

The non-redundant databases are nr, nt and pataa. Identical sequences are
merged into one entry in these databases. To be merged, two sequences must
have identical lengths and every residue at every position must be the
same. The FASTA deflines for the different entries that belong to one
record are separated by control-A characters (shown as ^A below), which
are invisible to most programs. In the example below, the entries
Q57293.1, AAB05030.1 and AAB17216.1 all have the same sequence in every
respect:

>Q57293.1 RecName: Full=Fe(3+) ions import ATP-binding protein FbpC ^AAAB05030.1 afuC [Actinobacillus pleuropneumoniae] ^AAAB17216.1 afuC [Actinobacillus pleuropneumoniae]
MNNDFLVLKNITKSFGKATVIDNLDLVIKRGTMVTLLGPSGCGKTTVLRLVAGLENPTSGQIFIDGEDVTKSSIQNRDIC
IVFQSYALFPHMSIGDNVGYGLRMQGVSNEERKQRVKEALELVDLAGFADRFVDQISGGQQQRVALARALVLKPKVLILD
EPLSNLDANLRRSMREKIRELQQRLGITSLYVTHDQTEAFAVSDEVIVMNKGTIMQKARQKIFIYDRILYSLRNFMGEST
ICDGNLNQGTVSIGDYRFPLHNAADFSVADGACLVGVRPEAIRLTATGETSQRCQIKSAVYMGNHWEIVANWNGKDVLIN
ANPDQFDPDATKAFIHFTEQGIFLLNKE
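If you need to pull the individual entries back out of such a merged record, here is a minimal Python sketch. It assumes the defline is available as a single string with real control-A bytes (0x01) where the README shows ^A; the helper name split_nr_defline is hypothetical and not part of BLAST+.

```python
# Hypothetical helper: split a merged nr-style FASTA defline into its entries.
# Assumes entries are separated by control-A (byte 0x01), as described above.
CTRL_A = "\x01"


def split_nr_defline(defline: str) -> list[tuple[str, str]]:
    """Return (accession, description) pairs for each entry in a merged defline."""
    # Drop the leading '>' of the FASTA header, if present.
    header = defline[1:] if defline.startswith(">") else defline
    entries = []
    for part in header.split(CTRL_A):
        part = part.strip()  # tolerate stray spaces around the separator
        if not part:
            continue
        # The first whitespace-delimited token is the accession.version.
        accession, _, description = part.partition(" ")
        entries.append((accession, description))
    return entries


# The example record from the README, with real control-A bytes in place of ^A.
defline = (
    ">Q57293.1 RecName: Full=Fe(3+) ions import ATP-binding protein FbpC"
    "\x01AAB05030.1 afuC [Actinobacillus pleuropneumoniae]"
    "\x01AAB17216.1 afuC [Actinobacillus pleuropneumoniae]"
)

for accession, description in split_nr_defline(defline):
    print(accession, "|", description)
# Q57293.1 | RecName: Full=Fe(3+) ions import ATP-binding protein FbpC
# AAB05030.1 | afuC [Actinobacillus pleuropneumoniae]
# AAB17216.1 | afuC [Actinobacillus pleuropneumoniae]
```

Splitting on the separator first, and only then on the first space, keeps multi-word descriptions intact.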

score 2 · Answer 2 · 2018-10-10

Non-redundant means that redundant information has been pruned from the database. However, there are different definitions of redundancy and different methods of removing it. For example, RefSeq non-redundant proteins treat only identical proteins as redundant: a single record is kept for a given protein sequence, no matter the strain or species of origin. Other databases may define redundancy differently, though (UniRef90, for instance, clusters sequences that share at least 90% identity).
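To make the "identical proteins only" definition concrete, here is a minimal Python sketch assuming redundancy means exact sequence identity (same length, same residue at every position). It is not RefSeq's or NCBI's actual pipeline; the function name make_nonredundant and the toy records are made up for illustration.

```python
# Sketch of exact-match redundancy removal: two records are merged only if their
# sequences have the same length and the same residue at every position.
# make_nonredundant and the toy records below are hypothetical.


def make_nonredundant(records: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Collapse (defline, sequence) records with identical sequences.

    Deflines of merged records are joined with the control-A character, and
    the first-seen order of sequences is preserved (dicts keep insertion order).
    """
    merged: dict[str, list[str]] = {}
    for defline, sequence in records:
        # Exact string comparison on the residues; no identity threshold.
        merged.setdefault(sequence, []).append(defline)
    return [("\x01".join(deflines), sequence) for sequence, deflines in merged.items()]


records = [
    ("A1 protein X [Species a]", "MKTAYIAK"),
    ("B2 protein X [Species b]", "MKTAYIAK"),   # identical to A1 -> merged
    ("C3 protein X [Species c]", "MKTAYIAKQ"),  # one residue longer -> kept
]

for defline, sequence in make_nonredundant(records):
    print(defline.split("\x01"), sequence)
# ['A1 protein X [Species a]', 'B2 protein X [Species b]'] MKTAYIAK
# ['C3 protein X [Species c]'] MKTAYIAKQ
```

Looser definitions, such as clustering at a percent-identity threshold, cannot be handled by an exact-match dictionary like this and need a dedicated sequence clustering tool.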

Which non-redundant database are you asking about?