Question: non redundant protein sequence database
Gina wrote (16 months ago):

Hi,

I am having a hard time understanding what "non-redundant protein sequence database" means. Can anyone help?

genomax wrote (16 months ago):

Assuming you are asking about the BLAST nr database:

nr.*tar.gz  | Non-redundant protein sequences from GenPept, Swissprot, PIR, PRF, PDB, and NCBI RefSeq

Non-redundant defline syntax

The non-redundant databases are nr, nt and pataa. Identical sequences are
merged into one entry in these databases. To be merged, two sequences must
have identical lengths and every residue at every position must be the
same. The FASTA deflines of the entries that belong to one
record are separated by control-A characters (shown as ^A below), which
are invisible in most programs. In the example below, the entries Q57293.1,
AAB05030.1 and AAB17216.1 all have exactly the same sequence:

>Q57293.1 RecName: Full=Fe(3+) ions import ATP-binding protein FbpC ^AAAB05030.1 afuC 
[Actinobacillus pleuropneumoniae] ^AAAB17216.1 afuC [Actinobacillus pleuropneumoniae]
MNNDFLVLKNITKSFGKATVIDNLDLVIKRGTMVTLLGPSGCGKTTVLRLVAGLENPTSGQIFIDGEDVTKSSIQNRDIC
IVFQSYALFPHMSIGDNVGYGLRMQGVSNEERKQRVKEALELVDLAGFADRFVDQISGGQQQRVALARALVLKPKVLILD
EPLSNLDANLRRSMREKIRELQQRLGITSLYVTHDQTEAFAVSDEVIVMNKGTIMQKARQKIFIYDRILYSLRNFMGEST
ICDGNLNQGTVSIGDYRFPLHNAADFSVADGACLVGVRPEAIRLTATGETSQRCQIKSAVYMGNHWEIVANWNGKDVLIN
ANPDQFDPDATKAFIHFTEQGIFLLNKE
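
To make the defline layout above concrete, here is a minimal Python sketch that splits a merged defline back into its individual entries. It assumes the separator is the raw control-A byte (`\x01`), which the example above displays as `^A`. The function name `split_nr_defline` is made up for illustration and is not part of BLAST or any library.

```python
CTRL_A = "\x01"  # raw control-A byte; displayed as ^A in the example above


def split_nr_defline(header):
    """Split a merged nr-style defline into (accession, description) pairs.

    Hypothetical helper for illustration only.
    """
    header = header.lstrip(">")
    entries = []
    for part in header.split(CTRL_A):
        part = part.strip()
        if not part:
            continue
        # The accession is the first whitespace-delimited token of each entry.
        accession, _, description = part.partition(" ")
        entries.append((accession, description.strip()))
    return entries


header = (
    ">Q57293.1 RecName: Full=Fe(3+) ions import ATP-binding protein FbpC "
    "\x01AAB05030.1 afuC [Actinobacillus pleuropneumoniae] "
    "\x01AAB17216.1 afuC [Actinobacillus pleuropneumoniae]"
)

for accession, description in split_nr_defline(header):
    print(accession, "|", description)
```

All three accessions come back from a single header line, which is what "merged into one entry" means in practice: one sequence, several source records.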
h.mon wrote (16 months ago):

Non-redundant means that redundant information has been pruned from the database. However, there are different definitions of redundancy and different methods of removing it. For example, RefSeq non-redundant proteins treats proteins with identical sequences as redundant and keeps only one record for a given protein, no matter the strain or species of origin. Other databases may use different definitions.
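
As a rough illustration of the "identical sequence" definition of redundancy, the Python sketch below groups records by exact sequence match, ignoring where they came from. The record identifiers and sequences are made-up toy data, and `collapse_identical` is a hypothetical helper, not how any particular database actually builds its non-redundant set.

```python
from collections import defaultdict


def collapse_identical(records):
    """Group (identifier, sequence) records whose sequences match exactly.

    Hypothetical helper: same length and same residue at every position
    is simply string equality here.
    """
    groups = defaultdict(list)
    for identifier, sequence in records:
        groups[sequence].append(identifier)
    return dict(groups)


# Toy data (invented for this example): two species share one protein sequence.
records = [
    ("protA_species1", "MKTAYIAKQR"),
    ("protA_species2", "MKTAYIAKQR"),
    ("protB_species1", "MKTAYIAKQK"),
]

for sequence, identifiers in collapse_identical(records).items():
    print(sequence, identifiers)
```

The two species-specific records collapse into one entry, while the sequence that differs by a single residue stays separate. A database that clusters at, say, a percent-identity threshold would need a different approach.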

Which non-redundant database are you asking about?
