Hi,
I'm having a hard time understanding what "non-redundant protein sequence database" means. Can anyone help?
Assuming you are asking about the nr BLAST database:
nr.*tar.gz | Non-redundant protein sequences from GenPept, Swissprot, PIR, PDF, PDB, and NCBI RefSeq
The non-redundant databases are nr, nt and pataa. Identical sequences are
merged into one entry in these databases. To be merged two sequences must
have identical lengths and every residue at every position must be the
same. The FASTA deflines for the different entries that belong to one
record are separated by control-A characters invisible to most
programs. In the example below both entries Q57293.1 and AAB05030.1
have the same sequence, in every respect:
>Q57293.1 RecName: Full=Fe(3+) ions import ATP-binding protein FbpC ^AAAB05030.1 afuC [Actinobacillus pleuropneumoniae] ^AAAB17216.1 afuC [Actinobacillus pleuropneumoniae]
MNNDFLVLKNITKSFGKATVIDNLDLVIKRGTMVTLLGPSGCGKTTVLRLVAGLENPTSGQIFIDGEDVTKSSIQNRDIC
IVFQSYALFPHMSIGDNVGYGLRMQGVSNEERKQRVKEALELVDLAGFADRFVDQISGGQQQRVALARALVLKPKVLILD
EPLSNLDANLRRSMREKIRELQQRLGITSLYVTHDQTEAFAVSDEVIVMNKGTIMQKARQKIFIYDRILYSLRNFMGEST
ICDGNLNQGTVSIGDYRFPLHNAADFSVADGACLVGVRPEAIRLTATGETSQRCQIKSAVYMGNHWEIVANWNGKDVLIN
ANPDQFDPDATKAFIHFTEQGIFLLNKE
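If it helps, here is a minimal Python sketch of the merge rule described above. It is not NCBI's actual code, only an illustration under some assumptions: the function names `merge_identical` and `split_defline` are made up, the sequences are shortened placeholders, and the third record (`XYZ00001.1`) is invented to show a near-identical sequence that does not get merged.

```python
CTRL_A = "\x01"  # the control-A separator, usually displayed as ^A

def merge_identical(records):
    """Collapse (defline, sequence) pairs whose sequences match exactly.

    Exact string equality implies identical length and identical residues
    at every position, which is the merge rule quoted above.
    """
    merged = {}
    for defline, seq in records:
        merged.setdefault(seq, []).append(defline)
    # One record per distinct sequence; its deflines are joined with control-A.
    return [(CTRL_A.join(deflines), seq) for seq, deflines in merged.items()]

def split_defline(defline):
    """Recover the individual entry deflines from one merged defline."""
    return [part.strip() for part in defline.lstrip(">").split(CTRL_A)]

# Placeholder sequences, shortened for illustration only.
records = [
    ("Q57293.1 RecName: Full=Fe(3+) ions import ATP-binding protein FbpC", "MNNDFLVLKN"),
    ("AAB05030.1 afuC [Actinobacillus pleuropneumoniae]", "MNNDFLVLKN"),
    # Hypothetical entry: one extra residue, so it stays a separate record.
    ("XYZ00001.1 hypothetical protein", "MNNDFLVLKNA"),
]

nr_like = merge_identical(records)
for defline, seq in nr_like:
    print(seq, split_defline(defline))
```

Splitting on control-A like this is also how you would get back every accession that shares a sequence when reading a FASTA dump of nr.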
Non-redundant means that redundant information has been pruned from the database. However, there are different definitions of redundancy and different methods of removing it. For example, the RefSeq non-redundant protein records treat identical protein sequences as redundant and keep only one record for a given protein, no matter the strain or species of origin. Other databases may use different definitions, though.
Which non-redundant database are you asking about?