I am using UniProtKB to download protein sequences of the Argonaute superfamily (Query = Argonaute OR Piwi). The hits comprise 194 UniProtKB/Swiss-Prot and 888 UniProtKB/TrEMBL entries.
On further analysis of these hits, I find that the UniProtKB/TrEMBL entries are redundant, whereas UniProtKB/Swiss-Prot gives one record per gene per species.
I am unsure which UniProtKB/TrEMBL entries to use for a particular protein from a given species, since the same gene in the same species often has multiple entries with different accession numbers.
For example, the protein Seawi from Strongylocentrotus purpuratus is encoded by a single gene, but UniProtKB/TrEMBL lists 4 accessions (Q9GPA7, Q9GPA8, Q9GPA6, C9EID6) with different sequence lengths.
On the other hand, I would miss a large number of sequences if I used only the UniProtKB/Swiss-Prot entries.
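To make the problem concrete, here is a minimal Python sketch of one possible way to collapse such hits to a single entry per gene per species. It assumes the hits have already been parsed into records holding an accession, gene name, organism, sequence length and reviewed (Swiss-Prot) status. The `Entry` type, the `pick_representatives` function, and all accessions, names and lengths in `records` are hypothetical placeholders, not real UniProt data. Preferring the reviewed entry and otherwise the longest sequence is only one heuristic; the longest TrEMBL sequence is not necessarily the most accurate one.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    """One parsed hit (hypothetical structure, not a UniProt API type)."""
    accession: str
    gene: str
    organism: str
    length: int
    reviewed: bool  # True for Swiss-Prot, False for TrEMBL


def pick_representatives(entries):
    """Return one entry per (organism, gene): reviewed first, else longest."""
    groups = defaultdict(list)
    for entry in entries:
        # Gene names are compared case-insensitively (an assumption).
        groups[(entry.organism, entry.gene.lower())].append(entry)
    return {
        key: max(members, key=lambda e: (e.reviewed, e.length))
        for key, members in groups.items()
    }


# Placeholder records: accessions, names and lengths are made up.
records = [
    Entry("TR_0001", "gene1", "Species A", 400, False),
    Entry("TR_0002", "Gene1", "Species A", 850, False),
    Entry("TR_0003", "gene1", "Species A", 620, False),
    Entry("SP_0001", "gene2", "Species B", 700, True),
    Entry("TR_0004", "gene2", "Species B", 900, False),
]

for (organism, gene), entry in pick_representatives(records).items():
    print(organism, gene, entry.accession, entry.length)
```

With these placeholder records, the three TrEMBL entries for `gene1` collapse to the longest one, and the reviewed entry is kept for `gene2` even though a longer unreviewed entry exists.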
How should I decide which UniProtKB/TrEMBL entries to keep? Any help would be appreciated.