Any tool for finding duplicated or redundant sequences in a database?
23 months ago

Hello, I'm building a prokaryotic protein database from several different source databases, so it is likely that some sequences appear more than once in the combined set. Is there a tool for estimating sequence similarity within a single FASTA file (my database)?

Thanks for your time.

fasta database • 702 views
23 months ago
GenoMax 141k

cd-hit (LINK) or MMseqs2 cluster (LINK) can both help generate a non-redundant set of sequences. In fact, NCBI is now using MMseqs2 to cluster nr for their web version.
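
As a cheap first pass before clustering, exact duplicates can be dropped with a few lines of Python. This is only a minimal sketch, not what cd-hit or MMseqs2 do: it removes sequences that are character-for-character identical (ignoring case) and will not catch near-identical or fragment sequences, which is what the clustering tools are for. The function names `read_fasta` and `drop_exact_duplicates` are made up for illustration, and the sketch assumes a plain-text FASTA file with ASCII sequences.

```python
import hashlib
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


def drop_exact_duplicates(records):
    """Yield only the first record seen for each distinct sequence.

    Sequences are compared case-insensitively via a SHA-1 digest, so
    memory use grows with the number of unique sequences rather than
    with their total length.
    """
    seen = set()
    for header, seq in records:
        key = hashlib.sha1(seq.upper().encode("ascii")).digest()
        if key in seen:
            continue  # identical sequence already kept under another header
        seen.add(key)
        yield header, seq


if __name__ == "__main__":
    # Tiny in-memory example; swap io.StringIO(...) for open("db.fasta").
    demo = io.StringIO(">a\nMKV\nLLA\n>b\nmkvlla\n>c\nMKVLLG\n")
    for header, seq in drop_exact_duplicates(read_fasta(demo)):
        print(f">{header}\n{seq}")
```

Only the first header is kept for each sequence, so if you need to know which source databases a duplicate came from, you would have to collect the dropped headers as well.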

22 months ago
Hugo ▴ 380

You can use SEDA (https://www.sing-group.org/seda/). Its "Remove Redundant Sequences" operation (https://www.sing-group.org/seda/manual/operations.html#remove-redundant-sequences) does this.
