Any tool for finding duplicated or redundant sequences in a database?
8 weeks ago

Hello, I'm building a prokaryotic protein database from several source sequence databases, so it is likely that some sequences appear in it more than once. Is there any tool for estimating sequence similarity within a single FASTA file (my database)?

Thanks for your time.

fasta database • 232 views
8 weeks ago
GenoMax 117k

CD-HIT (LINK) or MMseqs2 cluster (LINK) can both help generate a non-redundant set of sequences. In fact, NCBI is now using MMseqs2 to cluster nr for their web version.
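As a quick first pass before clustering, exact duplicates can be collapsed with a few lines of Python. This is only a minimal sketch and not what CD-HIT or MMseqs2 do internally: it removes sequences that are identical (ignoring case and a trailing `*` stop symbol) and does nothing about near-identical or contained sequences, which still need a clustering tool. The function names `read_fasta` and `dedupe_exact` and the demo records are made up for illustration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_exact(records):
    """Keep only the first record seen for each distinct sequence."""
    seen = set()
    for header, seq in records:
        # Assumption: case and a trailing '*' are not meaningful differences.
        key = seq.upper().rstrip("*")
        if key in seen:
            continue
        seen.add(key)
        yield header, seq


# Small in-memory demo; for a real file, pass open("your_database.fasta")
# (hypothetical file name) to read_fasta instead.
demo = io.StringIO(">a\nMKT\nLLV\n>b\nmktllv\n>c\nMKTAAA\n")
for header, seq in dedupe_exact(read_fasta(demo)):
    print(f">{header}\n{seq}")
```

Holding every sequence in a set is fine for moderately sized databases; for very large ones, storing a hash of each sequence instead of the sequence itself would reduce memory use.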

11 days ago
Hugo ▴ 360

You can use SEDA (https://www.sing-group.org/seda/). Its "Remove Redundant Sequences" operation (https://www.sing-group.org/seda/manual/operations.html#remove-redundant-sequences) lets you do this.
