Any tool for finding duplicated or redundant sequences in a database?
3 months ago

Hello, I'm building a prokaryotic protein database from several different sequence sources, so my new database very likely contains some sequences more than once. Is there any tool for estimating sequence similarity within a single FASTA file (my database)?

Thanks for your time.

fasta database
3 months ago
GenoMax 119k

cd-hit (LINK) or MMseqs2 cluster (LINK) can both generate a non-redundant set of sequences. In fact, NCBI now uses MMseqs2 to cluster nr for its web version.
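cd-hit and MMseqs2 cluster by sequence identity, so they also catch near-identical variants. If you first just want to see how many entries are exact copies pulled in from different sources, a minimal stdlib-only Python sketch can collapse them. It assumes a plain protein FASTA, and `read_fasta` / `collapse_identical` are hypothetical helper names, not part of either tool:

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def collapse_identical(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group headers by identical sequence (case-insensitive), in first-seen order."""
    groups: Dict[str, List[str]] = {}
    for header, seq in records:
        groups.setdefault(seq.upper(), []).append(header)
    return groups


if __name__ == "__main__":
    example = [">srcA|prot1", "MKTAYIAKQR",
               ">srcB|prot7", "MKTAYIAKQR",
               ">srcA|prot2", "MSTNPKPQRK"]
    for seq, headers in collapse_identical(read_fasta(example)).items():
        # One representative per unique sequence; all source IDs kept in the header.
        print(">" + ";".join(headers))
        print(seq)
```

For a real file you would pass something like `open("db.fasta")` (hypothetical filename) to `read_fasta`. This only catches 100% identical sequences; near-identical ones still need cd-hit or MMseqs2 with an identity threshold.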

7 weeks ago
Hugo 360

You can use SEDA (https://www.sing-group.org/seda/). Its "Remove Redundant Sequences" operation (https://www.sing-group.org/seda/manual/operations.html#remove-redundant-sequences) does exactly this.
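To illustrate one common notion of "redundant" beyond exact duplicates, here is a hypothetical Python sketch that also drops a sequence when it is an exact substring of a longer one already kept. This is an assumption made for illustration, not SEDA's implementation or option set. It takes (header, sequence) pairs from whatever FASTA reader you already use, and its pairwise substring check is quadratic, so it only suits small sets:

```python
from typing import Iterable, List, Tuple


def remove_contained(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep a sequence only if it is not identical to, or an exact substring of,
    a sequence already kept. Longer sequences are considered first."""
    # sorted() is stable, so equal-length sequences keep their input order.
    ordered = sorted(records, key=lambda r: len(r[1]), reverse=True)
    kept: List[Tuple[str, str]] = []
    for header, seq in ordered:
        s = seq.upper()
        if any(s in kept_seq for _, kept_seq in kept):
            continue  # identical to, or contained in, a longer kept sequence
        kept.append((header, s))
    return kept


if __name__ == "__main__":
    records = [("a", "MKTAY"), ("b", "MKTAYIAKQR"),
               ("c", "MKTAYIAKQR"), ("d", "PQRK")]
    for header, seq in remove_contained(records):
        print(header, seq)
```

Here "c" is dropped as an identical copy of "b", and "a" is dropped because it is contained in "b". For a full prokaryotic database, the clustering tools mentioned above scale far better.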
