Question: (Closed) Remove repeated sequences from a multi-FASTA file
Asked 7 months ago by savscosta:


I'm working with a database that contains several FASTA files with my genes of interest. However, some of these FASTA files contain records that have different IDs but identical sequences.

So I want to remove duplicate records based on the nucleotide sequence, to build a non-redundant database.

How can I do this?



You are not very specific. One important detail is how many sequences we are talking about. If it's, say, 500, then they probably fit in your computer's memory and we can eliminate the duplicates fairly straightforwardly.
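For the small, fits-in-memory case, a minimal sketch (not code posted in this thread) could parse the multi-FASTA records and keep only the first record seen for each distinct sequence. Two assumptions are baked in: the first occurrence wins, and comparison is case-insensitive (`acgt` equals `ACGT`). The function names `read_fasta` and `dedup_in_memory` are hypothetical helpers, and only the standard library is used.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedup_in_memory(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Keep the first record seen for each distinct sequence (case-insensitive)."""
    seen = set()
    for header, seq in records:
        key = seq.upper()  # assumption: 'acgt' and 'ACGT' count as the same sequence
        if key not in seen:
            seen.add(key)
            yield header, seq


if __name__ == "__main__":
    example = ">geneA_1\nATGCGT\nACC\n>geneA_2\nATGCGTACC\n>geneB\nTTTGGG\n"
    for header, seq in dedup_in_memory(read_fasta(example.splitlines())):
        print(f">{header}\n{seq}")
```

Here `geneA_2` is dropped because its sequence matches the joined two-line sequence of `geneA_1`. For a real file you would pass an open handle instead, e.g. `with open("genes.fasta") as fh:` (file name hypothetical), since iterating a file yields lines just like the list above.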

But if you are talking about millions of sequences, then we need to come up with a more sophisticated strategy.
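One possible step toward a more scalable approach (a sketch, not a method proposed in this thread) is to stream the input and remember only a fixed-size hash digest of each distinct sequence instead of the sequence itself. Memory then grows with the number of unique sequences rather than their total length. The assumptions: a 16-byte BLAKE2b digest from Python's `hashlib`, where accidental collisions are negligible in practice but not impossible, case-insensitive comparison, and output wrapped at a hypothetical `width` of 60 characters. If even the digests do not fit in RAM, a sort-based (external sort) approach would be the next thing to consider; it is not shown here.

```python
import hashlib
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an open text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedup_stream(src: TextIO, dst: TextIO, width: int = 60) -> Tuple[int, int]:
    """Copy records from src to dst, skipping sequences already written.

    Only a 16-byte digest per unique sequence is held in memory.
    Returns (records_read, records_written).
    """
    seen = set()
    n_in = n_out = 0
    for header, seq in read_fasta(src):
        n_in += 1
        digest = hashlib.blake2b(seq.upper().encode("utf-8"), digest_size=16).digest()
        if digest in seen:
            continue  # same sequence already written under an earlier ID
        seen.add(digest)
        n_out += 1
        dst.write(f">{header}\n")
        for i in range(0, len(seq), width):  # re-wrap the sequence lines
            dst.write(seq[i:i + width] + "\n")
    return n_in, n_out


if __name__ == "__main__":
    src = io.StringIO(">r1\nACGT\n>r2\nacgt\n>r3\nGGCC\n")
    dst = io.StringIO()
    print(dedup_stream(src, dst))  # (3, 2)
    print(dst.getvalue(), end="")
```

With real files, `src` and `dst` would be handles from `open(...)`, so the whole input never has to be loaded at once.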

If two records with the same sequence are found, does it matter which one is deleted and which one is kept?
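If it does matter which record is kept, one option (sketched here as an illustration, not taken from the thread) is to group all IDs by sequence first, then pick a representative with an explicit rule and report which IDs were dropped in its favour. The helper names and the default "first ID seen wins" rule are assumptions; swapping in `pick=min` or `pick=max` changes the rule. The input is a list of (ID, sequence) pairs from whatever FASTA parser you use, and the returned sequences are upper-cased.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def group_ids_by_sequence(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each distinct (upper-cased) sequence to its record IDs, in input order."""
    groups: Dict[str, List[str]] = {}
    for rec_id, seq in records:
        groups.setdefault(seq.upper(), []).append(rec_id)
    return groups


def choose_representatives(
    groups: Dict[str, List[str]],
    pick: Callable[[List[str]], str] = lambda ids: ids[0],
) -> Tuple[List[Tuple[str, str]], Dict[str, List[str]]]:
    """Pick one ID per sequence and report which IDs were dropped for it.

    The default rule (first ID seen wins) is an assumption; pass e.g.
    pick=min to keep the lexicographically smallest ID instead.
    """
    kept: List[Tuple[str, str]] = []
    dropped: Dict[str, List[str]] = {}
    for seq, ids in groups.items():
        rep = pick(ids)
        kept.append((rep, seq))  # seq is the upper-cased grouping key
        others = [i for i in ids if i != rep]
        if others:
            dropped[rep] = others
    return kept, dropped


if __name__ == "__main__":
    records = [("geneA_1", "ATGC"), ("geneA_2", "ATGC"), ("geneB", "TTAA"), ("geneA_3", "atgc")]
    kept, dropped = choose_representatives(group_ids_by_sequence(records))
    print(kept)     # [('geneA_1', 'ATGC'), ('geneB', 'TTAA')]
    print(dropped)  # {'geneA_1': ['geneA_2', 'geneA_3']}
```

Keeping the `dropped` map alongside the non-redundant set makes it possible to trace any removed ID back to the record that represents it.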

written 7 months ago by WouterDeCoster

Hello savscosta!

Questions similar to yours can already be found at:

We have closed your question to allow us to keep similar content in the same thread.

If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.


written 7 months ago by Pierre Lindenbaum

