Question: (Closed) remove repeating sequences from multifasta file.
17 months ago, savscosta wrote:

Hello,

I'm working with a database that contains some FASTA files with my genes of interest. However, some of the FASTA files contain records that have different IDs but the same sequence.

So I want to remove duplicates based on the nucleotide sequence, to make a non-redundant database.

How can I do this?

Thanks

genome gene • 869 views

You are not very specific. One important thing to know here is how many sequences we are talking about. If it's, say, 500, then that probably fits in your computer's memory and we can eliminate the duplicates in a reasonably straightforward way.
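For the small case, a minimal in-memory sketch in plain Python might look like the following. The function names `read_fasta` and `dedupe_in_memory` are made up for illustration, and the sketch assumes that the first record seen for each sequence is the one to keep and that upper- and lower-case bases count as the same sequence.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines of a multi-line record are joined together.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_in_memory(records):
    """Keep the first record for each distinct sequence (assumption: case-insensitive)."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


example = """>gene1
ACGT
ACGT
>gene2
acgtacgt
>gene3
TTTT
""".splitlines()

for header, seq in dedupe_in_memory(read_fasta(example)):
    print(f">{header}\n{seq}")
```

To run it on a real file, pass an open file handle to `read_fasta` instead of the example lines. Note that the output writes each sequence on a single line rather than preserving the original line wrapping.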

But if you are talking about millions of sequences, then we need to come up with a more sophisticated strategy.
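The thread does not say what that strategy would be. One possible approach, sketched here purely as an assumption, is to stream the file record by record and remember only a fixed-size digest of each sequence rather than the sequence itself, so memory grows with the number of distinct sequences and not with their length. The names `iter_fasta` and `dedupe_stream` are hypothetical, and the sketch again keeps the first record seen and ignores case.

```python
import hashlib
import io


def iter_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file-like object, one record at a time."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_stream(in_handle, out_handle):
    """Write the first record of each distinct sequence; return (kept, dropped) counts."""
    seen_digests = set()
    kept = dropped = 0
    for header, seq in iter_fasta(in_handle):
        # Store a 32-byte SHA-256 digest instead of the full sequence.
        digest = hashlib.sha256(seq.upper().encode("utf-8")).digest()
        if digest in seen_digests:
            dropped += 1
            continue
        seen_digests.add(digest)
        out_handle.write(f">{header}\n{seq}\n")
        kept += 1
    return kept, dropped


source = io.StringIO(">a\nACGT\n>b\nGGCC\n>c\nacgt\n")
target = io.StringIO()
print(dedupe_stream(source, target))
print(target.getvalue(), end="")
```

With real data, `source` and `target` would be files opened with `open(...)`. The set of digests still has to fit in memory, so for extremely large collections a sort-based or dedicated clustering tool may be the better choice.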

If two records with the same sequence are found, does it matter which one is deleted and which one is kept?

written 17 months ago by WouterDeCoster

Hello savscosta!

Questions similar to yours can already be found at:

We have closed your question to allow us to keep similar content in the same thread.

If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.

Cheers!

modified 17 months ago • written 17 months ago by Pierre Lindenbaum
The thread is closed. No new answers may be added.