Question: Remove duplicate sequences with same id from a fasta file
CB • 10 wrote:
Dear all, there are many posts about removing duplicate sequences from a FASTA file (https://www.biostars.org/p/3003/), but I want to remove only the duplicate sequences that also have the same ID.
My FASTA file contains many duplicate sequences with different IDs, and I want to keep those.
How can I remove only the duplicates that share the same ID? These are protein sequences, and each sequence is split across multiple lines.
modified 15 months ago by Alex Reynolds ♦ 23k • written 15 months ago by CB • 10
BBMap's Dedupe utility has a "requirematchingnames" flag. This will make it only remove duplicates that have identical sequence and identical names. For example:
One copy of each duplicate set will remain, unless you add the "uniqueonly" flag.
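For readers who want to see the matching rule spelled out, here is a minimal Python sketch of the same idea: a record is dropped only when both its name and its sequence have already been seen. This is not BBMap's implementation, and the function names (`read_fasta`, `dedupe_same_id`, `write_fasta`) are made up for illustration. It assumes the whole header line after `>` counts as the name, that sequences are compared case-insensitively, and that the file fits in memory. Wrapped sequence lines are joined before comparison, so the same protein wrapped at different widths still counts as a duplicate.

```python
import io
import sys


def read_fasta(handle):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_same_id(records):
    """Keep the first record for each (header, sequence) pair.

    Records with the same sequence but a different header are kept,
    and so are records with the same header but a different sequence.
    """
    seen = set()
    for header, seq in records:
        key = (header, seq.upper())  # assumption: case-insensitive match
        if key not in seen:
            seen.add(key)
            yield header, seq


def write_fasta(records, handle, width=60):
    """Write records back out, wrapping sequences at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")


# Small in-memory demo; for a real file, replace io.StringIO(...) with open(path).
demo = io.StringIO(
    ">p1\nMKV\nLAA\n"   # first copy, wrapped at 3 residues
    ">p2\nMKVLAA\n"     # same sequence, different ID
    ">p1\nMKVL\nAA\n"   # same ID and sequence as the first, wrapped differently
    ">p1\nMKT\n"        # same ID, different sequence
)
write_fasta(dedupe_same_id(read_fasta(demo)), sys.stdout)
```

In the demo, only the third record is removed: the first `p1`, `p2`, and the `p1` record with the different sequence are all kept.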