I want to remove duplicate reads from my FASTA file. I tried fastx_collapser, but it failed because my reads contain lowercase letters and hyphens.
How to remove the same sequences in the FASTA files?
It's like everybody wants to remove duplicates here!
Try the sequniq tool from the GenomeTools suite:
gt sequniq -o output.fasta input.fasta
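If GenomeTools is not available, exact-duplicate removal can also be sketched in plain awk. This is not how sequniq works internally, only a rough stand-in: the function name dedup_fasta is made up for illustration, it keeps the first record for each exact sequence, and the comparison is case-sensitive, so acgt and ACGT count as different reads.

```shell
# Hypothetical helper (not part of GenomeTools): keep only the first record
# for each exact sequence. Multi-line records are joined before comparison,
# so every sequence is written back on a single line.
dedup_fasta() {
    awk '
        # Print the pending record unless its sequence was already seen.
        function emit() {
            if (header != "" && !(seq in seen)) {
                seen[seq] = 1
                print header
                print seq
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }   # new record starts
        { seq = seq $0 }                               # accumulate sequence lines
        END { emit() }                                 # flush the last record
    ' "$@"
}
```

Usage would be: dedup_fasta input.fasta > output.fasta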
I tried this command. Could you please explain how it should be applied?
Try CD-HIT or UCLUST.
You can remove unwanted hyphens and convert to uppercase using sed (GNU sed only, since \U is a GNU extension):
echo FaSta-TEst | sed 's/-//g; s/.*/\U&/'
Or just tr:
echo FaSta-TEst | tr -d - | tr 'a-z' 'A-Z'
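Run on a whole FASTA file, the one-liners above would also change the header lines. A minimal sketch that touches only the sequence lines, assuming headers start with ">" (the function name normalize_fasta is made up for illustration):

```shell
# Hypothetical helper: strip hyphens and uppercase the sequence lines only,
# leaving header lines (those starting with ">") exactly as they are.
normalize_fasta() {
    awk '
        /^>/ { print; next }                    # header: pass through unchanged
        { gsub(/-/, ""); print toupper($0) }    # sequence: drop "-" and uppercase
    ' "$@"
}
```

Usage would be: normalize_fasta input.fasta > normalized.fasta
The normalized file should then be acceptable to tools that reject lowercase letters or hyphens.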
Here is my free program on GitHub: Sequence database curator.
It is a very fast program and it can deal with:
It runs on the following operating systems:
It also works for: