Hi. I want to remove every sequence in a multi-FASTA file that is an exact substring of a longer sequence in the same file. For example, say I have the three sequences below:
>seq1
ACGACGATCGT**ACTAGCATCGAGCGTAC**TACGTAGCGCGT
>seq2
**ACTAGCATCGAGCGTAC**
>seq3
AGCAGCGTACGTGACTACGACGATCTACGTATCTAGCTCGTACACT
seq2 is an exact substring of seq1 (the part shown in bold). So after removing such contained sequences, I expect to get the following multi-FASTA file:
>seq1
ACGACGATCGTACTAGCATCGAGCGTACTACGTAGCGCGT
>seq3
AGCAGCGTACGTGACTACGACGATCTACGTATCTAGCTCGTACACT
All the answers I have found so far only cover removing exact duplicates. Is there a tool or script that can do this? Thanks in advance.
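To make the desired behaviour concrete, here is a minimal sketch in plain Python of what I am after. It is only an illustration under a few assumptions of mine: the function names `read_fasta` and `drop_contained` are made up for this post, matching is case-insensitive, reverse complements are not considered, and when two sequences are identical only the first one is kept.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_contained(records):
    """Return the records whose sequence is not a substring of another kept one."""
    # Visit longest sequences first: a sequence can only be contained in one
    # that is at least as long, so it is enough to compare against those kept.
    order = sorted(range(len(records)), key=lambda i: -len(records[i][1]))
    kept = []
    for i in order:
        seq = records[i][1].upper()
        if not any(seq in records[j][1].upper() for j in kept):
            kept.append(i)
    # Restore the original file order for the output.
    return [records[i] for i in sorted(kept)]


if __name__ == "__main__":
    example = """>seq1
ACGACGATCGTACTAGCATCGAGCGTACTACGTAGCGCGT
>seq2
ACTAGCATCGAGCGTAC
>seq3
AGCAGCGTACGTGACTACGACGATCTACGTATCTAGCTCGTACACT
"""
    for header, seq in drop_contained(read_fasta(example)):
        print(f">{header}\n{seq}")
```

This compares every sequence against all longer ones that were kept, so the cost grows roughly quadratically with the number of sequences. That is fine for a small file, but I would prefer an existing tool for large datasets.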