Extract sequences containing a special character such as a gap from a multiple sequence alignment?
4.5 years ago
MSRS ▴ 590

I have multiple sequence alignment files of DNA in FASTA format that contain deletions and insertions. I want to extract only the sequences that contain at least one gap (-). Is there a program that can do this?

My input looks like this:

>A
ATGCATGCATGCAGCATGC
>B
GCATGCAT---GCATACATGC
>C
ATGC-ATGCATGCATGCA---

The output should be:

>B
GCATGCAT---GCATACATGC
>C
ATGC-ATGCATGCATGCA---

Thanks in advance.

alignment sequence • 1.3k views
4.5 years ago

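Try seqkit grep. The -s flag searches the sequences rather than the IDs, and -p gives the pattern to match, here the gap character:

seqkit grep -s -p "-" in.fa > out.fa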
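
If seqkit is not available, the same filter can be sketched in plain Python. This is only a minimal illustration, not part of seqkit: the helper names read_fasta and records_with_gaps are hypothetical, and it assumes standard FASTA with ">" header lines and "-" as the only gap character.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def records_with_gaps(handle: TextIO, gap: str = "-") -> List[Tuple[str, str]]:
    """Keep only records whose sequence contains the gap character."""
    return [(h, s) for h, s in read_fasta(handle) if gap in s]


# The example input from the question, held in memory for illustration.
example = """>A
ATGCATGCATGCAGCATGC
>B
GCATGCAT---GCATACATGC
>C
ATGC-ATGCATGCATGCA---
"""

kept = records_with_gaps(io.StringIO(example))
for header, seq in kept:
    print(f">{header}\n{seq}")
```

For a real file, replace io.StringIO(example) with an opened file handle, for example open("in.fa").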

Thank you very much, Shenwei.
