How to remove gap-containing sequences from a fasta file?
3.6 years ago

I trimmed poorly aligned regions from an MSA file using trimAl. Now I would like to remove every sequence that contains gaps and keep only the sequences with no gaps. It would look something like this:

Input:

>seq_1
LSAIFQQPLAALLSN--------QQ-----------------------------------

>seq_2
-----------------------------GPLLTGALVTEDVAASALRIMIVALKVIIDA
ASVSELCATLLVELSVAGIVNVMNCAL 

>seq_3
IKVLSEQALGQHLTQIQNCLWTLNLSAATGQILVTQLGDDNMATGILSNLVTQVEALIHV
LDVEPAVCALLTPVGLALLREALINAL

Output:

>seq_3
IKVLSEQALGQHLTQIQNCLWTLNLSAATGQILVTQLGDDNMATGILSNLVTQVEALIHV
LDVEPAVCALLTPVGLALLREALINAL

Does anyone know how to do it? I will appreciate any help. Thanks!

alignment • 1.5k views

Are the sequence name and sequence all contained within one line in the file? Or are there line breaks like in your post?

EDIT: After the post was edited, it appears to be a FASTA file. If so, @genomax has kindly provided the answer.

3.6 years ago
GenoMax 141k
Linearize the FASTA file so that each record sits on a single line | remove the lines containing gaps with `grep -v "-"` | reformat back to FASTA.

Code for fasta manipulation (courtesy of @yokofakun):
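
The snippet referred to above is not reproduced here. As a stand-in, here is a minimal sketch of the same three steps (linearize, filter, reformat) using `awk`. It is not @yokofakun's original code. The function name `drop_gapped_records` is hypothetical, and the sketch assumes that headers contain no tab characters and that re-wrapping sequences at 60 columns is acceptable. Unlike a plain `grep -v "-"` on the linearized lines, it tests only the sequence column, so a record whose header happens to contain a hyphen is not discarded by accident.

```shell
# Hypothetical helper: reads FASTA on stdin, writes gap-free records to stdout.
# Step 1 (first awk): linearize, so each record becomes "header<TAB>sequence".
# Step 2 (second awk): keep records whose sequence field contains no "-".
# Step 3 (third awk): print the header, then the sequence wrapped at 60 columns.
drop_gapped_records() {
  awk '
    { sub(/\r$/, "") }                       # tolerate Windows line endings
    /^>/ { if (seen) printf "\n"; printf "%s\t", $0; seen = 1; next }
    seen { gsub(/[ \t]/, ""); printf "%s", $0 }   # join wrapped sequence lines
    END  { if (seen) printf "\n" }
  ' |
  awk -F '\t' 'index($2, "-") == 0' |
  awk -F '\t' '{
    print $1
    for (i = 1; i <= length($2); i += 60) print substr($2, i, 60)
  }'
}
```

After defining the function in a shell session, it could be run as `drop_gapped_records < trimmed.fasta > nogaps.fasta`, where both file names are placeholders.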


It works perfectly. Thanks!


You can accept the answer to provide closure to this thread (green check mark).
