How to remove gaps from sequence alignment?
3.7 years ago

I want to remove gaps from a sequence alignment. How should I do that? Is there a tool or software for it, or is it better to write my own program?

alignment sequence global alignment gaps • 3.6k views

mothur has a degap.seqs command for this: https://mothur.org/wiki/degap.seqs/

In Python, using Biopython:

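import sys

from Bio import AlignIO, SeqIO

input_file = sys.argv[1]   # aligned FASTA (input)
output_file = sys.argv[2]  # ungapped FASTA (output)

with open(output_file, "w") as out:
    # AlignIO.read expects exactly one alignment with equal-length sequences
    for record in AlignIO.read(input_file, "fasta"):
        # Seq.ungap() is deprecated in newer Biopython releases; replace() does the same job
        record.seq = record.seq.replace("-", "")
        SeqIO.write(record, out, "fasta")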

Save it as a Python script (for example, degap.py) and run it with the input and output file names as arguments: python degap.py aligned.fasta ungapped.fasta


Sorry, but your question is unclear. What do you mean by "remove gaps"? Why would you want to do that? Gaps are part of the alignment result.


If you literally remove all the gaps, it will no longer be an alignment; it will just be a multi-FASTA file. Is that what you want? Or are you looking to remove the positions (columns) in the alignment that contain gaps? That would preserve the structure of the alignment, at the cost of losing some information.
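To make the second option concrete, here is a minimal sketch in plain Python of dropping every column that contains at least one gap. The function name drop_gapped_columns and the toy sequences are made up for illustration, and it assumes the alignment is already loaded as a list of equal-length strings with "-" as the gap character.

```python
def drop_gapped_columns(seqs, gap="-"):
    """Return the aligned sequences with every gap-containing column removed.

    Hypothetical helper: `seqs` is a list of equal-length aligned strings.
    """
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    # zip(*seqs) walks the alignment column by column
    kept = [col for col in zip(*seqs) if gap not in col]
    if not kept:
        # every column had a gap, so each sequence becomes empty
        return ["" for _ in seqs]
    # transpose the kept columns back into rows
    return ["".join(row) for row in zip(*kept)]


aligned = ["AC-GT", "ACTG-", "A-TGT"]
print(drop_gapped_columns(aligned))  # ['AG', 'AG', 'AG']
```

All sequences keep the same length afterwards, so the result is still an alignment, just a shorter one.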

3.7 years ago
h.mon 35k

If you want to remove sites with lots of gaps (and thus inferred to be non-homologous), you can use Gblocks or trimAl.
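As a rough illustration of what "sites with lots of gaps" means, the sketch below keeps only columns whose gap fraction is at or below a cutoff. This is not how Gblocks or trimAl work internally (both apply more elaborate criteria); the function name trim_gappy_columns, the 0.5 default, and the toy alignment are assumptions made up for this example.

```python
def trim_gappy_columns(seqs, max_gap_fraction=0.5, gap="-"):
    """Keep only columns where the fraction of gaps is <= max_gap_fraction.

    Hypothetical helper: `seqs` is a list of equal-length aligned strings.
    """
    if not seqs:
        return []
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    n = len(seqs)
    # each column is a tuple holding one character per sequence
    kept = [col for col in zip(*seqs) if col.count(gap) / n <= max_gap_fraction]
    # rebuild each sequence from the columns that survived
    return ["".join(col[i] for col in kept) for i in range(n)]


aligned = ["AC--T", "ACT-T", "A-T-T", "ACTGT"]
print(trim_gappy_columns(aligned))  # ['AC-T', 'ACTT', 'A-TT', 'ACTT']
```

Only the fourth column (three gaps out of four sequences) is dropped here; columns with a single gap are kept. For real analyses the dedicated tools are the safer choice.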
