Question: How To Calculate Number Of Gap Windows And Their Length In A Multiple Alignment
1
6.7 years ago by
Biojl1.6k
Barcelona
Biojl1.6k wrote:

Hi,

Anyone knows if there is an implemented function in biopython to calculate gap windows and their length in a multiple sequence alignment. I found a module (http://biopython.org/DIST/docs/api/Bio.Align.AlignInfo-pysrc.html) to calculate the consensus sequence but it do not allow to set that 1 gap in any sequence should give an output of a gap in the consensus. My input is a Clustalw alignment.

``````Seq1 AAATCG--
Seq2 ---TCGAA
Seq3 -AATCGAA
Seq4 -AATCGAA
***
``````

consensus would be: ---TCG--

desired output: gap windows = 2

window1->length 3

window2->length 2

Any ideas?

python alignment biopython • 2.4k views
modified 4.9 years ago by Biostar ♦♦ 20 • written 6.7 years ago by Biojl1.6k

would it not be easier to remove the "-" afterwards? Unless I am missing what you are trying to accomplish?

I'm just interested in getting the number of gap windows and their length. Since the clustal format only gives you * or : or blank as output (consensus) I thought it was first needed to create this consensus showing the gaps, then count them.

3
6.7 years ago by
Biojl1.6k
Barcelona
Biojl1.6k wrote:

This is the solution I came up with. It uses the SeqIO module from Bio (Biopython).

``````from Bio import SeqIO
handle = open('file_name', "rU") #File name corresponds to a multiple alignment in clustal format
records = list(SeqIO.parse(handle, "clustal"))

def calc_gaps(records):
j=0
gaps = 0
gap = 0
window_len = 0
windows = []
while j<len(records[0].seq): #j iterates through the positions of each sequence
i=0
while i< len(records):
if records[i].seq[j] == '-':
gap = 1
break
i+=1
if gap==1:
window_len +=1
gaps=gaps+1
else:
if window_len!=0:
windows.append(window_len)
window_len=0
j+=1
gap=0
if j==len(records[0].seq) and window_len!=0:
windows.append(window_len)
return gaps, len(windows), (float( sum(windows) ) / len(windows))

(gaps,windows,mean)=calc_gaps(records) #Here we call the function and get number of gaps, number of windows and mean number of gaps per window
``````

I hope this will be useful for someone else.