Question: How To Calculate Number Of Gap Windows And Their Length In A Multiple Alignment
1
gravatar for Biojl
6.7 years ago by
Biojl1.6k
Barcelona
Biojl1.6k wrote:

Hi,

Anyone knows if there is an implemented function in biopython to calculate gap windows and their length in a multiple sequence alignment. I found a module (http://biopython.org/DIST/docs/api/Bio.Align.AlignInfo-pysrc.html) to calculate the consensus sequence but it do not allow to set that 1 gap in any sequence should give an output of a gap in the consensus. My input is a Clustalw alignment.

Seq1 AAATCG--
Seq2 ---TCGAA
Seq3 -AATCGAA
Seq4 -AATCGAA
        ***

consensus would be: ---TCG--

desired output: gap windows = 2

window1->length 3

window2->length 2

Any ideas?

python alignment biopython • 2.4k views
ADD COMMENTlink modified 4.9 years ago by Biostar ♦♦ 20 • written 6.7 years ago by Biojl1.6k

would it not be easier to remove the "-" afterwards? Unless I am missing what you are trying to accomplish?

ADD REPLYlink written 6.7 years ago by Whetting1.5k

I'm just interested in getting the number of gap windows and their length. Since the clustal format only gives you * or : or blank as output (consensus) I thought it was first needed to create this consensus showing the gaps, then count them.

ADD REPLYlink written 6.7 years ago by Biojl1.6k
3
gravatar for Biojl
6.7 years ago by
Biojl1.6k
Barcelona
Biojl1.6k wrote:

This is the solution I came up with. It uses the SeqIO module from Bio (Biopython).

from Bio import SeqIO
handle = open('file_name', "rU") #File name corresponds to a multiple alignment in clustal format
records = list(SeqIO.parse(handle, "clustal"))

def calc_gaps(records):
    j=0
    gaps = 0
    gap = 0
    window_len = 0
    windows = []
    while j<len(records[0].seq): #j iterates through the positions of each sequence
        i=0
        while i< len(records):
            if records[i].seq[j] == '-':
                gap = 1
                break
            i+=1
        if gap==1:
            window_len +=1
            gaps=gaps+1
        else:
            if window_len!=0:
                windows.append(window_len)
                window_len=0
        j+=1
        gap=0
        if j==len(records[0].seq) and window_len!=0:
            windows.append(window_len)
    return gaps, len(windows), (float( sum(windows) ) / len(windows))

(gaps,windows,mean)=calc_gaps(records) #Here we call the function and get number of gaps, number of windows and mean number of gaps per window

I hope this will be useful for someone else.

ADD COMMENTlink written 6.7 years ago by Biojl1.6k

+1 for posting your solution. I was wanting to post some code using R, but got caught up in work. Glad to see you found a solution.

ADD REPLYlink written 6.7 years ago by Leonor Palmeira3.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1512 users visited in the last hour