Question

Biopython: Getting Positions In A Multiple Sequence Alignment From A Reference Sequence In The Alignment

3

Entering edit mode

10.2 years ago

weslfield ▴ 90

Ok, so I am writing an application that examines mutations at user defined positions within an MSA. Using BioPython to do this. It is fairly simple to examine columns in the alignment but their indexes are relative to the alignment and not a particular sequence (i.e. includes gap). I would like to be able to reference a sequence in the MSA by its ID and translate a sequence-specific position to a MSA-specific position. Bioperl allows you to do this like so —

$pos = $alignment -> column_from_residue_number('gi|388479282|ref|YP_491474.1|', 87);

This would assign some integer to $pos representing the column number in the gapped alignment of the ungapped sequence denoted by gi|388479282|ref|YP_491474.1|

Does anyone know a way to do this in BioPython given and alignment object? Thanks!

biopython bioperl position multiple-alignment • 8.5k views

ADD COMMENT • link updated 10.2 years ago by Andrzej Zielezinski 11k • written 10.2 years ago by weslfield ▴ 90

0

Entering edit mode

I didn't find a straightforward way to do it. What I do is to count the gaps in the reference sequence until the desired position and add this number to the position number.

ADD REPLY • link 10.2 years ago by Asaf 10k

score 4 · Answer 1 · 2014-02-23

Maybe something like this:

from Bio.Alphabet import generic_dna
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Align import MultipleSeqAlignment

def column_from_residue_number(aln, id, res_no):
    rec = next((r for r in aln if r.id == id), None)
    j = 0
    for i, res in enumerate(rec.seq):
        if res!='-':
            if j==res_no:
                return i
            j+=1

alignment = MultipleSeqAlignment([
   SeqRecord(Seq("ACTGCTAGCTAG", generic_dna), id="Alpha"),
   SeqRecord(Seq("ACT-CTAGCTAG", generic_dna), id="Beta"),
   SeqRecord(Seq("ACTGCTAGDTAG", generic_dna), id="Gamma"),
])

print column_from_residue_number(alignment, "Beta", 3)

You will get: