Biopython: Getting Positions In A Multiple Sequence Alignment From A Reference Sequence In The Alignment
1
3
Entering edit mode
7.1 years ago
weslfield ▴ 90

Ok, so I am writing an application that examines mutations at user defined positions within an MSA. Using BioPython to do this. It is fairly simple to examine columns in the alignment but their indexes are relative to the alignment and not a particular sequence (i.e. includes gap). I would like to be able to reference a sequence in the MSA by its ID and translate a sequence-specific position to a MSA-specific position. Bioperl allows you to do this like so —

$pos =$alignment -> column_from_residue_number('gi|388479282|ref|YP_491474.1|', 87);

This would assign some integer to \$pos representing the column number in the gapped alignment of the ungapped sequence denoted by gi|388479282|ref|YP_491474.1|

Does anyone know a way to do this in BioPython given and alignment object? Thanks!

biopython bioperl position multiple-alignment • 6.5k views
0
Entering edit mode

I didn't find a straightforward way to do it. What I do is to count the gaps in the reference sequence until the desired position and add this number to the position number.

4
Entering edit mode
7.1 years ago

Maybe something like this:

from Bio.Alphabet import generic_dna
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Align import MultipleSeqAlignment

def column_from_residue_number(aln, id, res_no):
rec = next((r for r in aln if r.id == id), None)
j = 0
for i, res in enumerate(rec.seq):
if res!='-':
if j==res_no:
return i
j+=1

alignment = MultipleSeqAlignment([
SeqRecord(Seq("ACTGCTAGCTAG", generic_dna), id="Alpha"),
SeqRecord(Seq("ACT-CTAGCTAG", generic_dna), id="Beta"),
SeqRecord(Seq("ACTGCTAGDTAG", generic_dna), id="Gamma"),
])

print column_from_residue_number(alignment, "Beta", 3)


You will get:

4