Question: Printing for each specific sequence and its residue in msa , every msa position
0
gravatar for marin.matic
7 weeks ago by
marin.matic0 wrote:

Hi everyone so I have a question how can you print out for each specific residue of a sequence in msa its msa position I tried something with biopython :

from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import AlignIO

alignment = AlignIO.read(open("ADORA1.pfam"), "PFAM")

def msa_position(aln, id, res_no):
    rec = next((r for r in aln if r.id==id), None)
    j = 0
    for i, res in enumerate(rec.seq):
        if res!='-':
            if j==res_no:
 return i
 j+=1

print (msa_position(alignment, "ENST00000618295|ADORA1/1-210", 3))

However, i can only do it for one position at a time and i want output like

id                                                              Sequence_position             msa_positon

ENST00000618295|ADORA1                     3                                        119

ENST00000618295|ADORA1                        4                                    120

and so on

Sorry for the code writing its my first time on biostars.

alignment biopython sequence R • 165 views
ADD COMMENTlink modified 7 weeks ago by a.zielezinski9.3k • written 7 weeks ago by marin.matic0

I have tried to format your code for you, but the indentation was not clear, so please correct me if I have it wrong.

I'm not quite sure I understand the question - you want to know, for a given sequence within an MSA, where its sequence starts (i.e. count the gap characters up til the first base/amino)?

ADD REPLYlink written 7 weeks ago by Joe18k

So what i want is a table which will map every sequence position of each sequence in given alignment (there are three of them in the msa file) to its msa position of course counting the gaps until the first amino so basically for that sequence in the example the third position in the sequence is actually 119 position cause of the gaps

ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by marin.matic0

Can you provide some example input data?

ADD REPLYlink written 7 weeks ago by Joe18k

Here you go something shorter than my sequences but principle should be the same so i want each position in each sequence mapped to its msa position example in Beta the 4th position in the sequence is 5th msa position and for the other two they are the same 4th position to 4th position.

alignment = MultipleSeqAlignment([ SeqRecord(Seq("ACTGCTAGCTAG", generic_dna), id="Alpha"), SeqRecord(Seq("ACT-CTAGCTAG", generic_dna), id="Beta"), SeqRecord(Seq("ACTGCTAGDTAG", generic_dna), id="Gamma"),

ADD REPLYlink written 7 weeks ago by marin.matic0

That gap is internal to the second sequence, so should the desired output be 0 or 3?

ADD REPLYlink written 7 weeks ago by Joe18k

so in sequence itself gaps you dont count so you go 1,2,3, skip gap 4 and for each of the counted positions i need the msa position so basically it will list in a column postions in sequence from 1 to length of that sequence and to each counted residued it will write a msa position as gaps arent counted the two numbers will not be the same

ADD REPLYlink written 7 weeks ago by marin.matic0
0
gravatar for a.zielezinski
7 weeks ago by
a.zielezinski9.3k
a.zielezinski9.3k wrote:

Maybe this will help.

from Bio.Align import MultipleSeqAlignment
from Bio.Alphabet import generic_dna
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

def map_positions(aln, id):
    rec = next((r for r in aln if r.id==id), None)
    seq_pos = 0
    for aln_pos, res in enumerate(rec.seq, start=1):
        if res!='-':
            seq_pos += 1
            yield seq_pos, aln_pos

alignment = MultipleSeqAlignment([
    SeqRecord(Seq("ACTGCTAGCTAG", generic_dna), id="Alpha"), 
    SeqRecord(Seq("ACT-CTAGCTAG", generic_dna), id="Beta"), 
    SeqRecord(Seq("ACTGCTAGDTAG", generic_dna), id="Gamma")])

print('id\tseq_pos\taln_pos')
for seq_pos, aln_pos in map_positions(alignment, "Beta"):
    print(f'Beta\t{seq_pos}\t{aln_pos}')

Output:

id  seq_pos aln_pos
Beta    1   1
Beta    2   2
Beta    3   3
Beta    4   5
Beta    5   6
Beta    6   7
Beta    7   8
Beta    8   9
Beta    9   10
Beta    10  11
Beta    11  12

If you want to show the output for all the sequences at once, you can use this code:

from Bio.Align import MultipleSeqAlignment
from Bio.Alphabet import generic_dna
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

def map_positions(seq):
    seq_pos = 0
    for aln_pos, res in enumerate(seq, start=1):
        if res!='-':
            seq_pos += 1
            yield seq_pos, aln_pos

alignment = MultipleSeqAlignment([
    SeqRecord(Seq("ACTGCTAGCTAG", generic_dna), id="Alpha"), 
    SeqRecord(Seq("ACT-CTAGCTAG", generic_dna), id="Beta"), 
    SeqRecord(Seq("ACTGCTAGDTAG", generic_dna), id="Gamma")])

print('id\tseq_pos\taln_pos')
for aln_seq in alignment:
    for seq_pos, aln_pos in map_positions(aln_seq):
        print(f'{aln_seq.id}\t{seq_pos}\t{aln_pos}')
ADD COMMENTlink modified 7 weeks ago • written 7 weeks ago by a.zielezinski9.3k

Yes thats it thank you so much you are a lifesaver. I have just one question how can i print for all ids at once so i have alpha beta and gamma in one output.

ADD REPLYlink written 7 weeks ago by marin.matic0

Glad I could help. I just updated my answer to show all ids at once.

ADD REPLYlink written 7 weeks ago by a.zielezinski9.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1750 users visited in the last hour