Get consensus from an MSA FASTA file with IUPAC ambiguities in Python

2.2 years ago

statamn • 0

Hello,

My question is very similar to the topic "Get consensus from Muscle alignment with IUPAC ambiguities in python".

I have a FASTA file of aligned sequences and I want to generate a consensus from it.

So far I have written:

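from Bio import AlignIO
from Bio.Align import AlignInfo

# Read the multiple sequence alignment from the FASTA file
alignment = AlignIO.read("input.fasta", "fasta")
summary_align = AlignInfo.SummaryInfo(alignment)
# 0.3 is the threshold passed to dumb_consensus
consensus = summary_align.dumb_consensus(0.3)
print(consensus)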

The "0.3" is meant to be the allele-frequency threshold for the consensus: for each column, any base present in at least 30% of the aligned sequences should be taken into account, and if more than one base meets this criterion, the corresponding IUPAC ambiguity code should be used.

However, "dumb_consensus" only reports the most frequent base in each column, so in practice it never uses the IUPAC codes.

Is there a way to build such a consensus with Biopython (or with Python in general)?
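
To make the rule I am after concrete, here is a minimal sketch in plain Python. The function name iupac_consensus and the table IUPAC_CODES are hypothetical names of my own, not part of Biopython. It also rests on two assumptions that are not settled above: frequencies are computed relative to all sequences in the column (gaps included), and a column where no base reaches the threshold is written as "-".

```python
from collections import Counter

# IUPAC nucleotide ambiguity codes, keyed by the set of bases they represent
IUPAC_CODES = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_consensus(sequences, threshold=0.3):
    """Hypothetical helper: build an IUPAC consensus from aligned sequences."""
    sequences = [str(s).upper() for s in sequences]
    if not sequences:
        return ""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*sequences):
        # Count only unambiguous bases; gaps and other symbols are ignored
        counts = Counter(base for base in column if base in "ACGT")
        # Assumption: frequency is relative to all sequences, gaps included
        kept = frozenset(
            base for base, n in counts.items()
            if n / len(sequences) >= threshold
        )
        # Assumption: if no base reaches the threshold, emit a gap
        consensus.append(IUPAC_CODES.get(kept, "-"))
    return "".join(consensus)


print(iupac_consensus(["AC", "AT", "GT", "GC"], threshold=0.3))
```

With the alignment object from my code above, I would expect to call it as iupac_consensus([record.seq for record in alignment], 0.3), but a built-in Biopython equivalent would be preferable if one exists.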

Thanks

python alignment IUPAC biopython consensus • 462 views
