Infer amino acid alignment from pairwise nucleotide alignment
0
0
Entering edit mode
6 months ago

Hola peeps :)

I develop a Python package which has a module that aligns gene nucleotide sequences to contigs with Minimap2.

Given the query (gene) and target (contig) nucleotide sequences, I can reconstruct the aligned sequences using the CIGAR. I can also extract the resulting gene sequence from the contig, and translate to get the resulting amino acid sequence.

However, I would like to calculate the number of amino acid matches between the query protein sequence and protein sequence extracted from the contig to get a protein percent identity value for reporting.

I recognise that this can be achieved with a pairwise protein aligner, however I want to avoid including an additional dependency just for this one function.

I have tried iterating through the aligned sequences in chunks of 3 (codons) and comparing the codon translations, however this does not handle insertions and deletions in nucleotide space.

So my question is: is it possible to infer the pairwise amino acid alignment from a pairwise nucleotide alignment (with CIGAR) and the corresponding amino acid sequences, and how could I implement this in Python?

An example problematic CIGAR is:

'71M4I14M4D4M1I18M2D3M2I2M1D14M1D3M1I130M2I15M1I4M1D1M2D4M2D2M1I6M4D8M2I3M3I6M3I7M1D4M2D31M1D1M2I3M1D31M1I7M1D52M2I1M1D3M1D41M1I8M1D6M6I101M2D4M2I45M1I6M1D116M1D8M1I562M'

(the sequences are quite long so I will paste on request)

Thanks in advance!

alignment • 434 views
ADD COMMENT
1
Entering edit mode

Honestly, given that the first 71M could introduce any number of mutations, I doubt there's any easy/reliable way to infer amino acid alignment. For example, what if in the first 71M, you create an amino acid sequence that matches the last 22 amino acids, but then ends in a stop codon. I don't see any way you would know that the amino acid sequence matches the last half. Plus the frameshift mutations on top of that.

It seems it would be easier to just implement your own pairwise algorithm if you really don't want a dependency.

ADD REPLY

Login before adding your answer.

Traffic: 3861 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6