I have pairs of coding DNA sequences on which I wish to perform pairwise codon alignments in Python. I have "half completed" the process.
I retrieve pairs of orthologous DNA sequences from GenBank using the Biopython package.
I translate the orthologous pairs into peptide sequences and then align them using the EMBOSS Needle program.
I now wish to transfer the gaps from the aligned peptide sequences into the original DNA sequences.
I would appreciate suggestions for programs/code (callable from Python) that can transfer gaps from aligned peptide sequence pairs onto the codons of the corresponding nucleotide sequence pairs, or that can carry out the pairwise codon alignment from scratch.
In the end I made my own Python function that transfers gaps ('-') from the peptide sequence to the nucleotide sequence (codons).
It takes an aligned peptide sequence with gaps and the corresponding un-aligned nucleotide sequence and gives an aligned nucleotide sequence:
def gapsFromPeptide(peptide_seq, nucleotide_seq):
    """Transfers gaps from aligned peptide seq into codon partitioned nucleotide seq (codon alignment)
    - peptide_seq is an aligned peptide sequence with gaps that need to be transferred to nucleotide seq
    - nucleotide_seq is an un-aligned dna sequence whose codons translate to peptide seq"""
    def chunks(l, n):
        """Yield successive n-sized chunks from l."""
        for i in range(0, len(l), n):
            yield l[i:i + n]
    codons = list(chunks(nucleotide_seq, 3))  # splits nucleotides into codons (triplets)
    gappedCodons = []
    codonCount = 0
    for aa in peptide_seq:  # adds '---' gaps to nucleotide seq corresponding to peptide
        if aa != '-':
            # residue: copy the next unused codon
            gappedCodons.append(codons[codonCount])
            codonCount += 1
        else:
            # gap in the peptide: emit a three-base gap
            gappedCodons.append('---')
    return ''.join(gappedCodons)
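A quick sanity check on the output of a gap-transfer function like the one above might look like the following. This is only a sketch: the name `check_codon_alignment` is hypothetical, and it assumes the peptide covers every codon of the DNA (for example, no trailing stop codon that was dropped during translation).

```python
def check_codon_alignment(aligned_peptide, aligned_dna, original_dna):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # Each peptide column (residue or gap) should map to exactly three DNA columns.
    if len(aligned_dna) != 3 * len(aligned_peptide):
        problems.append("aligned DNA is not 3x the aligned peptide length")
    # Stripping the gaps should give back the untouched input sequence.
    if aligned_dna.replace("-", "") != original_dna:
        problems.append("removing gaps does not recover the original DNA")
    # A peptide gap must sit over '---'; a residue must sit over a gap-free codon.
    for i, aa in enumerate(aligned_peptide):
        codon = aligned_dna[3 * i:3 * i + 3]
        if (aa == "-") != (codon == "---") or ("-" in codon and codon != "---"):
            problems.append("column %d: residue %r over codon %r" % (i, aa, codon))
    return problems


print(check_codon_alignment("M-K", "ATG---AAA", "ATGAAA"))
```

Running the same check on both members of a pair, and then comparing the lengths of the two aligned DNA strings, should catch most bookkeeping mistakes before the alignment is passed to downstream tools.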
Check http://zruanweb.com/ for some insights into coding a codon alignment.
Don't reinvent the wheel unless you simply like creating wheels :) Use http://translatorx.co.uk/. If you still want to do this in Python, paste some of your code and we will make this work.
This sounds like a simple loop over each aligned protein sequence: write the next three letters from the DNA if the position is not a gap, and three gaps if it is. This assumes you're in-frame.
Cheers, in the end I did something like that and posted it as an answer to my own question.
Are your DNA sequences already in the correct frame?
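One rough way to answer that before transferring gaps is sketched below. The helper name `frame_problems` is hypothetical, and the check assumes the standard genetic code and a sequence that starts at the first base of the first codon; it only flags obvious signs of a frame problem and cannot prove that the frame is right.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code only (assumption)


def frame_problems(dna):
    """Return a list of hints that a coding sequence may be out of frame."""
    dna = dna.upper()
    problems = []
    if len(dna) % 3 != 0:
        problems.append("length is not a multiple of 3")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    # A stop codon anywhere except the last position is suspicious.
    internal = [i for i, c in enumerate(codons[:-1]) if c in STOP_CODONS]
    if internal:
        problems.append("internal stop codon(s) at codon index %s" % internal)
    return problems


print(frame_problems("ATGAAATAA"))
print(frame_problems("ATGTAAAAA"))
```

An empty list means neither check fired. A sequence that fails either one is probably worth inspecting by hand before its codons are paired with the peptide alignment.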