I have pairs of coding DNA sequences on which I wish to perform pairwise codon alignments in Python. I have "half completed" the process.
So far:
I retrieve pairs of orthologous DNA sequences from GenBank using the Biopython package.
I translate the orthologous pairs into peptide sequences and then align them using the EMBOSS Needle program.
I wish to:
Transfer the gaps from the peptide sequences into the original DNA sequences.
Question
I would appreciate suggestions for programs/code (called from Python) that can transfer gaps from aligned peptide sequence pairs onto codons of the corresponding nucleotide sequence pairs. Or programs/code that can carry out the pairwise codon alignment from scratch.
Don't reinvent the wheel unless you simply like creating wheels:) Use: http://translatorx.co.uk/ If you still want to do this in Python, paste some of your code, we will make this work.
This sounds like a simple loop over each aligned protein sequence, writing the next three letters from the DNA where the residue is not a gap and three gap characters where it is. This assumes your DNA is in-frame.
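A minimal sketch of that loop, in case it helps. The function name `thread_gaps` and the toy sequences are made up for illustration, and it assumes the DNA is in-frame and has exactly one codon per non-gap residue:

```python
def thread_gaps(aligned_peptide, dna, gap="-"):
    """Write one codon per residue and three gap characters per peptide gap."""
    out = []
    pos = 0  # index of the next unused nucleotide in the un-aligned DNA
    for residue in aligned_peptide:
        if residue == gap:
            out.append(gap * 3)           # a peptide gap becomes a codon-sized gap
        else:
            out.append(dna[pos:pos + 3])  # consume the next codon
            pos += 3
    return "".join(out)


# Hypothetical toy pair: two aligned peptides with their un-aligned coding DNA.
pair = [("MK-L", "ATGAAACTG"), ("M-FL", "ATGTTTCTG")]
for peptide, dna in pair:
    print(thread_gaps(peptide, dna))
```

Both output lines have the same length, so they can be written out together as a pairwise codon alignment.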
In the end I made my own Python function that transfers gaps ('-') from the peptide sequence to the nucleotide sequence (codons).
It takes an aligned peptide sequence with gaps and the corresponding un-aligned nucleotide sequence and gives an aligned nucleotide sequence:
Function
def gapsFromPeptide(peptide_seq, nucleotide_seq):
    """Transfer gaps from an aligned peptide seq into a codon-partitioned nucleotide seq (codon alignment).

    - peptide_seq is an aligned peptide sequence with gaps that need to be transferred to nucleotide seq
    - nucleotide_seq is an un-aligned DNA sequence whose codons translate to peptide seq
    """
    def chunks(l, n):
        """Yield successive n-sized chunks from l."""
        for i in range(0, len(l), n):  # range rather than xrange, so this also runs on Python 3
            yield l[i:i+n]

    codons = list(chunks(nucleotide_seq, 3))  # split nucleotides into codons (triplets)
    gappedCodons = []
    codonCount = 0
    for aa in peptide_seq:  # add '---' gaps to nucleotide seq corresponding to peptide
        if aa != '-':
            gappedCodons.append(codons[codonCount])
            codonCount += 1
        else:
            gappedCodons.append('---')
    return ''.join(gappedCodons)
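One thing worth checking afterwards: if the nucleotide sequence has more codons than the peptide has residues (for example a trailing stop codon that the peptide lacks), the extra codons are silently left out of the result. Below is a rough sanity check, not part of the function above; the name `check_codon_alignment` and its messages are my own invention:

```python
def check_codon_alignment(peptide_seq, nucleotide_seq, aligned_nt):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    residues = len(peptide_seq.replace("-", ""))
    # One codon is expected per non-gap residue.
    if len(nucleotide_seq) != 3 * residues:
        problems.append("nucleotide length is not 3 x number of residues")
    # Every peptide column (residue or gap) should span three nucleotide columns.
    if len(aligned_nt) != 3 * len(peptide_seq):
        problems.append("aligned length is not 3 x aligned peptide length")
    # Removing the gaps should give back the original sequence unchanged.
    if aligned_nt.replace("-", "") != nucleotide_seq:
        problems.append("degapped alignment differs from the input sequence")
    return problems


print(check_codon_alignment("MK-L", "ATGAAACTG", "ATGAAA---CTG"))
```

An empty list here only says the lengths and letters are consistent. It does not confirm that the codons actually translate to the peptide.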
For some insights into coding a codon alignment, check http://zruanweb.com/
Cheers, in the end I did something like that and posted it as an answer to my own question.
Are your DNA sequences already in the correct frame?
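If you are not sure, a quick heuristic along these lines might help. It is only a sketch: the name `looks_in_frame` is made up, it assumes the standard genetic code's stop codons, and passing it does not prove the frame is right:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def looks_in_frame(dna):
    """Heuristic: length is a multiple of 3 and no stop codon before the last codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        return False
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    # A stop codon is tolerated only as the final codon.
    return not any(codon in STOP_CODONS for codon in codons[:-1])


print(looks_in_frame("ATGAAACTGTAA"))
print(looks_in_frame("TGAAACTGTAAA"))
```

The second sequence is the first one shifted by one base, which puts a stop codon at the start of the reading frame.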