Question

Nucleotide to amino acid conversion of multiple sequences

4.7 years ago

lokraj2003 ▴ 120

I have 220 nucleotide sequences. Each is around 7000 bp and should translate into a single protein of around 2100 amino acids. Some R functions can already convert a nucleotide sequence to an amino acid sequence, but the ones I know of, such as translate in seqinr or trans in the ape package, require start and end positions, and my sequences do not share the same start position. For example, one sequence starts at position 650, another at 720, and so on. When I do this manually I use the Expasy online tool, which works pretty well. Is there any way to translate all these sequences programmatically in R? Is it possible to send my sequences to the Expasy web page from RStudio and retrieve the amino acid sequences? I am comfortable using R/Bioconductor, but I can use Biopython too if there is a way to do it with Biopython.

Thanks !

DNA protein Bioconductor Biopython • 3.0k views

updated 4.7 years ago by Mensur Dlakic ★ 27k • written 4.7 years ago by lokraj2003 ▴ 120

So, you have multiple sequences with different start points relative to one another? Do you need to perform a multiple sequence alignment before translation, or has this already been done?

4.7 years ago by Brice Sarver ★ 3.8k

I am going to do a selection pressure analysis, so I will have to do a multiple sequence alignment before I can actually run it.

4.7 years ago by lokraj2003 ▴ 120

score 0 · Answer 1 · 2019-07-22

4.7 years ago

swbarnes2 14k

If you know what they should translate to, I'd use blastx. blastx will then handle getting the frame right for you. You can give it a multi-FASTA file as input, but then you have to parse the output.

It does work, but parsing the output became tedious. I am still thinking of a way to parse it.

4.7 years ago by lokraj2003 ▴ 120
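
On the parsing problem raised above: one way to make blastx output easier to handle is to request tabular output with only the columns you need, for example `-outfmt "6 qseqid sseqid qstart qend qframe bitscore"`. The sketch below is only an illustration under that assumption. The column order, the `best_hits` helper, and the example rows (including the `ref_polyprotein` subject name and all coordinates and scores) are made up for demonstration and are not taken from the thread. It keeps the highest-scoring hit per query, which gives the frame and coordinates to translate.

```python
import csv
import io

# Assumed column order; it must match the -outfmt string passed to blastx.
COLUMNS = ["qseqid", "sseqid", "qstart", "qend", "qframe", "bitscore"]


def best_hits(handle):
    """Return {query_id: hit} keeping the highest-bitscore hit per query."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (present with -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        for key in ("qstart", "qend", "qframe"):
            hit[key] = int(hit[key])
        hit["bitscore"] = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Made-up rows standing in for a real blastx result file.
example = (
    "seq1\tref_polyprotein\t650\t6949\t2\t4100.5\n"
    "seq1\tref_polyprotein\t700\t1200\t1\t80.2\n"
    "seq2\tref_polyprotein\t720\t7019\t3\t4088.0\n"
)
hits = best_hits(io.StringIO(example))
for qid, hit in hits.items():
    print(qid, hit["qframe"], hit["qstart"], hit["qend"])
```

With a real result file, `best_hits(open("blastx_results.tsv"))` would be used instead of the in-memory example (the file name is hypothetical). The reported frame and query coordinates can then be passed to whichever translation function you prefer.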

score 0 · Answer 2 · 2019-07-22

There is a nice set of biosequence conversion tools in Easel. For your purpose, this command should do the trick:

esl-translate -l 2000 sequence.fna > sequence.faa

It specifically asks for ORFs of at least 2000 residues, which is presumably what you need given that your proteins are around 2100 amino acids long. However, it can find ORFs in all 6 reading frames if needed.
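
For readers who want to see the idea behind a length-filtered ORF search without installing anything, here is a minimal pure-Python sketch. It is not Easel's implementation. The function names and the `min_len` parameter are hypothetical, it assumes the standard genetic code, and it defines an ORF as the stretch from the first ATG in a stop-free segment to the next stop codon (or to the end of the sequence). It scans all six frames and returns the longest ORF, so the start position does not need to be known in advance.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string; other symbols are kept."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; codons not in the table (e.g. with N) give X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(dna, min_len=0):
    """Return (frame, protein) for the longest ORF, or None.

    frame is 1..3 on the forward strand and -1..-3 on the reverse strand.
    ORFs shorter than min_len amino acids are ignored.
    """
    dna = dna.upper()
    best = None
    for strand, seq in ((1, dna), (-1, reverse_complement(dna))):
        for offset in range(3):
            # Split the frame translation at stop codons, then trim each
            # stop-free segment so that it begins at its first methionine.
            for segment in translate(seq[offset:]).split("*"):
                start = segment.find("M")
                if start == -1:
                    continue
                orf = segment[start:]
                if len(orf) >= min_len and (best is None or len(orf) > len(best[1])):
                    best = (strand * (offset + 1), orf)
    return best


print(longest_orf("CCATGGCTAAATGAGG"))  # (3, 'MAK')
```

For sequences like the ones in the question, something like `longest_orf(seq, min_len=2000)` applied to each record of the multi-FASTA file would play the role of the length cutoff, though a dedicated tool will be faster and better tested.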