Nucleotide to amino acid conversion of multiple sequences
4.7 years ago
lokraj2003 ▴ 120

I have 220 nucleotide sequences. Each is around 7000 bp and should translate into one protein of around 2100 amino acids. R already has functions that convert a nucleotide sequence to an amino acid sequence, but the ones I know of, such as translate in seqinr or trans in the ape package, require a start and end position, and my sequences do not share the same start position. For example, one sequence starts at position 650, another at 720, and so on. When I do this manually I use the Expasy online tool, which works pretty well. Is there any way to translate all of these sequences programmatically in R? Is it possible to send my sequences to the Expasy web page from RStudio and retrieve the amino acid sequences? I am comfortable using R/Bioconductor, but I can use Biopython too if there is a way to do it with Biopython.

Thanks!

DNA protein Bioconductor Biopython • 3.0k views

So, you have multiple sequences with different start points relative to one another? Do you need to perform a multiple sequence alignment before translation, or has this already been done?


I am going to do a selection pressure analysis, so I will have to do a multiple sequence alignment before I can actually run it.

4.7 years ago

If you know what they should translate to, I'd use blastx, which will handle getting the frame right for you. You can give it a multi-FASTA file as input; you then have to parse the output.


It does work, but parsing the output became tedious. I am still thinking of a good way to parse it.
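One way the parsing could be made less tedious is to ask blastx for tabular output and keep only the best hit per query. The sketch below assumes a run with -outfmt "6 qseqid sseqid qstart qend qframe evalue bitscore"; that column choice, the function name best_hits, and the example rows are all illustrative assumptions, not something taken from an actual run in this thread.

```python
import csv
import io

# Assumed column order; it must match the -outfmt string used for blastx.
COLUMNS = ["qseqid", "sseqid", "qstart", "qend", "qframe", "evalue", "bitscore"]


def best_hits(handle):
    """Return {query_id: hit} keeping the highest-bitscore hit for each query."""
    best = {}
    for fields in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (present with -outfmt 7).
        if not fields or fields[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, fields))
        for key in ("qstart", "qend", "qframe"):
            hit[key] = int(hit[key])
        hit["bitscore"] = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Made-up rows, only to show the shape of the input.
example = (
    "seq1\tprotA\t650\t6949\t2\t0.0\t4100\n"
    "seq1\tprotB\t700\t900\t1\t1e-5\t50.2\n"
    "seq2\tprotA\t720\t7019\t3\t0.0\t4050\n"
)
hits = best_hits(io.StringIO(example))
for query_id, hit in hits.items():
    print(query_id, hit["qstart"], hit["qend"], hit["qframe"])
```

With real data you would pass an open file handle instead of the StringIO object. The qstart, qend and qframe values of the best hit then tell you which region and frame of each nucleotide sequence to translate.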

4.7 years ago
Mensur Dlakic ★ 27k

There is a nice set of biosequence conversion tools in Easel. For your purpose, this command should do the trick:

esl-translate -l 2000 sequence.fna > sequence.faa

The -l 2000 option specifically asks for ORFs longer than 2000 residues, which is presumably what you need. It can also find ORFs in all 6 reading frames if needed.
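For anyone who would rather stay in a scripting language, the same idea (scan all six frames, keep the longest open reading frame above a length cutoff) can be sketched in plain Python. This is not a reimplementation of esl-translate: it assumes the standard genetic code, treats an ORF as a stop-free stretch, and the names longest_orf, min_len and from_met are made up for this example.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string; other letters are left as-is."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate from the first base; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(seq, min_len=0, from_met=False):
    """Longest stop-free protein over six frames, or None if below min_len.

    With from_met=True each candidate is trimmed to start at its first M.
    """
    seq = seq.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                if from_met:
                    start = segment.find("M")
                    segment = segment[start:] if start != -1 else ""
                if len(segment) > len(best):
                    best = segment
    return best if len(best) >= min_len else None


# Toy sequence: the coding region starts at the third base.
print(longest_orf("CCATGGCCAAATGAGG", from_met=True))
```

For the sequences in the question you would loop over the records of the multi-FASTA file and call something like longest_orf(record, min_len=2000, from_met=True) on each, then check that every sequence gave exactly the protein you expect.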
