Question: Nucleotide to amino acid conversion of multiple sequences
8 months ago, lokraj2003 wrote:

I have 220 nucleotide sequences. Each is around 7000 bp and should translate into a single protein of around 2100 amino acids. R already has functions that convert a nucleotide sequence to an amino acid sequence, but the ones I know of, such as translate in seqinr or trans in the ape package, require a start and end position, and my sequences do not share the same start position. For example, one sequence starts at position 650, another at 720, and so on. When I do this manually I use the Expasy online tool, which works pretty well. Is there any way to translate all these sequences programmatically in R? Is it possible to send my sequences to the Expasy web page from RStudio and retrieve the amino acid sequences? I am comfortable with R/Bioconductor, but I can use Biopython too if there is a way to do it there.

Thanks!


So, you have multiple sequences with different start points relative to one another? Do you need to perform a multiple sequence alignment before translation, or has this already been done?

Reply written 8 months ago by Brice Sarver

I am going to do a selection pressure analysis, so I will have to run a multiple sequence alignment before I can do that.

Reply written 8 months ago by lokraj2003
8 months ago, swbarnes2 (United States) wrote:

If you know what they should translate to, I'd use blastx. Then blastx will handle getting the frame right for you. You can give it a multi-fasta of input, then you have to parse the output.


It does work, but parsing the output became tedious. I am still thinking of a way to parse it.

Reply written 8 months ago by lokraj2003
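
One way to cut down the parsing work, sketched here under assumptions: if blastx is run with tabular output (-outfmt 6), each hit is a single tab-separated line with the default twelve columns, and a short Python script can keep the best-scoring hit per query. The function names best_hits and coding_region are made up for this illustration, and the sketch assumes the default column order was not customised.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle):
    """Return {query id: hit dict} keeping the highest-bitscore hit per query.

    `handle` is any iterable of tab-separated lines, e.g. an open file.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (present with -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(FIELDS, row))
        for key in ("qstart", "qend", "sstart", "send"):
            hit[key] = int(hit[key])
        hit["bitscore"] = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


def coding_region(hit):
    """Return (strand, start, end) of the aligned query region.

    Coordinates are 1-based and inclusive, with start <= end. In blastx
    output, qstart > qend means the hit lies on the reverse strand.
    """
    if hit["qstart"] <= hit["qend"]:
        return "+", hit["qstart"], hit["qend"]
    return "-", hit["qend"], hit["qstart"]
```

The coordinates cover only the aligned part of each query, so a local hit may not span the whole protein. They mainly tell you which strand and which stretch of the nucleotide sequence to translate.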
8 months ago, Mensur Dlakic (USA) wrote:

There is a nice set of biosequence conversion tools in Easel. For your purpose, this command should do the trick:

esl-translate -l 2000 sequence.fna > sequence.faa

It specifically asks for ORFs of at least 2000 residues (the -l option sets the minimum ORF length), which is presumably what you need. It can also find ORFs in all 6 reading frames if needed.
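
If installing Easel is not an option, the same idea (scan all six reading frames and keep the long ORF) can be sketched in plain Python. This is not how esl-translate is implemented. It is a minimal illustration that assumes the standard genetic code, treats an ORF as the stretch from the first ATG in a stop-free segment to the next stop codon (or the end of the sequence), and uses helper names (read_fasta, longest_orf) invented here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate from the first base; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(seq, min_len=0):
    """Return (protein, strand, frame) for the longest ORF in six frames.

    Frames are numbered 1-3 on each strand. Returns None when the longest
    ORF is shorter than min_len amino acids.
    """
    seq = seq.upper().replace("U", "T")
    best = ("", "+", 0)
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Stop codons translate to "*", so splitting gives stop-free segments.
            for segment in translate(strand_seq[offset:]).split("*"):
                start = segment.find("M")
                if start == -1:
                    continue
                orf = segment[start:]
                if len(orf) > len(best[0]):
                    best = (orf, strand, offset + 1)
    if not best[0] or len(best[0]) < min_len:
        return None
    return best
```

To process a multi-FASTA file, loop over read_fasta(open("sequence.fna")) and call longest_orf(seq, min_len=2000) on each record, where the file name and the 2000-residue cutoff are taken from the command above.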
