Question: R functions that extract the ORF from a sequence
peter.durr (Australia) wrote, 4.1 years ago:

Hi everyone

I am working within R and need to extract the open reading frame (ORF) from a number of viral sequences.

Somewhat to my surprise, I have not yet come across R functions in any package that find ORFs and readily extract them.

Can anyone point me to R functions that will do these tasks?

Thanks

 


Why in R? There are many other straightforward solutions available (bedtools, EMBOSS, etc.).

written 4.1 years ago by Israel Barrantes

Yes, you are certainly correct: for sequence manipulation there are better tools.

I am trying to do things in R because:

1. of the downstream tools, especially for phylogenetics

2. I can code the whole workflow into one reproducible file

But in this case maybe R is not yet mature enough, and I will need to do the sequence manipulation outside of R and then work on a clean alignment for the analysis.

written 4.1 years ago by peter.durr

Are you attempting de novo prediction of all ORFs, or do you want to extract only the ORFs from known/annotated viruses?
 

written 4.1 years ago by Joseph Pearson

I am extracting from known viruses, specifically segments of influenza viruses.

The challenge arises when I download a lot of them from GenBank: the segments are of variable length.

The five starting scenarios are:

1. complete segment length (about 1741 nt for segment 4)

2. complete coding sequence (about 1704 nt for segment 4)

3. missing regions to the left - with no start codon

4. missing regions to the right - with no stop codon

5. missing left and right - no start and stop codon

I am hoping to develop a workflow that can classify the sequences into the five groups, and I was hoping I could build on existing code.
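
As a starting point for such a classifier, here is a minimal base-R sketch. It is not taken from any package: `classify_segment` and its `max_5utr` parameter are hypothetical names, and the rules are assumptions. It assumes the sequence is given in the coding (sense) orientation, takes the longest stop-free stretch over the three forward frames as the coding frame, counts a stop as present if that stretch ends in an in-frame stop codon, and counts a start as present if the first in-frame ATG of that stretch lies within `max_5utr` nt of the 5' end. An internal in-frame ATG close to the 5' end of a truncated sequence would be mistaken for the start codon, so checking against a reference alignment would likely be more robust.

```r
# Hypothetical helper, not from any package: classify one sense-strand
# nucleotide sequence (a single string) into the five scenarios.
# max_5utr is an assumed upper bound on the 5' non-coding length.
classify_segment <- function(dna, max_5utr = 50) {
  dna <- gsub("U", "T", toupper(dna))
  n <- nchar(dna)
  stops <- c("TAA", "TAG", "TGA")
  best <- list(len = -1, atg_pos = NA_integer_,
               has_stop = FALSE, end_pos = NA_integer_)

  for (frame in 0:2) {
    if (n - frame < 3) next
    pos <- seq(frame + 1, n - 2, by = 3)     # codon start positions
    codons <- substring(dna, pos, pos + 2)
    is_stop <- codons %in% stops
    # Runs of codons between stops; a stop belongs to the run it terminates.
    run_id <- cumsum(is_stop) - is_stop
    for (idx in split(seq_along(codons), run_id)) {
      open_len <- sum(!is_stop[idx])         # non-stop codons in this run
      if (open_len > best$len) {
        atg <- idx[codons[idx] == "ATG"]
        last <- idx[length(idx)]
        best <- list(
          len      = open_len,
          atg_pos  = if (length(atg) > 0) pos[atg[1]] else NA_integer_,
          has_stop = is_stop[last],
          end_pos  = pos[last] + 2
        )
      }
    }
  }
  if (best$len < 0) return(NA_character_)    # shorter than one codon

  has_start <- !is.na(best$atg_pos) && best$atg_pos <= max_5utr + 1
  if (has_start && best$has_stop) {
    # Start and stop both found: flanking sequence distinguishes 1 from 2.
    if (best$atg_pos == 1 && best$end_pos == n) "complete CDS" else "complete segment"
  } else if (best$has_stop) {
    "no start codon"
  } else if (has_start) {
    "no stop codon"
  } else {
    "no start or stop codon"
  }
}
```

Applied over a set of downloaded sequences, e.g. `vapply(seqs, classify_segment, character(1))` with `seqs` a character vector, this would give one label per sequence. Note that "complete segment" here only means that a start and a stop were found with some flanking sequence; it does not verify that the full segment length is present.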

thanks

 

modified 4.1 years ago • written 4.1 years ago by peter.durr

This might be a good starting point:

http://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter7.html

The seqinr package has a number of functions that deal with the prediction of reading frames:

https://cran.r-project.org/web/packages/seqinr/seqinr.pdf
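
To illustrate what reading-frame prediction involves, here is a minimal base-R sketch that scans the three forward frames for the longest stretch running from an ATG to the first in-frame stop codon. It is not the code from the linked chapter and does not use seqinr's API; `find_longest_orf` is a hypothetical name, and the sketch assumes the standard stop codons and ignores the reverse strand.

```r
# Hypothetical helper, not from seqinr or the linked book: return the longest
# complete forward-strand ORF (ATG ... stop) in a DNA string, or "" if none.
find_longest_orf <- function(dna) {
  dna <- toupper(dna)
  n <- nchar(dna)
  stops <- c("TAA", "TAG", "TGA")
  best <- ""
  for (frame in 0:2) {
    if (n - frame < 3) next
    pos <- seq(frame + 1, n - 2, by = 3)   # codon start positions in this frame
    codons <- substring(dna, pos, pos + 2)
    start_idx <- which(codons == "ATG")
    stop_idx <- which(codons %in% stops)
    for (s in start_idx) {
      e <- stop_idx[stop_idx > s][1]       # first in-frame stop downstream
      if (is.na(e)) next                   # no stop: not a complete ORF
      orf <- substr(dna, pos[s], pos[e] + 2)
      if (nchar(orf) > nchar(best)) best <- orf
    }
  }
  best
}

find_longest_orf("CCATGAAATTTTAGGG")
```

The returned string includes the stop codon. For sequences read from a FASTA file the same function could be applied per record, for example with `vapply` over a character vector of sequences.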

 

 

 

modified 4.1 years ago • written 4.1 years ago by Joseph Hughes
hauken_heyken wrote, 3 months ago:

The R package ORFik in Bioconductor has all you need; it is implemented in C++ and even handles circular genomes.
