Question: Program For Finding Orf And Corresponding Reading Frame
Woa (2.7k), United States, wrote 8.4 years ago:

Hello All,

Can somebody recommend a standalone program that predicts the open reading frames (ORFs) in all six reading frames of a DNA sequence and also reports which frame each ORF comes from? I think this can be parsed from the FASTA header of EMBOSS SIXPACK's output. Please let me know if there are any better alternatives.

Thanks in advance

orf • 8.1k views
modified 8.4 years ago by Michael Dondrup (46k) • written 8.4 years ago by Woa (2.7k)

This question could be a candidate for another "code golf", couldn't it? :-)

written 8.4 years ago by Pierre Lindenbaum (121k)

Sure, just need to open a new question and name it 'Code golf: Finding ORFs'. Want to add one? If you don't, I sure will ;)

written 8.4 years ago by Eric Normandeau (10k)

Why is sixpack not suitable? Do you want the ORF DNA seq in FASTA with frame info in header?

written 8.4 years ago by Jarretinha (3.3k)

Thanks all for the answers. I think SIXPACK is OK for me, as it gives the ORF as well as the frame info in the FASTA header, as follows:

>X13776_5_ORF15 Translation of X13776 in frame 5, ORF 15, threshold 1, 19aa
QPTRNRTPRLRMKSSAHSR

However, although GETORF gives the ORFs in all frames, the frame information is missing:

>V00294_3 [465 - 49] (REVERSE SENSE) E. coli laci gene (codes for the lac repressor).
RRNISAGSFHSNGILVIQRIVNDQPTDALREKIVHRRFTGFDAASFYHRHHHAGTQLIGA
RFNRRDNLRRRVQGQTGGGNANQQRLFARQLLCHAVGNVIQLRHRRFHFFPRFRRNVAGL
VHHAGNGLIRDTGILCDIV

written 8.4 years ago by Woa (2.7k)

Well, it's true, it is not given, but it is redundant. The reading frame is trivially computed from the start and stop positions. The difference between sixpack and getorf is that sixpack is for pretty-printing short sequences, while getorf is for getting the ORF sequences of, e.g., a whole genome.

written 8.4 years ago by Michael Dondrup (46k)

Thanks Michael. Can it be computed from the start position alone, e.g. start_position % 3 = 1 -> first frame, 2 -> second frame, 0 -> third frame? The problem with SIXPACK is that it processes one sequence at a time, and I have several thousand of them. Rather than creating thousands of files, maybe I'll use GETORF and my FASTA library.

written 8.4 years ago by Woa (2.7k)
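
For forward-strand ORFs with 1-based coordinates, the mapping proposed in the comment above can be written as a single expression. This is only a minimal sketch: the function name `forward_frame_from_start` is hypothetical, it assumes the start position is codon-aligned, and it does not handle reverse-strand ORFs.

```python
def forward_frame_from_start(start):
    """Return the forward reading frame (1, 2 or 3) for a 1-based ORF start.

    Assumption: 'start' is the first base of the first codon of the ORF.
    start % 3 == 1 -> frame 1, == 2 -> frame 2, == 0 -> frame 3.
    """
    if start < 1:
        raise ValueError("start must be a 1-based position")
    return (start - 1) % 3 + 1


if __name__ == "__main__":
    for position in (1, 2, 3, 4):
        print(position, forward_frame_from_start(position))
```

Because the ORF length is a multiple of three, the same frame can equally be derived from the end position on the forward strand.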

I would use the stop position, see my edit in my answer.

written 8.4 years ago by Michael Dondrup (46k)

Michael Dondrup (46k), Bergen, Norway, wrote 8.4 years ago:

Or use getorf from the EMBOSS package, available as an executable, web service, or website. There is nothing fancy about ORF finding: ORFs don't need to be predicted (genes are predicted), they are simply found, either as any sequence that contains no stop codon and ends with a stop codon, or alternatively as any sequence between a start and a stop codon (in frame). ORF finding therefore automatically takes all 6 frames into account. getorf supports both modes. Make sure to select the appropriate genetic code.

Edit: one simple way to calculate the frame, in pseudocode, given start and stop:

 if (+ strand)   # use the strand info in the header
   # start < stop would also work, except for a circular genome with an ORF spanning the origin
   frame := (stop mod 3) + 1
 else
   frame := -(stop mod 3)
   # actually, with the minus strand I am not 100% sure that this is the best way
modified 8.4 years ago • written 8.4 years ago by Michael Dondrup (46k)
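
The pseudocode above can be turned into a small runnable sketch that reads the coordinates from a header shaped like the GETORF example quoted in this thread (`ID [start - end] (REVERSE SENSE) description`). The forward-strand rule follows the pseudocode. For the minus strand, where the answer itself is unsure, this sketch assumes that frame -1 begins at the last base of the sequence, so it needs the sequence length. That convention, the regular expression, and the name `frame_from_header` are assumptions of this example; other tools may number reverse frames differently.

```python
import re

# Matches "[465 - 49]" optionally followed by "(REVERSE SENSE)".
HEADER_RE = re.compile(r"\[(\d+)\s*-\s*(\d+)\]\s*(\(REVERSE SENSE\))?")


def frame_from_header(header, seq_length=None):
    """Return a signed frame (+1..+3 or -1..-3) for a GETORF-style header.

    Assumptions: coordinates are 1-based and codon-aligned; for reverse
    ORFs the first number is the higher coordinate and frame -1 starts at
    the last base of the sequence (seq_length).
    """
    match = HEADER_RE.search(header)
    if match is None:
        raise ValueError("no [start - end] coordinates found in header")
    start, end = int(match.group(1)), int(match.group(2))
    if match.group(3) is None:
        # Forward strand: the last base of an ORF in frame f sits at a
        # position p with p % 3 == f - 1, as in the pseudocode.
        return end % 3 + 1
    if seq_length is None:
        raise ValueError("seq_length is required for reverse-strand ORFs")
    # Reverse strand: count the offset of the ORF start from the sequence end.
    return -((seq_length - start) % 3 + 1)


if __name__ == "__main__":
    # The sequence length 467 is a made-up value for illustration only.
    header = "V00294_3 [465 - 49] (REVERSE SENSE) E. coli laci gene"
    print(frame_from_header(header, seq_length=467))
```

Since the header alone does not carry the sequence length, a batch run over many sequences would have to look it up from the input FASTA records.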

I think pretty much the same and use scripts of my own, including selenocysteine alternatives. But things can get quite tricky if you're using eukaryotic genomic DNA.

written 8.4 years ago by Jarretinha (3.3k)

Larry_Parnell (16k), Boston, MA USA, wrote 8.4 years ago:

Try ORFinder at http://www.bioinformatics.org/sms2/orf_find.html. This seems to give what you're requesting but I don't have test sequences handy to run a check.

written 8.4 years ago by Larry_Parnell (16k)

Louis Letourneau (800), Montreal, wrote 8.4 years ago:

It might not be exactly what you want, but to find genes in prokaryotic DNA, glimmer3 works wonders. On eukaryotes...not so much.

Actually, this page has a pretty nifty list of candidates: http://molbiol-tools.ca/Translation.htm

written 8.4 years ago by Louis Letourneau (800)

That's a bit more than requested: glimmer is a gene prediction program, and only a small fraction of all ORFs are really protein coding.

written 8.4 years ago by Michael Dondrup (46k)

Anuraj Nayarisseri (740), Indore, wrote 8.4 years ago:

ORF Finder will give all six possible protein translations of your DNA sequence. These are only possible coding regions. But if you want to know the exact protein-coding region in your gene sequence, do a blastx search against the nr or Swiss-Prot database.

http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastx&BLAST_PROGRAMS=blastx&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on&LINK_LOC=blasthome

written 8.4 years ago by Anuraj Nayarisseri (740)

Elena (240) wrote 8.4 years ago:

You can use GeneScan, a context-independent gene-finding program.

written 8.4 years ago by Elena (240)

No, this is a gene prediction program, not an ORF finder.

written 8.4 years ago by Michael Dondrup (46k)