Question

Program For Finding ORFs And Corresponding Reading Frame

2

13.2 years ago

Woa ★ 2.9k

Hello All,

Can somebody recommend a standalone program that predicts the open reading frames (ORFs) in all six reading frames of a DNA sequence and also reports which frame each ORF comes from? I think it can be parsed from the FASTA header of EMBOSS SIXPACK's output. Please let me know if there are any better alternatives.

Thanks in advance

orf • 12k views

ADD COMMENT • link updated 13.2 years ago by Michael 54k • written 13.2 years ago by Woa ★ 2.9k

0

This question could be a candidate for another "code golf", couldn't it? :-)

ADD REPLY • link 13.2 years ago by Pierre Lindenbaum 161k

0

Sure, just need to open a new question and name it 'Code golf: Finding ORFs'. Want to add one? If you don't, I sure will ;)

ADD REPLY • link 13.2 years ago by Eric Normandeau 11k

0

Why is sixpack not suitable? Do you want the ORF DNA seq in FASTA with frame info in header?

ADD REPLY • link 13.2 years ago by Jarretinha 3.4k

0

Thanks all for the answers. I think SIXPACK is OK for me, as it gives the ORF as well as the frame info in the FASTA header, as follows:

>X13776_5_ORF15 Translation of X13776 in frame 5, ORF 15, threshold 1, 19aa
QPTRNRTPRLRMKSSAHSR

However, although GETORF gives the ORFs in all frames, the frame information is missing from its header:

>V00294_3 [465 - 49] (REVERSE SENSE) E. coli laci gene (codes for the lac repressor).
RRNISAGSFHSNGILVIQRIVNDQPTDALREKIVHRRFTGFDAASFYHRHHHAGTQLIGA
RFNRRDNLRRRVQGQTGGGNANQQRLFARQLLCHAVGNVIQLRHRRFHFFPRFRRNVAGL
VHHAGNGLIRDTGILCDIV
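
For anyone wanting to pull this information out of the headers in bulk, here is a minimal Python sketch. The two regular expressions are assumptions based only on the two example headers quoted in this comment, and the function names `sixpack_frame` and `getorf_coords` are made up for illustration; other EMBOSS versions or options may format headers differently.

```python
import re

# Assumed layouts, taken from the two example headers in this thread.
SIXPACK_RE = re.compile(r"in frame (\d+), ORF (\d+)")
GETORF_RE = re.compile(r"\[(\d+) - (\d+)\]( \(REVERSE SENSE\))?")


def sixpack_frame(header):
    """Return the frame number (1-6) stated in a sixpack-style header, or None."""
    match = SIXPACK_RE.search(header)
    return int(match.group(1)) if match else None


def getorf_coords(header):
    """Return (start, end, strand) from a getorf-style header, or None."""
    match = GETORF_RE.search(header)
    if match is None:
        return None
    start, end = int(match.group(1)), int(match.group(2))
    strand = "-" if match.group(3) else "+"
    return start, end, strand


if __name__ == "__main__":
    print(sixpack_frame("X13776_5_ORF15 Translation of X13776 in frame 5, ORF 15"))
    print(getorf_coords("V00294_3 [465 - 49] (REVERSE SENSE) E. coli laci gene"))
```

The getorf coordinates and strand are all that is needed to work out the frame afterwards.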

ADD REPLY • link 13.2 years ago by Woa ★ 2.9k

0

Well, it's true that it is not given, but it would be redundant: the reading frame is trivially computed from the start and stop positions. The difference between sixpack and getorf is that sixpack is for pretty-printing short sequences, while getorf is for extracting the ORF sequences of, e.g., a whole genome.

ADD REPLY • link 13.2 years ago by Michael 54k

0

Thanks Michael. Can it be computed from the start position alone, like start_position % 3 = 1 -> first frame, 2 -> second frame and 0 -> third frame? The problem with SIXPACK is that it processes one sequence at a time and I have several thousand of them. Rather than creating thousands of files, maybe I'll use GETORF and my FASTA library.

ADD REPLY • link 13.2 years ago by Woa ★ 2.9k

0

I would use the stop position, see my edit in my answer.

ADD REPLY • link 13.2 years ago by Michael 54k

Ram · Answer 1 · 2011-02-23

Or use getORF from the EMBOSS package, available as an executable, web-service, or website.

There is nothing fancy about ORF finding. ORFs don't need to be predicted (genes are predicted); they are simply found, either as any stretch that contains no stop codon and is terminated by a stop codon, or alternatively as any sequence between a start and a stop codon (in frame). ORF finding therefore automatically takes all 6 frames into account. getorf supports both modes. Make sure to select the appropriate genetic code.
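
To illustrate how simple the first mode is, here is a rough Python sketch of a stop-to-stop scan over all six frames. It is not how getorf is implemented. It assumes the standard genetic code (stop codons TAA, TAG, TGA), a linear sequence containing only A, C, G and T, and it numbers reverse frames from the last base of the sequence; the names `find_orfs` and `min_codons` are made up for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=1):
    """Yield (frame, start, end) for each stop-free stretch in six frames.

    frame is 1..3 on the given strand and -1..-3 on the reverse strand.
    start and end are 1-based inclusive positions on the given strand;
    reverse-strand ORFs have start > end. The stop codon is not included.
    """
    seq = seq.upper()
    n = len(seq)
    for sign, strand_seq in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            orf_begin = offset  # 0-based start of the current stretch
            for pos in range(offset, n - 2, 3):
                is_stop = strand_seq[pos:pos + 3] in STOP_CODONS
                is_last = pos + 3 > n - 3  # no further full codon in this frame
                if not (is_stop or is_last):
                    continue
                orf_end = pos if is_stop else pos + 3  # 0-based, exclusive
                if (orf_end - orf_begin) // 3 >= min_codons:
                    if sign == 1:
                        yield offset + 1, orf_begin + 1, orf_end
                    else:
                        # map reverse-complement indices back to the given strand
                        yield -(offset + 1), n - orf_begin, n - orf_end + 1
                orf_begin = pos + 3


if __name__ == "__main__":
    for frame, start, end in find_orfs("ATGAAATAGCCC", min_codons=2):
        print(frame, start, end)
```

Requiring a start codon (the second mode) would only mean trimming each stretch to its first ATG, or whichever start codons the chosen genetic code allows.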

Edit: one simple way to calculate the frame, in pseudocode, given start and stop:

 if strand is "+" (use the info in the header)
   # start < stop would also work, except for a circular genome with an ORF spanning the origin
   frame := (stop modulo 3) + 1
 else
   frame := -(stop modulo 3)
   # for the minus strand I am not 100% sure this is the best way
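
A runnable version of the same idea, as a hedged Python sketch rather than a definitive rule. It assumes 1-based inclusive coordinates given on the forward strand, with start > end marking a reverse-strand ORF, and it assumes reverse frames are counted from the last base of the sequence, which is why the sequence length is needed. Tools differ on how they number reverse frames, so check the convention of whatever you compare against. For a circular genome with an ORF spanning the origin, take the strand from the header instead of comparing start and end. The name `orf_frame` is made up for this example.

```python
def orf_frame(start, end, seq_len):
    """Return the reading frame of an ORF: 1..3 forward, -1..-3 reverse.

    start, end: 1-based inclusive positions on the forward strand;
    start > end marks a reverse-strand ORF.
    seq_len: length of the full sequence (used only for the reverse strand).
    """
    if start <= end:
        return (start - 1) % 3 + 1
    # Assumed convention: frame -1 begins at the last base of the sequence.
    return -((seq_len - start) % 3 + 1)


if __name__ == "__main__":
    print(orf_frame(1, 9, 100))    # forward ORF starting at base 1
    print(orf_frame(100, 92, 100)) # reverse ORF starting at the last base
```

On the plus strand this agrees with both suggestions in the thread: start % 3 gives 1, 2, 0 for frames 1, 2, 3, and because an ORF is a whole number of codons long, (end % 3) + 1 gives the same frame as long as end is the last base of a codon.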

score 3 · Answer 2 · 2011-02-23

3

13.2 years ago

Larry_Parnell 16k

Try ORFinder at http://www.bioinformatics.org/sms2/orf_find.html. This seems to give what you're requesting but I don't have test sequences handy to run a check.

ADD COMMENT • link 13.2 years ago by Larry_Parnell 16k

Ram · Answer 3 · 2011-02-23

1

13.2 years ago

Louis Letourneau ▴ 820

It might not be exactly what you want, but to find genes in prokaryotic DNA, glimmer3 works wonders. On eukaryotes...not so much.

Actually, this page has a pretty nifty list of candidates.

ADD COMMENT • link updated 4.6 years ago by Ram 43k • written 13.2 years ago by Louis Letourneau ▴ 820

0

That's a bit more than was requested: glimmer is a gene prediction program, and only a small fraction of all ORFs are really protein coding.

ADD REPLY • link 13.2 years ago by Michael 54k

score 1 · Answer 4 · 2011-02-24

ORF finder will give all six possible protein translations of your DNA sequence. These are only candidates for coding regions. If you want to know the exact protein-coding region of your gene sequence, run a blastx search against the nr or Swiss-Prot database.

http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastx&BLAST_PROGRAMS=blastx&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on&LINK_LOC=blasthome

score 0 · Answer 5 · 2011-02-24

0

13.2 years ago

Elena ▴ 250

You can use GeneScan, a context-independent gene-finding program.

ADD COMMENT • link 13.2 years ago by Elena ▴ 250

0

No, this is a gene prediction program, not an ORF finder.

ADD REPLY • link 13.2 years ago by Michael 54k