tool that reports *every* ORF, including from nested starts?
2.6 years ago
sullis02 ▴ 40

(I am defining an ORF as the span between an ATG start and a downstream in-frame stop codon.)

Given a situation like this (image not shown), showing a ~1 kb region, forward strand, with a 3-frame prediction of starts (green) and stops (red): for the frame 3 span between the first and second stops, the ORF-finding tools I've tried all return a single long ORF. But I need to list all the possible frame 3 ORFs in that span, i.e., the ORFs that start from different ATGs but all end at the same stop. So my desired tool would actually report 10 ORFs for the span in question. Is there any tool that outputs this?

I have tried ORFfinder (NCBI), getorf (EMBOSS), and orfipy. So far I have found no setting in any of them that does what I want.

finding gene orf prediction • 1.4k views
17 months ago
sullis02 ▴ 40

I've noticed the same thing. It's incredible to me that none of the ORF detectors I've tried report all of the ORFs that share the same in-frame stop codon but have different start codons. They each return only one ORF for a given strand and stop codon. (getorf seems to report the shortest, while ORFfinder reports the longest; AMIGene also seems to report the longest.)


Have you had any success in finding other tools since?

2.6 years ago
Mensur Dlakic ★ 27k

esl-translate from the HMMER package can do what you want. The only thing you may need to specify is the minimum number of amino acids to be considered an ORF.


Unfortunately, it does not. Like the other tools, esl-translate returns only the longest of a set of nested ORFs between two stops within the same frame.

$ cat new_seq.fa
>new_seq
GACTCGGTGCTATGTTCTGAATATTTCTGACTTGCATTTTTAATGGAGATAAAATGAAGCATTTAATACATGACGTAGATGAAGACATGAATGAAACTACAGACAAACTTAACTCTTCTCTCATTCTTCCTTTCAGTAAGGACTATGAGTTCTGTTCAAATGGCGTTTATTTCTATTGTGGAAAGATGGGTTCAGGTAAGACATTTAATTTAATTCGTCATATACTCATAACAGAACGTTTAGGAAATGACTCATATTATGACCAAATCATTATATCAGCAACTTCAGACTCTATGGACTCAACAGCGAAAACATTTATGTCAAAAGTTCAAGCCTCTGTCGTTAAAGTTCCAGACAGTGAACTCATTGAATTTCTTCAACGTTACATTCGACGTAAGAGGAAATATTATGCCATCGTTGAATTTATACAGTCAGGAATGCAAAAGACTTCTGAGGAGATGGAAAGAATTATTGACAAACACCACTTACGTCAGTACTCAGGAGTTTACGATATGAAACGACTGACAAACTACATTCTATCAAAACTTTCAAAATACCCCTTCAAAAAATATCCTTCAAACACTCTGCTCGTTTGCGACGACTTCGCTGGTAAAGGTTTAGTGTCAAAACCAGACTCACCATTAGCTAATATCATTACTAAAGTCAGACATTACCACTTAACTGTAGCAATACTTATGCAAACATGGAGGTTTTTAGCTTTAAACATAAAACGTCTCATAACTGACTTCGTTATCTTTCAAGGTTTCTCACGTTATGATATTGAACTCATTTGGAAACAGTCAGGTATAACATTACCTTTTGAAGAAATTTGGGAAGCATATAAGTCTCTCATCTCTCCTCGTTCATACCTTGAGATTCATATCATGACTAATACCATTAAAGTCAAAAATATTCCATGGGAACGACCAACATTGTTTTAAAGTTTAACCTTCAATTGACTGA


$ esl-translate -m --watson new_seq.fa
>orf1 source=region_of_interest coords=91..210 length=40 frame=1 desc=
MKLQTNLTLLSFFLSVRTMSSVQMAFISIVERWVQVRHLI
>orf2 source=region_of_interest coords=247..513 length=89 frame=1 desc=
MTHIMTKSLYQQLQTLWTQQRKHLCQKFKPLSLKFQTVNSLNFFNVTFDVRGNIMPSLNL
YSQECKRLLRRWKELLTNTTYVSTQEFTI
>orf3 source=region_of_interest coords=54..938 length=295 frame=3 desc=
MKHLIHDVDEDMNETTDKLNSSLILPFSKDYEFCSNGVYFYCGKMGSGKTFNLIRHILIT
ERLGNDSYYDQIIISATSDSMDSTAKTFMSKVQASVVKVPDSELIEFLQRYIRRKRKYYA
IVEFIQSGMQKTSEEMERIIDKHHLRQYSGVYDMKRLTNYILSKLSKYPFKKYPSNTLLV
CDDFAGKGLVSKPDSPLANIITKVRHYHLTVAILMQTWRFLALNIKRLITDFVIFQGFSR
YDIELIWKQSGITLPFEEIWEAYKSLISPRSYLEIHIMTNTIKVKNIPWERPTLF
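
For what it's worth, the behaviour asked for in the question (every in-frame ATG paired with the next downstream stop) is simple enough to script directly. Below is a minimal sketch in plain Python, not taken from any of the tools mentioned in this thread. The function name `nested_orfs` and the `min_codons` parameter are made up for illustration. It assumes the standard stop codons (TAA, TAG, TGA), ATG as the only start, and the forward strand only; for the reverse strand you would pass in the reverse complement and convert the coordinates yourself. Coordinates are 1-based and exclude the stop codon, the same convention as the esl-translate output above (54..938 is 885 nt, i.e. 295 codons). ATGs with no downstream in-frame stop are not reported.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code

def nested_orfs(seq, min_codons=1):
    """Return every ATG..stop ORF on the forward strand, nested starts included.

    Each ORF is a tuple (frame, start, end, n_codons):
      frame    - 1, 2 or 3
      start    - 1-based position of the A in ATG
      end      - 1-based position of the last base before the stop codon
      n_codons - number of codons from ATG up to, but not including, the stop
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        open_starts = []  # 0-based offsets of ATGs seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG":
                open_starts.append(i)
            elif codon in STOP_CODONS:
                # Every ATG still open in this frame ends at this stop.
                for s in open_starts:
                    n_codons = (i - s) // 3
                    if n_codons >= min_codons:
                        orfs.append((frame + 1, s + 1, i, n_codons))
                open_starts = []
    return orfs

if __name__ == "__main__":
    # Toy sequence with two in-frame ATGs that share one stop codon.
    for orf in nested_orfs("ATGAAAATGCCCTAA"):
        print(orf)
```

On the toy sequence this lists two frame 1 ORFs ending at the same stop, one from each ATG. To run it on new_seq.fa you would read the sequence into a single string (dropping the header line and newlines) and filter the result for frame 3.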