tool that reports *every* ORF, including from nested starts?
7 weeks ago
sullis02 • 0

(I am defining an ORF as the span between an ATG start and the next downstream in-frame stop codon.)

Given a situation like this:

The image shows a ~1 kb region, forward strand, with a 3-frame prediction of starts (green) and stops (red). For the frame 3 span between the first and second stops, the ORF-finding tools I've tried all return a single long ORF. But I need to list all the possible frame 3 ORFs in that span, i.e., the ORFs that start from different ATGs but all end at the same stop. So my desired tool would actually report 10 ORFs for the span in question. Is there any tool that outputs this?

I have tried ORFfinder (NCBI), getorf (EMBOSS), and orfipy. So far I cannot find a setting in any of them that does what I want.

7 weeks ago
Mensur Dlakic ★ 14k

esl-translate from the HMMER package can do what you want. The only thing you may need to specify is the minimum number of amino acids for a stretch to be considered an ORF.


Unfortunately, it does not. Like the other tools, esl-translate returns only the longest of a set of nested ORFs between two stops within the same frame.

$ cat new_seq.fa
>new_seq
GACTCGGTGCTATGTTCTGAATATTTCTGACTTGCATTTTTAATGGAGATAAAATGAAGCATTTAATACATGACGTAGATGAAGACATGAATGAAACTACAGACAAACTTAACTCTTCTCTCATTCTTCCTTTCAGTAAGGACTATGAGTTCTGTTCAAATGGCGTTTATTTCTATTGTGGAAAGATGGGTTCAGGTAAGACATTTAATTTAATTCGTCATATACTCATAACAGAACGTTTAGGAAATGACTCATATTATGACCAAATCATTATATCAGCAACTTCAGACTCTATGGACTCAACAGCGAAAACATTTATGTCAAAAGTTCAAGCCTCTGTCGTTAAAGTTCCAGACAGTGAACTCATTGAATTTCTTCAACGTTACATTCGACGTAAGAGGAAATATTATGCCATCGTTGAATTTATACAGTCAGGAATGCAAAAGACTTCTGAGGAGATGGAAAGAATTATTGACAAACACCACTTACGTCAGTACTCAGGAGTTTACGATATGAAACGACTGACAAACTACATTCTATCAAAACTTTCAAAATACCCCTTCAAAAAATATCCTTCAAACACTCTGCTCGTTTGCGACGACTTCGCTGGTAAAGGTTTAGTGTCAAAACCAGACTCACCATTAGCTAATATCATTACTAAAGTCAGACATTACCACTTAACTGTAGCAATACTTATGCAAACATGGAGGTTTTTAGCTTTAAACATAAAACGTCTCATAACTGACTTCGTTATCTTTCAAGGTTTCTCACGTTATGATATTGAACTCATTTGGAAACAGTCAGGTATAACATTACCTTTTGAAGAAATTTGGGAAGCATATAAGTCTCTCATCTCTCCTCGTTCATACCTTGAGATTCATATCATGACTAATACCATTAAAGTCAAAAATATTCCATGGGAACGACCAACATTGTTTTAAAGTTTAACCTTCAATTGACTGA
$ esl-translate -m --watson new_seq.fa
>orf1 source=region_of_interest coords=91..210 length=40 frame=1 desc=
MKLQTNLTLLSFFLSVRTMSSVQMAFISIVERWVQVRHLI
>orf2 source=region_of_interest coords=247..513 length=89 frame=1 desc=
MTHIMTKSLYQQLQTLWTQQRKHLCQKFKPLSLKFQTVNSLNFFNVTFDVRGNIMPSLNL
YSQECKRLLRRWKELLTNTTYVSTQEFTI
>orf3 source=region_of_interest coords=54..938 length=295 frame=3 desc=
MKHLIHDVDEDMNETTDKLNSSLILPFSKDYEFCSNGVYFYCGKMGSGKTFNLIRHILIT
ERLGNDSYYDQIIISATSDSMDSTAKTFMSKVQASVVKVPDSELIEFLQRYIRRKRKYYA
IVEFIQSGMQKTSEEMERIIDKHHLRQYSGVYDMKRLTNYILSKLSKYPFKKYPSNTLLV
CDDFAGKGLVSKPDSPLANIITKVRHYHLTVAILMQTWRFLALNIKRLITDFVIFQGFSR
YDIELIWKQSGITLPFEEIWEAYKSLISPRSYLEIHIMTNTIKVKNIPWERPTLF
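
For comparison, the behaviour asked for in the question (every in-frame ATG paired with the next downstream stop in the same frame) can be sketched in a few lines of Python. This is only a minimal illustration, not a feature of any of the tools mentioned in this thread. The function name `nested_orfs`, the `min_codons` parameter, and the output tuple layout are all made up for the example, and it assumes forward strand only, ATG as the sole start codon, and TAA/TAG/TGA as stops. An ATG with no downstream in-frame stop is not reported, in line with the ORF definition given in the question.

```python
STOPS = {"TAA", "TAG", "TGA"}  # assumption: standard stop codons only

def nested_orfs(seq, min_codons=1):
    """List every ATG-to-stop ORF on the forward strand, nested starts included.

    Returns tuples (frame, start, end, n_codons) where frame is 1..3,
    start/end are 1-based inclusive nucleotide coordinates, the stop codon
    is excluded from end, and n_codons counts codons from ATG up to
    (not including) the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        open_starts = []  # ATG positions seen since the last stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG":
                open_starts.append(i)
            elif codon in STOPS:
                # every pending ATG shares this stop: emit one ORF per start
                for s in open_starts:
                    n_codons = (i - s) // 3
                    if n_codons >= min_codons:
                        orfs.append((frame + 1, s + 1, i, n_codons))
                open_starts = []
    return orfs

# Toy example: two ATGs in frame 1 sharing one stop give two ORFs
for orf in nested_orfs("ATGAAAATGCCCTAA"):
    print(orf)
# (1, 1, 12, 4)
# (1, 7, 12, 2)
```

With this coordinate convention the longest ORF of each nested set should line up with the `coords=` and `length=` fields in the esl-translate output above, and the shorter nested ORFs are listed alongside it. Raising `min_codons` filters out the very short ones.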