Question: ORF finder script
2.9 years ago, ahm3dhany wrote:

I wrote a basic bash script to find the ORF (i.e. open reading frame) in a given nucleotide sequence, and I need to know if I did it right.

ORF_finder.sh :

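#!/bin/bash

# Print every match of: ATG, any number of codons, then a stop codon (TAA, TAG or TGA).
# "$@" passes through any file arguments; with none, grep reads standard input.
grep -Eo --color=auto 'ATG(...)*T(A(A|G)|GA)' "$@"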

An example of usage:

~$ sequence="CTGGATGATCCTCTAACCGCGCAAACGAGG"
~$ echo $sequence | ./ORF_finder.sh
ATGATCCTCTAA

Another example:

~$ sequence="ATAATCGGCCTCGACATCCTCCGCCACGAAAGGACTGTCCCCAATCCCGAAGGCCGCAGAGCGCTGCACATACTAGGGTCAGCTAAGGTACCTCTCATGCGAGGACACGCGATGTGGCATCTAGGCGGCGTTAGAAAATTATTCGAGGCGGCCTACCGTCTTAACCGTTAAATACAAGCATGGGAAGGCAGAGCGAAAATAAAATTGCCCGCGCCTCACTACCTGCCGTCTCGTAACACTTAGCTCTAAAATAGAGTAAGCTCGGCCCCCAGTCCAAGGCACGTAAGGATGTATCGAGGCTCAAAAGACTCGCTGATCGTACCGGTCTCGTGCGTAAAAAGGCAGCAGAACTATGCTTGACTATCCATACGTCTCCATCGTTCCTGCTGATTCGTCGCGAATTGGCGCGGTTACTTAGTCTCCGGGCTGTCCGGTCGGGCTAGGTGATGCCTGTCCCTAAGGTGAATCAAGAAATCCTCAAAACTGCATAATCACGTGTT"
~$ echo $sequence | ./ORF_finder.sh
ATGCGAGGACACGCGATGTGGCATCTAGGCGGCGTTAGAAAATTATTCGAGGCGGCCTACCGTCTTAACCGTTAAATACAAGCATGGGAAGGCAGAGCGAAAATAAAATTGCCCGCGCCTCACTACCTGCCGTCTCGTAACACTTAGCTCTAAAATAGAGTAAGCTCGGCCCCCAGTCCAAGGCACGTAAGGATGTATCGAGGCTCAAAAGACTCGCTGATCGTACCGGTCTCGTGCGTAAAAAGGCAGCAGAACTATGCTTGACTATCCATACGTCTCCATCGTTCCTGCTGATTCGTCGCGAATTGGCGCGGTTACTTAGTCTCCGGGCTGTCCGGTCGGGCTAGGTGATGCCTGTCCCTAAGGTGA

Is finding ORFs that simple, or is it more complicated than that?


If you only want ORFs from the forward frames, without any supporting evidence, then yes, it is that simple.

written 2.9 years ago by cschu181

Yup, if your definition of an ORF is simply that it starts with an ATG (they don't always) and ends with one of the stop codons some multiple of 3 away, then yeah, it's as simple as that.

More sophisticated ORF finders will consider the 6 possible reading frames (forward and reverse), as well as possibly include a minimum length, and some filtering for sequence complexity etc.
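As a rough illustration of the minimum-length idea, here is a sketch that wraps the regex from the question and reports the forward frame of each hit. It assumes GNU grep (whose `-b` flag, combined with `-o`, prints the 0-based offset of each match) and a sequence given on a single line; the function name `orfs_min_len` and the 30 nt default are made up for this example, and only the forward strand is covered.

```shell
#!/bin/bash
# orfs_min_len: hypothetical helper, not part of the original script.
# Reads a single-line sequence on stdin and prints "frame<TAB>length<TAB>orf"
# for every regex match that is at least MIN_LEN nucleotides long.
orfs_min_len() {
    local min_len=${1:-30}   # arbitrary illustrative default
    # GNU grep: -o prints each match, -b prefixes it with its byte offset.
    grep -Eob 'ATG(...)*T(A(A|G)|GA)' |
        awk -F: -v min="$min_len" '
            length($2) >= min {
                frame = $1 % 3 + 1   # forward frame 1..3 from the start offset
                printf "%d\t%d\t%s\n", frame, length($2), $2
            }'
}

# First example from the question, keeping ORFs of 9 nt or more.
echo "CTGGATGATCCTCTAACCGCGCAAACGAGG" | orfs_min_len 9
```

With the default threshold of 30 the same input prints nothing, since the only match is 12 nt long.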

It looks like this will only work if you strip the newlines from a FASTA file beforehand, I think...
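A minimal sketch of that preprocessing step, assuming a standard FASTA layout with `>` header lines and wrapped sequence lines. The function name `linearize_fasta` and the file name in the usage comment are hypothetical. It joins the wrapped lines so that each record becomes one line the regex can scan.

```shell
#!/bin/bash
# linearize_fasta: hypothetical helper that joins wrapped FASTA sequence lines.
# Header lines (starting with ">") are dropped; each record is printed on one line.
linearize_fasta() {
    awk '
        /^>/ { if (seq != "") print seq; seq = ""; next }   # new record: flush the previous one
        { gsub(/[ \t\r]/, ""); seq = seq toupper($0) }       # strip whitespace, append in upper case
        END { if (seq != "") print seq }                     # flush the last record
    ' "$@"
}

# Possible usage (genome.fa is a placeholder file name):
#   linearize_fasta genome.fa | ./ORF_finder.sh
```

Upper-casing is a convenience here because the regex in the question only matches upper-case bases.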

modified 2.9 years ago • written 2.9 years ago by Joe

Thanks. Could you please point me to an article, paper, or anything else that elaborates on the other details I neglected?

written 2.9 years ago by ahm3dhany

One of the most sophisticated tools for ORF detection is GLIMMER: https://ccb.jhu.edu/papers/glimmer2.pdf

That is even more complex than just the things I mentioned, though: it implements Markov models. NCBI's ORF finder is slightly more complex than basic string searching; however, I don't know exactly what the code is doing. You might be able to find it on the web somewhere to download: https://www.ncbi.nlm.nih.gov/orffinder

written 2.9 years ago by Joe

In addition, the script reports only one ORF per fragment...
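One way to read this comment: `(...)*` is greedy, so a match runs from the first ATG to the last in-frame stop codon and swallows any ORFs in between. The output of the second example in the question does contain an in-frame TAA well before its end. The sketch below, with the made-up name `all_orfs`, instead walks from every ATG to the first in-frame stop. It assumes upper-case input with one sequence per line, and it deliberately reports nested ORFs that share a stop codon, which may or may not be what you want.

```shell
#!/bin/bash
# all_orfs: hypothetical helper. For every ATG in each input line, print the
# sequence from that ATG up to and including the first in-frame stop codon.
all_orfs() {
    awk '{
        n = length($0)
        for (i = 1; i <= n - 2; i++) {
            if (substr($0, i, 3) != "ATG") continue
            # Step codon by codon from the start until a stop is found.
            for (j = i + 3; j <= n - 2; j += 3) {
                codon = substr($0, j, 3)
                if (codon == "TAA" || codon == "TAG" || codon == "TGA") {
                    print substr($0, i, j + 3 - i)
                    break
                }
            }
        }
    }'
}

echo "ATGAAATAAATGCCCTAG" | all_orfs
# ATGAAATAA
# ATGCCCTAG
```

On this toy input the regex from the question returns the whole 18 nt string as a single match, because the final TAG is in frame with the first ATG.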

written 2.9 years ago by WouterDeCoster

First of all, thank you for your answer. When you say "forward frames", is the alternative the sequence with each nucleotide flipped (i.e. A->T, T->A, C->G and G->C)?

written 2.9 years ago by ahm3dhany

The reverse complement of the sequence.
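To make the distinction concrete, here is a small sketch contrasting the plain complement (bases flipped, order kept) with the reverse complement (bases flipped and order reversed). The function names are made up for this example, and `rev` is assumed to be available (it ships with util-linux and the BSDs).

```shell
#!/bin/bash
# complement: swap each base for its partner (A<->T, C<->G); order unchanged.
complement() { tr 'ACGTacgt' 'TGCAtgca'; }

# reverse_complement: also reverse the order, which gives the opposite strand
# read in its own 5'->3' direction. rev reverses each input line.
reverse_complement() { rev | tr 'ACGTacgt' 'TGCAtgca'; }

echo "ATGCCC" | complement          # TACGGG
echo "ATGCCC" | reverse_complement  # GGGCAT
```

Piping a sequence through `reverse_complement` before `./ORF_finder.sh` would then search the three reverse frames with the same regex.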

written 2.9 years ago by WouterDeCoster