Question: How can I make an open reading frame finder with Python?
Kevin_Smith wrote:

I need to write a program to find open reading frames in a DNA sequence. The program should take as input the provided sequences in FASTA format (“sequence_A.fa” and “sequence_B.fa”), and supply as output:

(1) The sizes of the potential ORFs greater than 30 amino acids from all 3 forward reading frames.
(2) The translations of the ORFs into protein.
(3) The ORF does not have to begin with an ATG, but should be any sequence of nucleotides that encodes a polypeptide of >30 amino acids.
(4) One peptide per line, in this format: frame #: length_of_peptide sequence_of_peptide

I have a code snippet that sets up a Python dictionary of codons in a file called “codondictionary.py”, which I can copy into the program.

I would very much appreciate any help with a Python script.

homework sequence python gene • 7.1k views
modified 4.0 years ago by WouterDeCoster • written 4.0 years ago by Kevin_Smith
  1. What have you tried? Post a link to your code.
  2. You can do all of this with biopython.
written 4.0 years ago by Devon Ryan

Some questions:

  1. Why do you not consider an ATG important? I'm afraid you'll get a bunch of false positives.
  2. Why only 3 reading frames and not 6?
  3. Essentially, you are looking for stretches of 90 or more nucleotides before a stop codon and want these translated?
  4. How big is your input data?
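
To make questions 2 and 3 concrete, here is a minimal sketch in plain Python. The helper names `reverse_complement` and `six_frames` are made up for illustration and do not come from any library. It lists the six reading frames of a sequence and works out the nucleotide length needed: strictly speaking, "greater than 30 amino acids" means at least 31 codons, i.e. 93 nucleotides, while 90 nucleotides corresponds to 30 or more residues.

```python
# Hypothetical helpers for illustration only, not part of any library.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def six_frames(dna):
    """Return (label, nucleotides) pairs for frames +1..+3 and -1..-3."""
    frames = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames.append(("{}{}".format(strand, offset + 1), seq[offset:]))
    return frames


MIN_PEPTIDE = 31                   # "greater than 30 amino acids"
MIN_NUCLEOTIDES = 3 * MIN_PEPTIDE  # 93 nucleotides

for label, seq in six_frames("ATGGCCTAA"):
    print(label, seq)
```

Restricting the search to the forward strand, as requested, amounts to keeping only the first three entries of the list.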
modified 4.0 years ago • written 4.0 years ago by WouterDeCoster

The ORF does not have to begin with an ATG; I just want to obtain any sequence of nucleotides that encodes a polypeptide of >30 amino acids. The input is 1040 nucleotides long. Three reading frames are fine; I only want to see the ones in the forward direction. Thanks.

modified 4.0 years ago • written 4.0 years ago by Kevin_Smith

WouterDeCoster (Belgium) wrote:

I have my doubts about your biological question, but here you go. Save the script as e.g. ORFfinder.py and run it as: python ORFfinder.py yourinput.fasta

import sys
from Bio import SeqIO

# Read the single FASTA record named on the command line
record = SeqIO.read(sys.argv[1], "fasta")

results = []
# Three reading frames in forward direction, offsets 0, 1, 2
for offset in range(3):
    seq = record.seq[offset:]
    seq = seq[:len(seq) - len(seq) % 3]  # drop a trailing partial codon
    # Split the translation at stop codons ('*')
    for peptide in seq.translate(table='Standard', stop_symbol='*').split('*'):
        if len(peptide) > 30:
            results.append((offset + 1, peptide))

# Use PotentialORFs.txt as output, can be changed
# Write frame, length and translation to file
with open('PotentialORFs.txt', 'w') as output:
    for frame, peptide in results:
        output.write("frame {}: {} {}\n".format(frame, len(peptide), peptide))
written 4.0 years ago by WouterDeCoster

I got the error "No module named Bio". What does this mean? Thanks

written 4.0 years ago by Kevin_Smith

Is it possible to write the program without biopython?

written 4.0 years ago by Kevin_Smith

You absolutely require biopython for this code to work.

I hope you have pip installed to install python packages:

(sudo) pip install biopython

Alternatively, have a look here: http://biopython.org/wiki/Download
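
If installing Biopython is not an option, the same logic can be sketched in plain Python. This is only a rough sketch under several assumptions: the codon table is built here from the standard genetic code rather than taken from codondictionary.py (whose contents are not shown in this thread), the function names are invented for illustration, codons containing anything other than A, C, G or T are translated as X, and the input is assumed to be a FASTA file holding a single record.

```python
from itertools import product

# Standard genetic code; codons are enumerated TTT, TTC, TTA, TTG, TCT, ...
# (bases in T, C, A, G order), matching the amino acid string below.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def read_fasta(path):
    """Return the sequence of a single-record FASTA file, uppercased."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle
                       if not line.startswith(">")).upper()


def translate(dna):
    """Translate complete codons; unrecognised codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def find_orfs(dna, min_length=31):
    """Yield (frame, peptide) for stop-free stretches of >= min_length residues."""
    for offset in range(3):
        for peptide in translate(dna[offset:]).split("*"):
            if len(peptide) >= min_length:
                yield offset + 1, peptide


def report(path):
    """Print one line per peptide as 'frame #: length sequence'."""
    for frame, peptide in find_orfs(read_fasta(path)):
        print("frame {}: {} {}".format(frame, len(peptide), peptide))
```

Calling `report("sequence_A.fa")` would then print one line per peptide longer than 30 residues. The default `min_length=31` encodes "greater than 30"; the Biopython script above remains the shorter and better-tested route.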

modified 4.0 years ago • written 4.0 years ago by WouterDeCoster

FYI, pip install --user biopython is usually preferable.

written 4.0 years ago by Devon Ryan

Daniel (Cardiff University) wrote:

Check out the "Python for Bioinformatics" book (look here for the section on protein translations) or Python for biologists here.

written 4.0 years ago by Daniel