Question: Gene Prediction by using Python
1
gravatar for anasjamshed1994
10 weeks ago by
anasjamshed199460 wrote:

I want to implement software(python script) to identify genic regions based on codon usage analysis, specifically codon usage log-likelihood. I have 2 files one is FASTA and another one contains information of codons

The following is the general algorithm;

1) Starting at the first base of the DNA sequence, determine the first two codons and calculate their log-likelihood of occurrence based on a provided frequency table. Record this value and do the same for codons 2 and 3. Continue until you have processed all of the triplets in the sequence.

2)Do the same as step 1 except start at base 2 of DNA sequence. This represents the second reading frame of the sequence.

3) Do the same as step 1 except start at base 3 of the DNA sequence. This represents the third reading frame of the sequence.

4) Select a sliding window size, x, to analyze the three lists of data. 50 to 100 codons is recommended size of the window.

5) For each reading frame, calculate the sum of log-likelihoods within the sliding window and select the largest of the three.

6)Slide the window over y number of codons and go to step 5. 10 codons is the recommended sliding window increment size.

7) Print out the sliding window start position and the largest likelihood sum.

I have tried this code:

from Bio import SeqIO
 import re
 import statistics
 from math import log
 import matplotlib.pyplot as plt
 import matplotlib.patches as patches

 fasta_file = "samt.fasta"
 codon_file = "codontable.txt"
 lookup = {}
 infile = open(codon_file,mode = 'r')
 infile.readline()
 # Parse the codon file and put the codons and their frequency
 # in the dictionary named 'lookup'
 for line in infile:
 # The codon is the three DNA letters to specify an amino acid
    codon =  ""
# The frequency value in the table should be dived by 1000
   frequency = ""
    lookup[codon] = frequency
    print(frequency)

But stuck after that. Plz help me

dna codon fasta • 230 views
ADD COMMENTlink modified 10 weeks ago • written 10 weeks ago by anasjamshed199460

Is this an assignment?

Also, please edit your post and format your code properly using the 101010 button. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block.
code_formatting

ADD REPLYlink written 10 weeks ago by Ram32k

this is not assignment my project

ADD REPLYlink written 10 weeks ago by anasjamshed199460

I strongly suggest you use an established gene predictor instead of creating your own unless there is a very good reason.

ADD REPLYlink written 10 weeks ago by Joe19k

i want to do this through python script

ADD REPLYlink written 10 weeks ago by anasjamshed199460

Gene prediction (at least accurate gene prediction) is harder than your post implies. If you follow your approach, while you may achieve it in python, the data you generate will probably be trash.

ADD REPLYlink written 10 weeks ago by Joe19k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1811 users visited in the last hour
_