Question: Python Script to to convert orf gene into protein
0
gravatar for anasjamshed1994
11 months ago by
anasjamshed199420 wrote:

I made a script with the help of rody blog but i also want to convert longest orf gene in CDS

I tried this code:

Genome_Seq = open("C:\\Users\\cnn\\Desktop\\MS(BI)\\Bioinformatics 1\\Droso.txt") 

Genetic_Codes = { 
    'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M', 
    'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T', 
    'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K', 
    'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R', 
    'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L', 
    'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P', 
    'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q', 
    'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R', 
    'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V', 
    'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A', 
    'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E', 
    'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G', 
    'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S', 
    'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L', 
    'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_', 
    'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'} 

def translate_dna(Genome_Seq): 
    """Return the translation of a DNA sequence"""
    Genome_Seq=Genome_Seq.read()
    last_codon_start = len(Genome_Seq) - 2 
    protein = "" 

    # for each codon in the dna, append an amino acid to the protein
    for start in range(0,last_codon_start,3): 
        codon = Genome_Seq[start:start+3] 
        amino_acid = Genetic_Codes.get(Genome_Seq.upper(),'X') 
        protein = protein + amino_acid

    return protein 

#Defining and Declaring of Function
def orfFINDER(dna,frame):

    start_codon = ['ATG']
    stop_codons = ['TGA', 'TAG', 'TAA']
    start_positions = []
    stop_positions = []
    num_starts=0
    num_stops=0


    for i in range(frame-1,len(dna),3):
        codon=dna[i:i+3]
        if codon in start_codon:
            start_positions += str(i+1).splitlines()
        if codon in stop_codons:
            stop_positions += str(i+1).splitlines()

    for line in stop_positions:
        num_stops += 1

    for line in start_positions:
        num_starts += 1

    orffound = {}

    if num_stops >=1 and num_starts >=1: #first statment: the number of stop codons and start condos are greater than or equal to 1;

        orfs = True
        stop_before = 0
        start_before = 0

        if num_starts > num_stops:
            num_runs = num_starts
        if num_stops > num_starts:
            num_runs = num_stops
        if num_starts == num_stops:
            num_runs = num_starts

        position_stop_previous = 0
        position_start_previous = 0
        counter = 0

        for position_stop in stop_positions:

            position_stop = int(position_stop.rstrip()) + 2

            for position_start in start_positions:

                position_start = position_start.rstrip()

                if int(position_start) < int(position_stop) and int(position_stop) > int(position_stop_previous) and int(position_start) > int(position_stop_previous):

                    counter += 1
                    nameorf = "ORF"+str(counter)
                    position_stop_previous += int(position_stop) - int(position_stop_previous)
                    position_start_previous += int(position_start) - int(position_start_previous)
                    sizeorf = int(position_stop) - int(position_start) + 1

                    orffound[nameorf] = position_start,position_stop,sizeorf,frame

                else:

                    pass

    else:

        orfs = False

    return orffound
#FUNCTION END


 #READ FASTA FILE AND SAVE HEADERS AND SEQUENCES IN A DICTIONARY
Genome_Seqs={}

for Line in Genome_Seq:
    Line = Line.rstrip()
    if Line[0] == '>':
        Words=Line.split() 
        Name=Words[0][1:]
        Genome_Seqs[Name]=''
    else:
        Genome_Seqs[Name] = Genome_Seqs[Name] + Line





#DEFINE FRAME TO FIND ORF
#if frame = 1, start from the first position in the sequence
frame= int(input("Enter Frame Number: "))


#EXECUTE THE ORFFINDER FUNCTION
for i in Genome_Seqs.items():
    header= i[0]
    seq = i[1]
    orf = orfFINDER(seq,frame)
    print("\nThe Total ORF Found in frame" ,frame, "is : " ,len(orf) ,'\n\n' ) 
    for i in orf.items():
        numorf=i[0]
        startorf=orf[numorf][0]
        stoporf=orf[numorf][1]
        lengthorf=orf[numorf][2]
        frameorf=orf[numorf][3] 
        print (numorf,"starts from",startorf,"bp and stop at",stoporf,"bp , length:",lengthorf,", Frame:",frameorf,'\n')
        if lengthorf >= 1503:
            print(numorf , "in frame" ,frameorf, "is legimate ORF as it contains more than 500 Base Pairs b/w start and stop codon and also contains more than 500 amino acids so we predict that it would be a gene \n","\nlength of Predicted gene: ",lengthorf,"bp")

Genome_Seq.close()

I stored genetic codes in dictionary but i dont know how to call orf converted into protein through codon

ADD COMMENTlink modified 11 months ago by RamRS24k • written 11 months ago by anasjamshed199420

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.
code_formatting

ADD REPLYlink written 11 months ago by RamRS24k

Why not just use the translate method of BioPython?

ADD REPLYlink written 11 months ago by Joe14k

Sir i want to translate only orf which i calculate through script

ADD REPLYlink written 11 months ago by anasjamshed199420

You can script BioPython. Please make this question more straightforward than your last.

What good reason is there not to use BioPython in this instance?

ADD REPLYlink written 11 months ago by Joe14k

how to use biopython in it? in this part of question i already calculated orf as shown in the code but after that i want to calculate orf gene cds with help of genetic code which i also stored in dictionary

ADD REPLYlink written 11 months ago by anasjamshed199420

Have you searched for biopython and taken a look at the cookbook and tutorial?

calculate orf gene cds with help of genetic code which i also stored in dictionary

You'll have to explain this a little better and take some more time to formulate your sentences. It is currently hard to read.

ADD REPLYlink written 11 months ago by WouterDeCoster41k

Sir in the given code i first calculate orf then i stored genetic codes in dictionary as shown in starting because i want to convert the orf 1 nucleotide sequence into protein by using genetic code I also tried biopython but translate function gave me a wrong output

ADD REPLYlink written 11 months ago by anasjamshed199420

We understand the code you already have - that is not the issue. How do you know its wrong? Have you tried using different translation tables?

ADD REPLYlink written 11 months ago by Joe14k

sir i made translation table by myself in code as you can see in the beginning.

ADD REPLYlink written 11 months ago by anasjamshed199420

I also tried biopython but translate function gave me a wrong output

What makes you think the output is wrong? Biopython is a very mature package, I'd be really surprised if the translate function is wrong.

ADD REPLYlink written 11 months ago by WouterDeCoster41k

biopython translate function not works well for input fasta files

ADD REPLYlink written 11 months ago by anasjamshed199420
1

Well that’s demonstrably not true. Fasta files are perfectly valid inputs to BioPython. The translate function is a method of SeqRecords so the input file type has very little relevance.

You should show what biopython code you tried because I’m willing to bet you simply aren’t using it correctly.

ADD REPLYlink written 11 months ago by Joe14k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1766 users visited in the last hour