Extracting genes from a DNA sequence using reading frames
5.5 years ago

I want to write a Python program that reads a DNA sequence from a text file. The sequence is more than 9000 characters long. I have to cut it into groups of 3 characters, called codons, so that the frame reads positions 1 to 3, then 4 to 6, then 7 to 9, and so on.

For example, the sequence is

ACCTGCCTCTTACGAGGCGACACTCCACCATGGATCACTCCCCTGTGAGGAACTACTGTCTTCACGCAGA

Then I have to cut it into 3-character pieces, which I have already done. My question is: how can I extract the gene sequence from the given DNA? A gene sequence starts at ATG and ends at TAG, TAA or TGA.

This would be easy with a regular expression. The problem is that in the sequence above, ATG sits at positions 30 to 32, while my frame reads positions 1 to 3, then 4 to 6, and so on. When it reaches positions 28 to 30, that codon is not ATG, so the start codon is never seen.
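
To make the frame problem concrete, here is a minimal sketch that locates the ATG in the example sequence and checks which of the three forward frames contains it as a codon. The helper name `split_codons` and its `offset` parameter are hypothetical, introduced only for this illustration.

```python
# The example sequence quoted above.
seq = "ACCTGCCTCTTACGAGGCGACACTCCACCATGGATCACTCCCCTGTGAGGAACTACTGTCTTCACGCAGA"

def split_codons(seq, offset=0):
    """Split seq into complete codons, starting at a 0-based offset (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]

start = seq.find("ATG")    # 0-based index of the first ATG, or -1 if absent
frame_offset = start % 3   # offset of the frame that contains this ATG as a codon

print(start + 1)           # 1-based position of the A in ATG
print(frame_offset)

# Check each forward frame for an ATG codon.
for offset in range(3):
    print(offset, "ATG" in split_codons(seq, offset))
```

For this sequence the ATG starts at position 30, which corresponds to offset 2, so only the frame that begins at the third base contains ATG as a codon. Splitting from position 1 alone cannot find it.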

Can anyone help me with this problem? Here is my code:

import re
from pathlib import Path

# Read the sequence and drop the line breaks.
dna = Path('C:/Users/abdul/Downloads/Compressed/MAJU/HCV-PK1-sequence - edited.txt').read_text()
dna = dna.replace('\n', '')

# Split into codons in the first frame only (positions 1-3, 4-6, 7-9, ...).
for x in range(0, len(dna), 3):
    codon = dna[x:x + 3]
    print(codon)

# Attempted gene search: ATG, then whole codons, then a stop codon.
if re.search('ATG(...)+?(TAG|TAA|TGA)', dna):
    print("Yes")

Not sure if you're aware of this, but you're looking for [open] reading frames. There are a lot of tools that already do this for you. Is there a reason you're trying to write a new tool?

Check out EMBOSS' getorf - that's one of the tools I've used to detect open reading frames.

I know ExPASy does it for you. But I have to write the code myself, because our teacher assigned this program to us. I have done the first part, but I'm stuck on the second part. I'm trying my best. I hope you can give me some ideas.

I think you're stuck because you're only reading one frame of the sequence. If you were to do a little research (just Googling) on reading frames, you'd see the one piece of knowledge that will solve your problem.
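
As a rough illustration of that hint, the sketch below walks the sequence codon by codon in each of the three forward frames and records every stretch that starts at ATG and ends at the first in-frame TAG, TAA or TGA. The function name `find_orfs` and its return format are assumptions made for this example. It deliberately ignores the reverse strand, and any ATG inside an already open stretch is treated as an ordinary codon.

```python
STOP_CODONS = {"TAG", "TAA", "TGA"}

def find_orfs(seq):
    """Return (start, end, orf) tuples for the three forward reading frames.

    start and end are 0-based with end exclusive; orf includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for offset in range(3):                       # frames starting at base 1, 2 and 3
        start = None                              # index of the currently open ATG, if any
        for i in range(offset, len(seq) - 2, 3):  # complete codons only
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3, seq[start:i + 3]))
                start = None                      # look for the next ATG in this frame
    return orfs

# A short made-up sequence: the ATG sits at offset 2, as in the question.
print(find_orfs("CCATGAAATAGC"))
```

On the 70-base example from the question this returns an empty list, because the only ATG has no in-frame stop codon before the sequence ends. The full 9000-base file would be the real test.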
