Question: Generate random DNA sequence data with equal base frequencies in Python
2.4 years ago by
elisheva70
Israel
elisheva70 wrote:

Hello everybody! I have two questions:

1. I have to generate a random DNA sequence of length 20 kb (20,000 bases) with equal base frequencies in Python. I tried to use this function:

from random import choice

def dna(length):
    # Build the sequence by drawing each base independently and uniformly
    DNA = ""
    for i in range(length):
        DNA += choice('atcg')
    return DNA

But it doesn't return exactly equal frequencies for all the bases. Is there any way to do this (that is not too complicated)?

2. I have to calculate the frequency of all the bases in a given file, but the file is huge, so I have to split it. How can I split the file, send each part to a function that calculates the frequencies (I've already written it), and still get the true frequencies for the whole file?

Thanks!!!

modified 2.4 years ago • written 2.4 years ago by elisheva70

How did you assess that your function didn't return equal frequencies?

What does "huge" mean for your file? Does it contain one enormous sequence or multiple sequences? How is your function written?

modified 2.4 years ago • written 2.4 years ago by WouterDeCoster38k
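One way to answer the first question in this comment is to count the bases in a generated sequence. The sketch below assumes Python 3 and uses the same independent-draw approach as the function in the question (rewritten with `join`); the fixed seed is only there to make the run repeatable. Because every base is drawn independently, the frequencies should come out close to 0.25 each but will rarely be exactly equal.

```python
import random
from collections import Counter

def dna(length):
    # Same idea as the function in the question: each base is an
    # independent, uniform draw from 'atcg'
    return ''.join(random.choice('atcg') for _ in range(length))

random.seed(0)  # fixed seed, only so the sketch is repeatable
seq = dna(20000)
counts = Counter(seq)

# Print the count and the frequency of each base
for base in 'atcg':
    print(base, counts[base], counts[base] / len(seq))
```

If the printed frequencies are only approximately 0.25, the function is behaving as expected for independent random draws.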

About the second question: my file contains only one sequence (a human chromosome). It doesn't matter how my function is written; the problem is how to split the file correctly. But anyway, this is my function:

def bases_freq(dna_seq_file):
    freq = {}  # Create an empty dictionary
    nuc = ['a', 't', 'c', 'g']  # List of all the nucleotides to count
    # Compute the frequency of each nucleotide in the sequence string
    for base in nuc:
        freq[base] = dna_seq_file.count(base) * 1.0 / len(dna_seq_file)
    freq['gc'] = freq['g'] + freq['c']  # Add GC content
    return freq
modified 2.4 years ago by WouterDeCoster38k • written 2.4 years ago by elisheva70
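For the second question, one possible approach is to read the file in fixed-size chunks, add up the raw base counts across chunks, and divide only once at the end. Averaging per-chunk frequencies would give the wrong answer whenever the chunks differ in size. The sketch below assumes Python 3 and a plain-text file that holds only sequence characters and newlines (no FASTA header line). The name `bases_freq_chunked` and the `chunk_size` parameter are hypothetical. Unlike the function above, it counts case-insensitively and leaves whitespace out of the total.

```python
from collections import Counter

def bases_freq_chunked(path, chunk_size=1024 * 1024):
    """Hypothetical helper: base frequencies of a sequence file, read in chunks."""
    counts = Counter()
    with open(path) as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # an empty string means end of file
                break
            # Accumulate raw counts; lowercasing makes the count case-insensitive
            counts.update(chunk.lower())

    # Total sequence length, ignoring newlines and other whitespace
    total = sum(n for ch, n in counts.items() if not ch.isspace())
    if total == 0:
        raise ValueError('no sequence characters found in file')

    # Divide once, after all chunks have been counted
    freq = {base: counts[base] / total for base in 'atcg'}
    freq['gc'] = freq['g'] + freq['c']
    return freq
```

Only one chunk is held in memory at a time, so the file size is not a limit. If the file does start with a FASTA header, that line would have to be skipped before counting.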

Thank you so much! Can anybody help me with the second question?

written 2.4 years ago by elisheva70

Please use ADD COMMENT or ADD REPLY to respond to earlier posts, so that this thread remains logically structured and easy to follow.

modified 2.4 years ago • written 2.4 years ago by WouterDeCoster38k
2.4 years ago by
Steven Lakin1.4k
Fort Collins, CO, USA
Steven Lakin1.4k wrote:

"Random DNA sequence" and "equal base frequency" are two different concepts: drawing each base independently at random gives frequencies that are close to, but rarely exactly, 25% each. If you need exactly equal base frequencies in a randomized order, you should generate a string with 5000 A, 5000 C, 5000 G, and 5000 T and then randomly shuffle it using the random module:

import random
# 5000 copies of each base (20,000 bases in total), as a list of characters
dna_list = list('ACGT' * 5000)
random.shuffle(dna_list)  # shuffle the order in place
result = ''.join(dna_list)
written 2.4 years ago by Steven Lakin1.4k
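The shuffle approach in this answer can also be wrapped in a function that takes the length as a parameter. This is only a sketch, assuming Python 3; the name `shuffled_dna` and its `alphabet` parameter are hypothetical, and it assumes the requested length is a multiple of four so that the counts can be exactly equal.

```python
import random
from collections import Counter

def shuffled_dna(length, alphabet='ACGT'):
    """Hypothetical helper: random-order sequence with exactly equal base counts."""
    if length % len(alphabet) != 0:
        raise ValueError('length must be a multiple of the alphabet size')
    # Equal numbers of each base, then shuffle the order in place
    bases = list(alphabet * (length // len(alphabet)))
    random.shuffle(bases)
    return ''.join(bases)

seq = shuffled_dna(20000)
print(Counter(seq))  # every base should appear length / 4 times
```

Note that this fixes the composition of the sequence and randomizes only the order, which is a different model from drawing each base independently.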