Question: Generate Random DNA Sequence Data With Equal Base Frequencies in Python
3.9 years ago by
elisheva100
Israel
elisheva100 wrote:

Hello everybody! I have two questions: 1. I have to generate a random DNA sequence of length 20 kb with equal base frequencies in Python. I tried to use this function:

```python
from random import choice

def dna(length):
    # Each base is drawn independently and uniformly from 'atcg'
    DNA = ""
    for i in range(length):
        DNA += choice('atcg')
    return DNA
```

But it doesn't return equal frequencies for all the bases. Is there any way to do this (nothing too complicated)?

2. I have to calculate the frequency of each base in a given file, but the file is huge, so I have to split it. How can I split the file, send the pieces to a function that calculates the frequencies (I've already written it), and get back the true overall frequencies?

Thanks!!!

modified 3.9 years ago • written 3.9 years ago by elisheva100

How did you assess that your function didn't return equal frequencies?

What is "huge" in your file? Does it contain one enormous sequence or multiple sequences? How is your function written?
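On the first point, one way to check is to count the bases directly. The sketch below assumes Python 3; `base_frequencies` is a hypothetical helper name, and the seed is fixed only so the run is repeatable. With independent draws, each frequency should land near 0.25 without being exactly 0.25, which may be what the original function was showing.

```python
import random
from collections import Counter

def base_frequencies(seq):
    """Return the fraction of each base in seq (hypothetical helper)."""
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in "acgt"}

random.seed(0)  # fixed seed only so the run is repeatable
# Same idea as the dna() function in the question: independent uniform draws
seq = "".join(random.choice("acgt") for _ in range(20000))

freqs = base_frequencies(seq)
for base, fraction in sorted(freqs.items()):
    print(base, round(fraction, 4))
```

Small deviations from 0.25 are expected here: the draws are unbiased, but nothing forces the four counts to come out identical.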

About the second question: my file contains only one sequence (a human chromosome). It doesn't matter how my function is written; the problem is how to split the file correctly. But anyway, this is my function:

```python
def bases_freq(dna_seq_file):
    # dna_seq_file is the sequence as a string, not a file handle
    freq = {}  # Create an empty dictionary
    nuc = ['a', 't', 'c', 'g']  # List of all the nucleotides
    # Compute the frequency of each nucleotide in the sequence
    for base in nuc:
        freq[base] = dna_seq_file.count(base) * 1.0 / len(dna_seq_file)
    freq['gc'] = freq['g'] + freq['c']  # Add GC content
    return freq
```
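One possible way to handle a file too large to hold in memory is to keep running counts while reading it piece by piece, and divide only once at the end. Counts are additive across pieces; per-piece frequencies are not, unless they are weighted by piece length. The sketch below is an assumption-laden illustration, not the function above: it assumes Python 3 and a line-wrapped plain-text or FASTA file, skips header lines starting with `>`, lowercases the sequence, and divides by the number of a/t/c/g characters only (so `N` and newlines are ignored). The name `bases_freq_from_file` is hypothetical.

```python
from collections import Counter

def bases_freq_from_file(path):
    """Stream a sequence file and return base frequencies (hypothetical helper).

    Assumes the file is wrapped into lines, so only one line is held
    in memory at a time.
    """
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # skip FASTA header lines
            counts.update(line.strip().lower())

    # Denominator counts only a/t/c/g; N and other symbols are ignored
    total = sum(counts[base] for base in "atcg")
    if total == 0:
        raise ValueError("no a/t/c/g characters found")

    freq = {base: counts[base] / total for base in "atcg"}
    freq["gc"] = freq["g"] + freq["c"]  # GC content
    return freq

# Usage, with a hypothetical file name:
# freq = bases_freq_from_file("chromosome.fa")
```

If the file is a single unbroken line, reading fixed-size blocks with `handle.read(size)` and updating the same counter would be the analogous approach.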

Thank you so much! Can anybody help me with the second question?

Please use `ADD COMMENT` or `ADD REPLY` to answer earlier posts, so that the thread remains logically structured and easy to follow.

3.9 years ago by
Steven Lakin (1.5k)
Fort Collins, CO, USA
Steven Lakin (1.5k) wrote:

"Random DNA sequence" and "equal base frequency" are two different concepts. If you definitely want equal base frequencies, just in a randomized order, you should generate a list with 5000 each of A, C, G, and T and then shuffle it using the random module:

```python
import random

# 5000 copies of each base (20,000 total), shuffled into random order
dna_list = list('ACGT' * 5000)
random.shuffle(dna_list)
result = ''.join(dna_list)
```
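The same shuffle can be wrapped in a function that takes the length as a parameter and confirms the counts afterwards. This is only a sketch assuming Python 3; `shuffled_dna` and its `alphabet` parameter are hypothetical names, and the length has to be a multiple of four for the counts to be exactly equal.

```python
import random
from collections import Counter

def shuffled_dna(length, alphabet="ACGT"):
    """Return a shuffled sequence with exactly equal base counts (hypothetical helper)."""
    if length % len(alphabet) != 0:
        raise ValueError("length must be a multiple of the alphabet size")
    # Build equal numbers of each base, then randomize the order in place
    bases = list(alphabet * (length // len(alphabet)))
    random.shuffle(bases)
    return "".join(bases)

seq = shuffled_dna(20000)
print(Counter(seq))  # each base should appear exactly 5000 times
```

Note the trade-off: the base counts are fixed in advance, so the positions are no longer independent draws. Only the order is random.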