Question: Generate Random DNA Sequence Data With Equal Base Frequencies
7.5 years ago, User 4000 wrote:

Hi all. Does anybody know how to generate random DNA sequences (about 20) with equal base frequencies? I want to generate this data for a test.

7.5 years ago, Jeroen Van Goey (Ghent, Belgium) wrote:

A solution using Python:

import random

def random_dna_sequence(length):
    return ''.join(random.choice('ACTG') for _ in range(length))

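You want DNA strings with equal base probabilities, so each base should appear with probability 0.25. Because every base is drawn independently, the observed frequencies in any single string will only approximate 0.25. With the following function you can check how far each DNA string deviates from that expected value: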

def base_frequency(dna):
    d = {}
    for base in 'ATCG':
        d[base] = dna.count(base)/float(len(dna))
    return d

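for _ in range(20):
    dna = random_dna_sequence(50)
    print(dna, base_frequency(dna))

which would generate a result like the following (20 sequences of 50 bases each; the order of the dictionary keys may differ between Python versions):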

AAGTGACGCCCGGTGCGAAAAACACGCGCCTCTCCGTAGTCATTCAGACT {'A': 0.26, 'C': 0.32, 'T': 0.18, 'G': 0.24}
AAGGATCTACTACCTCGTCTATTTGAACTACTGTAGTGCTACTAACTCAT {'A': 0.28, 'C': 0.24, 'T': 0.34, 'G': 0.14}
TCCACTTCTTGGTCCTGAACACCTGCAATCACCTCTTACATCGTGCGACG {'A': 0.2, 'C': 0.36, 'T': 0.28, 'G': 0.16}
AATCTCCGGTGTGTCCGCTACGGAGGTTAGGGCACTCCGTGGGAAAGCTC {'A': 0.18, 'C': 0.26, 'T': 0.22, 'G': 0.34}
GCGTAGTTCGCATTGATTAACATAGTGGCGACCATAGACTTCTATTATCG {'A': 0.26, 'C': 0.2, 'T': 0.32, 'G': 0.22}
AAGTGAACCTGGACTGGGTGGATCGTCTCCCTCGTCCGGTCCTTGGTAGC {'A': 0.14, 'C': 0.28, 'T': 0.26, 'G': 0.32}
ATGACGATGACGATCATCGTCAACGCGCGTCGCGCACACTGCATATCCAA {'A': 0.28, 'C': 0.32, 'T': 0.18, 'G': 0.22}
GTGCATACCGGTGCGCGCGTGCGCTAGGTATTGGAATGCTACGCTTAACC {'A': 0.18, 'C': 0.26, 'T': 0.24, 'G': 0.32}
GCCCGCGTGCCGCCAAGGGATGGGGAGAGTATTTTCGCCCCCTAAGTGCC {'A': 0.16, 'C': 0.32, 'T': 0.18, 'G': 0.34}
TCAAGATTCTCCTAAATATATAATGATCATCCGTTGTCATTCTGCGGACT {'A': 0.28, 'C': 0.22, 'T': 0.36, 'G': 0.14}
TGTTTTAGCCCTGTAGCCGGACTACGAAGTTTTAGGCGCCCAGATTAAGG {'A': 0.22, 'C': 0.22, 'T': 0.28, 'G': 0.28}
AGACGAGCTTTCAAGTTCTTGAATCACTACCTTTGACGTCGAGTGTAAGG {'A': 0.26, 'C': 0.2, 'T': 0.3, 'G': 0.24}
TCGCATTGTAAATAGGAACCTGAAACCTGCCAAGGAGATACAGTCTAAAT {'A': 0.38, 'C': 0.2, 'T': 0.22, 'G': 0.2}
CATCCGTGTGGTAACAGTTAATGCCGGGCTCACCCTCAGGTGTGAAGGAT {'A': 0.22, 'C': 0.24, 'T': 0.24, 'G': 0.3}
ACCAAGACATACCTTAAGGCCCACGCGTACAAGTCACGCTCTCAATACGG {'A': 0.32, 'C': 0.34, 'T': 0.16, 'G': 0.18}
CGTCGTTGGTATTCAGAAAACGCTAGCACATATGGTGCCCAGTCAAAGGA {'A': 0.3, 'C': 0.22, 'T': 0.22, 'G': 0.26}
CGTCATTGCACCAAGTGTGGTACTTTGGGGACGTGAGGTAACAATCCCTG {'A': 0.22, 'C': 0.22, 'T': 0.26, 'G': 0.3}
TGGTCCCTGTTTCTCCATTCCGCGTCCATCGTGCGTTCGTCCTTTAAAGT {'A': 0.1, 'C': 0.32, 'T': 0.38, 'G': 0.2}
AATTCACTCTTTTAACGATGGAAACGGGCGTTTGTAGTGTGCCACTAACC {'A': 0.26, 'C': 0.22, 'T': 0.3, 'G': 0.22}
CCTTGTATACCCCACATGAAGAATGGGCCTGACATCAATAATCTTTAGAT {'A': 0.32, 'C': 0.24, 'T': 0.28, 'G': 0.16}
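
The spread in the frequencies above (anywhere from 0.1 to 0.38 in 50-base strings) is ordinary sampling noise. When each base is drawn independently with probability p = 0.25, the count of any one base is binomial, so the standard deviation of its observed frequency in a string of length n is sqrt(p(1-p)/n). A minimal sketch of that calculation follows; the function name expected_frequency_sd is my own and not part of the answer above.

```python
import math

def expected_frequency_sd(length, p=0.25):
    """Standard deviation of one base's observed frequency in a random
    sequence of the given length, assuming independent draws with
    probability p per position (binomial model)."""
    return math.sqrt(p * (1 - p) / length)

# Longer sequences sit closer to the expected 0.25 per base.
for n in (50, 1000, 100000):
    print(n, round(expected_frequency_sd(n), 4))
```

For n = 50 this gives roughly 0.061, so frequencies a couple of standard deviations away from 0.25 are unremarkable at that length.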

7.5 years ago, Stefano Berri (Cambridge, UK) wrote:
A simple way using R:

# define which bases will make up your sequences
bases <- c(rep('A', 5), rep('C',5), rep('G',5), rep('T',5))
# set how many sequences you want to produce
numOfSeqs <- 10
# initialize empty object
seqs <- rep(NA, numOfSeqs)
# populate the object by shuffling and joining your bases
for (i in 1:numOfSeqs){
    seqs[i] <- paste(sample(bases, length(bases)), collapse = '')
}

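Then you can do what you want with the object seqs. Because each sequence is a shuffle of the same 20 bases, every sequence contains exactly five of each base.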
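
For anyone following the Python answer, the same shuffle idea can be sketched in Python. Unlike drawing each base with random.choice, shuffling a fixed pool gives every sequence exactly the same count of each base. The function name shuffled_dna_sequence and its copies_per_base parameter are illustrative assumptions, not something from either answer.

```python
import random

def shuffled_dna_sequence(copies_per_base=5):
    """Return a sequence with exactly `copies_per_base` of each of A, C, G, T,
    in random order (total length is 4 * copies_per_base)."""
    pool = list('ACGT' * copies_per_base)
    random.shuffle(pool)  # in-place shuffle of the fixed pool of bases
    return ''.join(pool)

# Ten sequences of length 20, mirroring the R example.
seqs = [shuffled_dna_sequence() for _ in range(10)]
for seq in seqs:
    print(seq)
```

The trade-off is that the sequence length has to be a multiple of four, and the positions are no longer independent draws.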

If you need to produce a very large number of sequences, write them to a file as they are generated instead of keeping them all in one object, or they will fill up the memory.
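
One way to keep memory use flat, sketched in Python: generate the sequences lazily and write each one out as soon as it exists. The function names, the FASTA-style header (>random_N) and the output path are all assumptions made for illustration.

```python
import random

def iter_random_sequences(num_seqs, length, alphabet='ACGT'):
    """Yield sequences one at a time instead of building a list of all of them."""
    for _ in range(num_seqs):
        yield ''.join(random.choice(alphabet) for _ in range(length))

def write_fasta(path, sequences):
    """Write each sequence as soon as it is produced; return how many were written."""
    count = 0
    with open(path, 'w') as handle:
        for count, seq in enumerate(sequences, start=1):
            handle.write('>random_%d\n%s\n' % (count, seq))
    return count
```

A call such as write_fasta('random_sequences.fasta', iter_random_sequences(1000000, 100)) would then hold only one sequence in memory at a time.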


7.5 years ago, User 4000 replied:

Thanks a lot. Anyway, is there any software that can do this without writing code for it?

7.5 years ago, Ahdf-Lell-Kocks wrote:

One option is to use PhyloSim:
http://www.ebi.ac.uk/goldman-srv/phylosim/
