Question: Generate Random DNA Sequence Data With Equal Base Frequencies

1

User 4000 • **50** wrote:

Hi all. Does anybody know how to generate random DNA sequences (about 20) with equal base frequencies? (I want to generate this data for a test.)

modified 7.5 years ago by Jeroen Van Goey ♦ **2.2k** • written 7.5 years ago by User 4000 • **50**

3

Jeroen Van Goey ♦ **2.2k** wrote:

A solution using Python:

```
import random

def random_dna_sequence(length):
    """Return a random DNA string; each base is drawn independently with probability 0.25."""
    return ''.join(random.choice('ACTG') for _ in range(length))
```

You want a DNA string with equal base probability, so the probability of each base appearing is `0.25`. With the following function you can check how much each DNA string deviates from this predicted probability:

```
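def base_frequency(dna):
    """Return the fraction of each base in the sequence."""
    d = {}
    for base in 'ATCG':
        d[base] = dna.count(base) / float(len(dna))
    return d

# generate 20 sequences of length 50 (the length of the sequences in the sample output)
for _ in range(20):
    dna = random_dna_sequence(50)
    print(dna, base_frequency(dna))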
```

which would generate a result like:

```
AAGTGACGCCCGGTGCGAAAAACACGCGCCTCTCCGTAGTCATTCAGACT {'A': 0.26, 'C': 0.32, 'T': 0.18, 'G': 0.24}
AAGGATCTACTACCTCGTCTATTTGAACTACTGTAGTGCTACTAACTCAT {'A': 0.28, 'C': 0.24, 'T': 0.34, 'G': 0.14}
TCCACTTCTTGGTCCTGAACACCTGCAATCACCTCTTACATCGTGCGACG {'A': 0.2, 'C': 0.36, 'T': 0.28, 'G': 0.16}
AATCTCCGGTGTGTCCGCTACGGAGGTTAGGGCACTCCGTGGGAAAGCTC {'A': 0.18, 'C': 0.26, 'T': 0.22, 'G': 0.34}
GCGTAGTTCGCATTGATTAACATAGTGGCGACCATAGACTTCTATTATCG {'A': 0.26, 'C': 0.2, 'T': 0.32, 'G': 0.22}
AAGTGAACCTGGACTGGGTGGATCGTCTCCCTCGTCCGGTCCTTGGTAGC {'A': 0.14, 'C': 0.28, 'T': 0.26, 'G': 0.32}
ATGACGATGACGATCATCGTCAACGCGCGTCGCGCACACTGCATATCCAA {'A': 0.28, 'C': 0.32, 'T': 0.18, 'G': 0.22}
GTGCATACCGGTGCGCGCGTGCGCTAGGTATTGGAATGCTACGCTTAACC {'A': 0.18, 'C': 0.26, 'T': 0.24, 'G': 0.32}
GCCCGCGTGCCGCCAAGGGATGGGGAGAGTATTTTCGCCCCCTAAGTGCC {'A': 0.16, 'C': 0.32, 'T': 0.18, 'G': 0.34}
TCAAGATTCTCCTAAATATATAATGATCATCCGTTGTCATTCTGCGGACT {'A': 0.28, 'C': 0.22, 'T': 0.36, 'G': 0.14}
TGTTTTAGCCCTGTAGCCGGACTACGAAGTTTTAGGCGCCCAGATTAAGG {'A': 0.22, 'C': 0.22, 'T': 0.28, 'G': 0.28}
AGACGAGCTTTCAAGTTCTTGAATCACTACCTTTGACGTCGAGTGTAAGG {'A': 0.26, 'C': 0.2, 'T': 0.3, 'G': 0.24}
TCGCATTGTAAATAGGAACCTGAAACCTGCCAAGGAGATACAGTCTAAAT {'A': 0.38, 'C': 0.2, 'T': 0.22, 'G': 0.2}
CATCCGTGTGGTAACAGTTAATGCCGGGCTCACCCTCAGGTGTGAAGGAT {'A': 0.22, 'C': 0.24, 'T': 0.24, 'G': 0.3}
ACCAAGACATACCTTAAGGCCCACGCGTACAAGTCACGCTCTCAATACGG {'A': 0.32, 'C': 0.34, 'T': 0.16, 'G': 0.18}
CGTCGTTGGTATTCAGAAAACGCTAGCACATATGGTGCCCAGTCAAAGGA {'A': 0.3, 'C': 0.22, 'T': 0.22, 'G': 0.26}
CGTCATTGCACCAAGTGTGGTACTTTGGGGACGTGAGGTAACAATCCCTG {'A': 0.22, 'C': 0.22, 'T': 0.26, 'G': 0.3}
TGGTCCCTGTTTCTCCATTCCGCGTCCATCGTGCGTTCGTCCTTTAAAGT {'A': 0.1, 'C': 0.32, 'T': 0.38, 'G': 0.2}
AATTCACTCTTTTAACGATGGAAACGGGCGTTTGTAGTGTGCCACTAACC {'A': 0.26, 'C': 0.22, 'T': 0.3, 'G': 0.22}
CCTTGTATACCCCACATGAAGAATGGGCCTGACATCAATAATCTTTAGAT {'A': 0.32, 'C': 0.24, 'T': 0.28, 'G': 0.16}
```
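The sequences shown are 50 bases long, and with independent draws each base frequency has a standard deviation of about 0.06 (sqrt(0.25 × 0.75 / 50)), so most values landing between roughly 0.13 and 0.37 is expected. If you need exactly equal counts rather than equal probabilities, one option is to shuffle a fixed pool of bases instead. A minimal sketch of that idea; `exact_equal_dna` is a hypothetical name, and it assumes the length is a multiple of 4:

```python
import random

def exact_equal_dna(length):
    """Return a random DNA string containing exactly length/4 of each base.

    Hypothetical helper: assumes `length` is a multiple of 4.
    """
    if length % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pool = list('ACGT' * (length // 4))  # fixed pool with equal counts
    random.shuffle(pool)                 # random order, counts unchanged
    return ''.join(pool)

# print three example sequences of length 20 (5 of each base)
for _ in range(3):
    print(exact_equal_dna(20))
```

The trade-off is that the bases are no longer independent of each other: once five A's have been placed in a 20-base sequence, no more can appear. Whether that matters depends on what the test data is for.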

2

Stefano Berri ♦ **4.1k** wrote:

A simple way using R:

```
# define which bases will make up your sequences
bases <- c(rep('A', 5), rep('C', 5), rep('G', 5), rep('T', 5))
# set how many sequences you want to produce
numOfSeqs <- 10
# initialize empty object
seqs <- rep(NA, numOfSeqs)
# populate the object by shuffling and joining your bases
for (i in 1:numOfSeqs) {
  seqs[i] <- paste(sample(bases, length(bases)), collapse = '')
}
```

Then you can do what you want with the object `seqs`.

Clearly, if you need to produce a very large number of sequences, you should write them to a file as you generate them; otherwise they will fill up memory.
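The memory concern can be avoided by writing each sequence out as soon as it is generated. A minimal Python sketch of that idea (not a translation of the R code above): `write_random_dna` and the `>seqN` FASTA headers are hypothetical choices, and bases are drawn independently here, so the frequencies are only approximately equal:

```python
import random
import sys

def write_random_dna(handle, num_seqs, length):
    """Write `num_seqs` random DNA sequences to `handle` in FASTA format.

    Each sequence is written immediately, so only one is held in memory at a time.
    """
    for i in range(num_seqs):
        seq = ''.join(random.choice('ACGT') for _ in range(length))
        handle.write('>seq%d\n%s\n' % (i + 1, seq))

# demo: send three 20-base sequences to standard output
write_random_dna(sys.stdout, 3, 20)
```

Passing a file handle instead of `sys.stdout`, for example one from `open('random_dna.fasta', 'w')` (the file name is just an example), sends the sequences to disk.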

1

Ahdf-Lell-Kocks • **1.6k** wrote:

One option is to use PhyloSim:

http://www.ebi.ac.uk/goldman-srv/phylosim/
