Question

Random Genomic Regions Of A Given Length Distribution

4

11.5 years ago

dfernan ▴ 760

Hi,

I'd like to know if someone has a script for generating a BED file of random genomic regions across the whole human genome. The regions should follow a length distribution given as input (e.g., mean length = 3000 and some standard deviation). More generally, the length distribution could also be supplied as an empirical vector containing all the desired lengths.

Thanks!

bed genome python • 4.1k views

ADD COMMENT • link updated 11.5 years ago by Irsan ★ 7.8k • written 11.5 years ago by dfernan ▴ 760

1

Not sure what you want these regions for, but have you considered that some areas of the genome cannot be sequenced or have large sequence biases, such as repeat regions or skewed GC content? Consider whether the random regions are biologically relevant to you. You may want to filter by the RepeatMasker or mappability tracks from UCSC.
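
As a rough illustration of the filtering idea, here is a minimal Python sketch that drops candidate regions overlapping a set of excluded intervals (for example, intervals exported from a repeat or low-mappability track). The function names and the toy coordinates are hypothetical and not from this thread. Coordinates are assumed to be BED-style, 0-based and half-open.

```python
from bisect import bisect_right


def build_index(excluded):
    """Group excluded (chrom, start, end) intervals by chromosome, sorted and merged."""
    by_chrom = {}
    for chrom, start, end in sorted(excluded):
        merged = by_chrom.setdefault(chrom, [])
        if merged and start <= merged[-1][1]:
            # Overlapping or touching the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return by_chrom


def overlaps_excluded(region, index):
    """Return True if the half-open region shares at least one base with an excluded interval."""
    chrom, start, end = region
    intervals = index.get(chrom, [])
    # Last excluded interval whose start lies before the region's end.
    i = bisect_right(intervals, (end - 1, float("inf"))) - 1
    return i >= 0 and intervals[i][1] > start


# Toy data for illustration only (hypothetical coordinates).
excluded = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
candidates = [("chr1", 250, 400), ("chr1", 300, 400), ("chr2", 10, 20), ("chr3", 0, 10)]

index = build_index(excluded)
kept = [r for r in candidates if not overlaps_excluded(r, index)]
print(kept)
```

With real data, the excluded intervals would be read from a BED file rather than hard-coded, and rejected regions would need to be redrawn if a fixed number of regions is required.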

ADD REPLY • link 11.4 years ago by Ian 6.0k

0

Makes sense, thanks for the suggestion.

ADD REPLY • link 11.4 years ago by dfernan ▴ 760

0

How do you control for GC content as well? Do you calculate the GC content of all the random regions of a given length and check that each one matches both the length and the GC content?

ADD REPLY • link 8.7 years ago by epigene ▴ 590

0

I have the same question. How do you control for GC content?

ADD REPLY • link 7.7 years ago by Ming Tommy Tang ★ 3.9k

score 9 · Answer 1 · 2012-10-18

9

11.5 years ago

Irsan ★ 7.8k

Try random-genome-fragments from RSAT (Regulatory Sequence Analysis Tools) or random.intervals from the seqbias Bioconductor package.

seqbias lets you supply a vector of desired lengths. You can create that vector by sampling from a distribution (e.g. a normal distribution), like this:

lengths <- rnorm(n = 10, mean = 100000, sd = 1000)
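
Since the question is tagged python, here is a minimal standalone Python sketch of the same idea, without RSAT or seqbias: sample lengths from a normal distribution, then place each region uniformly at random on the genome. This is not the method either tool uses. The chromosome sizes are a hypothetical toy table; real sizes would come from something like a chrom.sizes file. Output is BED-style (0-based, half-open).

```python
import random

# Hypothetical toy chromosome sizes, for illustration only.
CHROM_SIZES = {"chr1": 1_000_000, "chr2": 800_000, "chr3": 500_000}


def sample_lengths(n, mean, sd, rng):
    """Draw n region lengths from a normal distribution, rounded, at least 1 bp."""
    return [max(1, round(rng.gauss(mean, sd))) for _ in range(n)]


def random_regions(lengths, chrom_sizes, rng):
    """Place one region per requested length at a uniformly random position.

    A chromosome is chosen with probability proportional to its number of
    valid start positions, so every possible placement is equally likely.
    """
    regions = []
    for length in lengths:
        # Number of valid 0-based start positions per chromosome.
        weights = {c: size - length + 1
                   for c, size in chrom_sizes.items() if size >= length}
        if not weights:
            raise ValueError(f"no chromosome can hold a region of {length} bp")
        chrom = rng.choices(list(weights), weights=list(weights.values()), k=1)[0]
        start = rng.randrange(weights[chrom])
        regions.append((chrom, start, start + length))
    return regions


def to_bed(regions):
    """Format regions as tab-separated BED lines."""
    return "\n".join(f"{c}\t{s}\t{e}" for c, s, e in regions)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
lengths = sample_lengths(10, mean=3000, sd=500, rng=rng)
regions = random_regions(lengths, CHROM_SIZES, rng)
print(to_bed(regions))
```

Regions placed this way may overlap each other. Passing an empirical vector of lengths instead of the output of sample_lengths works unchanged.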

ADD COMMENT • link 11.5 years ago by Irsan ★ 7.8k

0

Thanks a lot. Sounds like a great tool.

ADD REPLY • link 11.5 years ago by dfernan ▴ 760

0

BTW, you can also give random-genome-fragments a file with the desired lengths. You can make that file by sampling from a distribution in R (as I said previously) and then writing the result to "desired_lengths.txt".
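
The same file can be produced from Python instead of R. The sketch below resamples from an empirical vector of observed lengths, which is the other case the question asks about, and writes the result to "desired_lengths.txt". The one-length-per-line layout is an assumption, so check the random-genome-fragments documentation for the exact format it expects. The observed values are made up for illustration.

```python
import random
from pathlib import Path


def resample_lengths(observed, n, rng):
    """Draw n lengths with replacement from an observed (empirical) length vector."""
    return rng.choices(observed, k=n)


def write_lengths(path, lengths):
    """Write one integer length per line (assumed format, see note above)."""
    Path(path).write_text("".join(f"{int(x)}\n" for x in lengths))


# Hypothetical observed lengths, for illustration only.
observed = [2500, 2800, 3000, 3100, 3600, 4200]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
lengths = resample_lengths(observed, n=100, rng=rng)
write_lengths("desired_lengths.txt", lengths)
```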

ADD REPLY • link 11.5 years ago by Irsan ★ 7.8k