Question: Random Genomic Regions Of A Given Length Distribution
4
7.6 years ago by
dfernan690
United States
dfernan690 wrote:

Hi,

I'd like to know if someone has a script for generating a BED file with random genomic regions over the whole human genome. The regions should follow a length distribution given as input (e.g., mean length = 3000 and some standard deviation). More generally, it should be possible to give the length distribution as an empirical vector containing all the desired lengths.

Thanks!

genome bed python • 2.8k views
modified 7.6 years ago by Irsan (7.1k) • written 7.6 years ago by dfernan (690)
1

Not sure what you want these regions for, but have you considered that some areas of the genome cannot be sequenced or have large sequence biases, such as repeat regions, GC content, etc.? Consider whether the random regions are biologically relevant to you. You may want to filter them using RepeatMasker or mappability tracks from UCSC.

written 7.5 years ago by Ian (5.6k)

Makes sense, thanks for the suggestion.

written 7.5 years ago by dfernan (690)

How do you control for GC content as well? Do you calculate the GC content of all the random regions of a given length and check whether each one matches both the desired length and the desired GC content?

written 4.8 years ago by epigene (490)

I have the same question. How do you control for GC content?

written 3.8 years ago by Ming Tang (2.6k)
9
7.6 years ago by
Irsan7.1k
Amsterdam
Irsan7.1k wrote:

Try random-genome-fragments from RSA tools (RSAT) or random.intervals from the seqbias R package.

seqbias lets you pass a vector of desired lengths. You can create that vector by sampling from a distribution (e.g. a normal distribution) in R, like this:

lengths <- rnorm(n = 10, mean = 100000, sd = 1000)
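
Since the question is tagged python, here is a minimal standard-library sketch of the same idea: draw lengths from a normal distribution, then place each length at a uniformly random position and print BED lines. This is not how seqbias or random-genome-fragments work internally. The function names and the toy chromosome sizes (`chrA`, `chrB`) are made up for illustration; for the human genome you would load the real sizes from a chrom.sizes-style file for your assembly.

```python
import random


def sample_normal_lengths(n, mean, sd, seed=None):
    """Draw n integer lengths from a normal distribution, rejecting values < 1."""
    rng = random.Random(seed)
    lengths = []
    while len(lengths) < n:
        value = round(rng.gauss(mean, sd))
        if value >= 1:
            lengths.append(value)
    return lengths


def random_regions(chrom_sizes, lengths, seed=None):
    """Return one (chrom, start, end) tuple per requested length.

    Coordinates follow the BED convention: 0-based start, exclusive end.
    Regions may overlap each other; no repeat or gap filtering is done.
    """
    rng = random.Random(seed)
    regions = []
    for length in lengths:
        # Only chromosomes long enough to hold the region are candidates.
        candidates = [c for c in chrom_sizes if chrom_sizes[c] >= length]
        if not candidates:
            raise ValueError(f"no chromosome can hold a region of length {length}")
        # Weight by the number of valid start positions so that every
        # possible placement in the genome is equally likely.
        weights = [chrom_sizes[c] - length + 1 for c in candidates]
        chrom = rng.choices(candidates, weights=weights, k=1)[0]
        start = rng.randint(0, chrom_sizes[chrom] - length)  # inclusive bounds
        regions.append((chrom, start, start + length))
    return regions


def to_bed(regions):
    """Format regions as tab-separated BED lines."""
    return "\n".join(f"{chrom}\t{start}\t{end}" for chrom, start, end in regions)


# Hypothetical toy sizes, NOT real human chromosome lengths.
TOY_CHROM_SIZES = {"chrA": 5_000_000, "chrB": 2_000_000}

lengths = sample_normal_lengths(n=10, mean=100000, sd=1000, seed=1)
regions = random_regions(TOY_CHROM_SIZES, lengths, seed=1)
print(to_bed(regions))
```

An empirical vector of lengths can be passed to `random_regions` directly in place of the sampled `lengths`.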

modified 7.6 years ago • written 7.6 years ago by Irsan (7.1k)

Thanks a lot. Sounds like a great tool.

written 7.6 years ago by dfernan (690)

BTW, you can also give random-genome-fragments a file with the desired lengths. You can make that file by sampling from a distribution in R (as in my answer above) and then writing the result to "desired_lengths.txt".
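
If you would rather build that file in Python, a rough sketch follows. It resamples with replacement from an empirical vector of lengths and writes one length per line. The one-length-per-line layout is an assumption, so check the random-genome-fragments documentation for the exact format it expects. The observed lengths and the function names are hypothetical.

```python
import random


def resample_lengths(observed, n, seed=None):
    """Draw n lengths with replacement from an empirical vector of lengths."""
    rng = random.Random(seed)
    return rng.choices(observed, k=n)


def write_lengths(lengths, path):
    """Write one integer length per line (assumed layout, see note above)."""
    with open(path, "w") as handle:
        for length in lengths:
            handle.write(f"{int(length)}\n")


# Hypothetical empirical lengths, e.g. taken from an existing BED file.
observed_lengths = [2500, 2800, 3000, 3100, 3600, 4200]

desired = resample_lengths(observed_lengths, n=20, seed=42)
write_lengths(desired, "desired_lengths.txt")
```

Resampling preserves whatever shape the empirical distribution has, so no parametric assumption (such as normality) is needed.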

written 7.6 years ago by Irsan (7.1k)