Random Genomic Regions Of A Given Length Distribution
11.5 years ago
dfernan ▴ 760

Hi,

I'd like to know if someone has a script for generating a BED file with random genomic regions over the whole human genome. The regions should follow a length distribution given as input (e.g., mean length = 3000 with some standard deviation). More generally, the length distribution could also be supplied as an empirical vector containing all the desired lengths.

Thanks!

bed genome python • 4.1k views

Not sure what you want these regions for, but have you considered that some areas of the genome cannot be sequenced or have large sequence biases, such as repeat regions or skewed GC content? Consider whether the random sequences are biologically relevant to you. You may want to filter by RepeatMasker or mappability tracks from UCSC.


makes sense, thanks for the suggestion


How do you control for GC content as well? Do you calculate the GC content for all the random regions of a given length and check whether each one matches both the length and the GC content?
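One way to read the procedure asked about here is rejection sampling: draw a random start for the desired length, compute the GC fraction of that window, and keep it only if it is close enough to a target. The sketch below assumes the chromosome sequence is already in memory as a plain string; `gc_fraction`, `sample_gc_matched`, the tolerance and the toy sequence are all hypothetical names and values chosen for illustration, not part of any tool mentioned in this thread.

```python
import random


def gc_fraction(seq):
    """Return the G+C fraction among A/C/G/T bases, or None if there are none."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None  # e.g. a window made only of N
    return (seq.count("G") + seq.count("C")) / called


def sample_gc_matched(chrom_seq, length, target_gc,
                      tolerance=0.05, max_tries=10000, rng=None):
    """Draw random windows until one has GC within `tolerance` of `target_gc`.

    Returns (start, end, gc) with a 0-based, half-open interval,
    or None if no window was accepted within `max_tries` draws.
    """
    rng = rng or random.Random()
    if length <= 0 or length > len(chrom_seq):
        raise ValueError("length must be between 1 and the sequence length")
    for _ in range(max_tries):
        start = rng.randint(0, len(chrom_seq) - length)
        gc = gc_fraction(chrom_seq[start:start + length])
        if gc is not None and abs(gc - target_gc) <= tolerance:
            return start, start + length, gc
    return None


# Toy sequence (assumption): an AT-only half followed by a GC-only half.
toy_seq = "AT" * 500 + "GC" * 500
hit = sample_gc_matched(toy_seq, length=100, target_gc=1.0,
                        tolerance=0.0, rng=random.Random(0))
```

Whether matching each region individually or matching the overall GC distribution is more appropriate probably depends on the downstream analysis.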

ADD REPLY
0
Entering edit mode

I have the same question. How do you control GC content?

11.5 years ago
Irsan ★ 7.8k

Try random-genome-fragments from RSA tools or random.intervals from seqbias.

seqbias lets you supply a vector of desired lengths. You can create that vector by sampling from a distribution (e.g. a normal one), like:

lengths <- rnorm(n = 10, mean = 100000, sd = 1000)
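Since the question is tagged python, here is a minimal standard-library sketch of the same idea: take a vector of lengths (sampled or empirical), place each one uniformly at random on the genome, and format the result as BED lines. It is not how seqbias or RSA tools work internally. `random_regions`, `to_bed` and the toy chromosome sizes are hypothetical; real sizes would have to come from a chromosome sizes file for your assembly.

```python
import random


def random_regions(chrom_sizes, lengths, seed=None):
    """Place one random region per requested length.

    Each region is drawn uniformly over all valid (chromosome, start)
    placements, so longer chromosomes receive proportionally more regions.
    Coordinates are 0-based and half-open, as in BED.
    """
    rng = random.Random(seed)
    regions = []
    for length in lengths:
        length = int(round(length))
        # Only chromosomes that can hold the region are candidates.
        candidates = [c for c in chrom_sizes if chrom_sizes[c] >= length]
        if length <= 0 or not candidates:
            raise ValueError(f"cannot place a region of length {length}")
        # Weight each chromosome by its number of valid start positions.
        weights = [chrom_sizes[c] - length + 1 for c in candidates]
        chrom = rng.choices(candidates, weights=weights, k=1)[0]
        start = rng.randint(0, chrom_sizes[chrom] - length)
        regions.append((chrom, start, start + length))
    return regions


def to_bed(regions):
    """Format regions as tab-separated BED text, sorted by chromosome and start."""
    return "".join(f"{c}\t{s}\t{e}\n" for c, s, e in sorted(regions))


# Toy chromosome sizes (assumption, not real human chromosome lengths).
toy_sizes = {"chr1": 1_000_000, "chr2": 500_000}

# Lengths from a normal distribution; an empirical list works the same way.
length_rng = random.Random(0)
lengths = [max(1, round(length_rng.gauss(3000, 500))) for _ in range(10)]

regions = random_regions(toy_sizes, lengths, seed=1)
bed_text = to_bed(regions)
```

Regions produced this way may overlap each other and are not filtered against repeats or unmappable sequence.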


thanks a lot. sounds like a great tool.


BTW, you can also give random-genome-fragments a file with the desired lengths. You can make that file by sampling from a distribution in R (as I described above) and then writing the result to "desired_lengths.txt".
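If you would rather build that file in Python, a rough equivalent of the sample-then-write step could look like the sketch below. The one-integer-per-line layout is an assumption, so check the random-genome-fragments documentation for the format it actually expects. `format_lengths` and `write_lengths` are hypothetical helper names.

```python
import random


def format_lengths(lengths):
    """Render lengths as text, one rounded integer per line."""
    return "".join(f"{int(round(x))}\n" for x in lengths)


def write_lengths(path, lengths):
    """Write the formatted lengths to `path`."""
    with open(path, "w") as handle:
        handle.write(format_lengths(lengths))


if __name__ == "__main__":
    rng = random.Random(42)
    # Same parameters as the R example: n = 10, mean = 100000, sd = 1000.
    sampled = [rng.gauss(100000, 1000) for _ in range(10)]
    write_lengths("desired_lengths.txt", sampled)
```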
