Question: Generate Random Genomic Regions
1
5.7 years ago by
ChIP490
Netherlands
ChIP490 wrote:

Hi!

I have a peak file (a file containing genomic regions in BED format) with 1000 regions from hg18.

I would like to generate a random set of 1000 peaks from hg18 of nearly the same size and type (with respect to their position in the genome: promoter/exon/intron/intergenic).

I am almost sure all the guys working in motif discovery have encountered such a problem.

Kindly help

Thank you

chip-seq • 4.9k views
modified 5 weeks ago by Andrewoods90 • written 5.7 years ago by ChIP490

Duplicate: Picking random genomic positions

written 5.6 years ago by PoGibas4.7k

True, but this question has a novel slant; ChIP requires that the distances between genomic regions and nearby genes be maintained.

written 5.6 years ago by Ian5.4k
4
5.7 years ago by
KCC3.9k
Cambridge, MA
KCC3.9k wrote:

There is a tool in the bedtools package called shuffleBed (bedtools shuffle in newer releases).

It generates random regions with the same sizes as the peaks in your original list.

You can specify that the randomly generated features come from the same chromosome as the originals (-chrom), and you can specify regions that should not contain peaks (-excl), such as intergenic regions.
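To make the idea concrete, here is a minimal Python sketch of this kind of length-preserving shuffle. It is not bedtools' implementation, only an illustration of the technique; the function names (`shuffle_peaks`, `overlaps`) and parameters are made up for this example, and the linear overlap check would be too slow for large exclusion files.

```python
import random

def overlaps(start, end, intervals):
    """Return True if the half-open interval [start, end) overlaps any (s, e)."""
    return any(start < e and s < end for s, e in intervals)

def shuffle_peaks(peaks, chrom_sizes, excluded=None, same_chrom=True,
                  max_tries=1000, seed=None):
    """Relocate each peak at random while keeping its length.

    peaks:       list of (chrom, start, end), 0-based half-open as in BED
    chrom_sizes: dict mapping chromosome name -> length
    excluded:    dict mapping chromosome name -> list of (start, end)
                 that a shuffled peak must not overlap (hypothetical input)
    """
    rng = random.Random(seed)
    excluded = excluded or {}
    chroms = sorted(chrom_sizes)
    shuffled = []
    for chrom, start, end in peaks:
        length = end - start
        for _ in range(max_tries):
            # Either stay on the original chromosome or pick one at random.
            new_chrom = chrom if same_chrom else rng.choice(chroms)
            limit = chrom_sizes[new_chrom] - length
            if limit < 0:
                continue  # peak does not fit on this chromosome
            new_start = rng.randint(0, limit)
            new_end = new_start + length
            if not overlaps(new_start, new_end, excluded.get(new_chrom, [])):
                shuffled.append((new_chrom, new_start, new_end))
                break
        else:
            raise RuntimeError(f"could not place a {length} bp peak from {chrom}")
    return shuffled
```

For example, `shuffle_peaks([("chr1", 100, 300)], {"chr1": 10000}, excluded={"chr1": [(0, 5000)]}, seed=1)` returns one 200 bp region on chr1 that starts at or after position 5000.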

written 5.7 years ago by KCC3.9k

@George: how about a small example of the command? The way I am using shuffleBed is: shuffleBed -i Peakfile -g genome_table.human.hg18.txt > test. What more should I add? I get the same number of peaks, but the genomic distribution is entirely different. For instance, Peakfile has 700 peaks in promoters, while the generated test file has only 200 peaks in promoters. Thank you.

modified 5.7 years ago • written 5.7 years ago by ChIP490

The randomly generated peaks are going to match the distribution of your non-excluded regions. If you want to match the number in promoters etc., I think you would need to break your peaks into the ones in each category (promoter, exon, intron, etc.) and create a specific exclusion file for each category. I don't know if there is a less ugly way to do this, but this is what comes to mind.
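A rough Python sketch of that per-category idea, under some assumptions: each peak has already been labelled with its category, and the regions belonging to each category are available as plain interval lists (in practice both would come from a gene annotation). The function name `stratified_shuffle` and the input layout are hypothetical. Instead of excluding everything outside a category, the sketch draws directly from the regions inside it, which gives the same per-category counts.

```python
import random

def stratified_shuffle(peaks, regions_by_category, seed=None):
    """Draw one random peak per input peak from regions of the same category.

    peaks:               list of (chrom, start, end, category)
    regions_by_category: dict mapping category -> list of (chrom, start, end)
    Each new peak keeps the original length, and its midpoint falls inside
    a region of the matching category. Chromosome ends are not checked here.
    """
    rng = random.Random(seed)
    background = []
    for chrom, start, end, category in peaks:
        regions = regions_by_category[category]
        # Weight regions by length so every base of the category is equally likely.
        weights = [r_end - r_start for _, r_start, r_end in regions]
        r_chrom, r_start, r_end = rng.choices(regions, weights=weights, k=1)[0]
        midpoint = rng.randrange(r_start, r_end)
        length = end - start
        new_start = max(0, midpoint - length // 2)
        background.append((r_chrom, new_start, new_start + length, category))
    return background
```

With 700 promoter peaks in the input, the output also contains exactly 700 peaks centred in promoter regions, because every input peak is replaced by one draw from its own category.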

modified 5.6 years ago • written 5.6 years ago by KCC3.9k
1
5.6 years ago by
Ian5.4k
University of Manchester, UK
Ian5.4k wrote:

You might be interested in a script (generate_background_sequences.py) that comes with GimmeMotifs, which can extract random sequences while maintaining the same distance to a random gene (a "matched genomic background"). If I remember correctly, you would need to compile GimmeMotifs first to get access to the required Python libraries.

written 5.6 years ago by Ian5.4k
0
5 weeks ago by
Andrewoods90
Andrewoods90 wrote:

You can try the seqbias R package, which has a function called random.intervals.
Link: https://bioconductor.org/packages/release/bioc/html/seqbias.html

modified 5 weeks ago • written 5 weeks ago by Andrewoods90