Question

Generate Random Genomic Regions

2

10.7 years ago

ChIP ▴ 600

Hi!

I have a peak file (a BED-format file of genomic regions) containing 1000 regions from hg18.

I would like to generate a random set of 1000 peaks from hg18 with nearly the same sizes and the same genomic context (promoter/exon/intron/intergenic) as the originals.

I am almost sure that everyone working in motif discovery has encountered this problem.

Kindly help

Thank you

chip-seq • 8.7k views

ADD COMMENT • link updated 5.2 years ago by Andrewoods ▴ 110 • written 10.7 years ago by ChIP ▴ 600

0

Duplicate of: Picking random genomic positions

ADD REPLY • link 10.7 years ago by PoGibas 5.1k

0

True, but this question has a novel slant; ChIP requires that the distances between genomic regions and nearby genes be maintained.

ADD REPLY • link 10.7 years ago by Ian 6.0k

score 6 · Answer 1 · 2013-08-01

10.7 years ago

KCC ★ 4.1k

There is a tool in the bedtools package called shuffleBed (invoked as bedtools shuffle in current versions).

It generates random regions with the same size distribution as your original list of peaks.

You can require that each randomly placed feature stays on the same chromosome as the original (-chrom), and you can supply a BED file of regions that must not receive peaks (-excl), such as intergenic regions.
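To make the idea concrete, here is a rough Python sketch of size-preserving shuffling with an exclusion list. It is not how bedtools is implemented; the function names (`overlaps`, `shuffle_intervals`), the toy chromosome sizes and the toy peaks are all made up for illustration.

```python
import random

def overlaps(start, end, regions):
    """Return True if the half-open interval [start, end) overlaps any region."""
    return any(start < r_end and r_start < end for r_start, r_end in regions)

def shuffle_intervals(peaks, chrom_sizes, excluded=None, seed=None, max_tries=1000):
    """Move each peak to a random position on its own chromosome.

    Every peak keeps its length and chromosome; placements that touch an
    excluded region are rejected and redrawn.
    """
    rng = random.Random(seed)
    excluded = excluded or {}
    shuffled = []
    for chrom, start, end in peaks:
        length = end - start
        for _ in range(max_tries):
            new_start = rng.randint(0, chrom_sizes[chrom] - length)
            new_end = new_start + length
            if not overlaps(new_start, new_end, excluded.get(chrom, [])):
                shuffled.append((chrom, new_start, new_end))
                break
        else:
            raise RuntimeError(f"could not place a {length} bp peak on {chrom}")
    return shuffled

# Hypothetical toy data (not real hg18 coordinates or sizes).
peaks = [("chr1", 100, 300), ("chr1", 5000, 5150), ("chr2", 40, 540)]
chrom_sizes = {"chr1": 10_000, "chr2": 8_000}
excluded = {"chr1": [(0, 1000)]}  # regions that must stay free of random peaks

random_peaks = shuffle_intervals(peaks, chrom_sizes, excluded, seed=1)
for peak in random_peaks:
    print(*peak, sep="\t")
```

Note that nothing here looks at annotation: the random peaks land wherever there is allowed sequence, so their promoter/exon/intron mix reflects the genome rather than the original peak set.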

ADD COMMENT • link 10.7 years ago by KCC ★ 4.1k

0

@George: could you give a small example of the command? The way I am using shuffleBed is: shuffleBed -i Peakfile -g genome_table.human.hg18.txt > test. What else should I add? I get the same number of peaks, but the genomic distribution is entirely different. For instance, Peakfile has 700 peaks in promoters, while the generated test file has only 200. Thank you.

ADD REPLY • link 10.7 years ago by ChIP ▴ 600

0

The randomly generated peaks will follow the distribution of the non-excluded regions, not that of your original peaks. If you want to match the number in promoters etc., I think you would need to split your peaks by category (promoter, exon, intron, etc.) and create a specific exclusion file for each category. I don't know if there is a less ugly way to do this, but this is what comes to mind.
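The per-category idea can be sketched in plain Python. This is only an illustration under several assumptions: a peak is assigned to a category by its midpoint, the annotation is a hypothetical dictionary of non-overlapping regions per category, and each random peak is placed so that its midpoint falls inside a region of the same category (restricting placement to a category's regions instead of excluding everything else). All names and coordinates are invented.

```python
import random
from collections import Counter

def category_of(chrom, pos, annotation):
    """Return the first category with a region containing pos, or None."""
    for category, regions in annotation.items():
        for r_chrom, r_start, r_end in regions:
            if r_chrom == chrom and r_start <= pos < r_end:
                return category
    return None

def matched_random_peaks(peaks, annotation, chrom_sizes, seed=None, max_tries=1000):
    """For each peak, draw a same-length peak whose midpoint lies in the same category."""
    rng = random.Random(seed)
    result = []
    for chrom, start, end in peaks:
        length = end - start
        category = category_of(chrom, (start + end) // 2, annotation)
        if category is None:
            raise ValueError(f"peak {chrom}:{start}-{end} has no category")
        regions = annotation[category]
        # Longer regions are proportionally more likely to receive a peak.
        weights = [r_end - r_start for _, r_start, r_end in regions]
        for _ in range(max_tries):
            r_chrom, r_start, r_end = rng.choices(regions, weights=weights, k=1)[0]
            midpoint = rng.randrange(r_start, r_end)
            new_start = midpoint - length // 2
            new_end = new_start + length
            if new_start >= 0 and new_end <= chrom_sizes[r_chrom]:
                result.append((r_chrom, new_start, new_end))
                break
        else:
            raise RuntimeError(f"could not place a {length} bp {category} peak")
    return result

# Hypothetical toy annotation covering one 10 kb chromosome.
chrom_sizes = {"chr1": 10_000}
annotation = {
    "promoter": [("chr1", 1000, 2000), ("chr1", 6000, 7000)],
    "exon": [("chr1", 2000, 2500), ("chr1", 7000, 7400)],
    "intron": [("chr1", 2500, 4000)],
    "intergenic": [("chr1", 0, 1000), ("chr1", 4000, 6000), ("chr1", 7400, 10_000)],
}
peaks = [("chr1", 1100, 1300), ("chr1", 6400, 6700),
         ("chr1", 2100, 2300), ("chr1", 4500, 4900)]

def count_categories(intervals):
    return Counter(category_of(c, (s + e) // 2, annotation) for c, s, e in intervals)

matched = matched_random_peaks(peaks, annotation, chrom_sizes, seed=7)
print(count_categories(peaks))
print(count_categories(matched))
```

Because each random peak is drawn from regions of its original's category, the two category counts agree by construction. With real data the annotation would come from a gene model for hg18, and overlapping categories would need a priority order.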

ADD REPLY • link 10.7 years ago by KCC ★ 4.1k

score 1 · Answer 2 · 2013-08-15

You might be interested in a script (generate_background_sequences.py) that comes with GimmeMotifs, which can extract random sequences that maintain the same distance to a random gene (a "matched genomic background"). If I remember correctly, you would need to compile GimmeMotifs first to get access to the required Python libraries.
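For readers who want to see what a distance-matched background means, here is a minimal Python sketch. It is not the GimmeMotifs code and does not use its API. It assumes a hypothetical table of transcription start sites (TSS) per chromosome, ignores strand, and measures distance from the peak midpoint.

```python
import random

def nearest_tss_offset(chrom, midpoint, tss_by_chrom):
    """Signed offset in bp from the nearest TSS on the chromosome to midpoint."""
    nearest = min(tss_by_chrom[chrom], key=lambda tss: abs(midpoint - tss))
    return midpoint - nearest

def distance_matched_background(peaks, tss_by_chrom, chrom_sizes, seed=None, max_tries=1000):
    """Place each peak at the same offset from a randomly chosen TSS."""
    rng = random.Random(seed)
    all_tss = [(c, pos) for c, positions in tss_by_chrom.items() for pos in positions]
    background = []
    for chrom, start, end in peaks:
        length = end - start
        offset = nearest_tss_offset(chrom, (start + end) // 2, tss_by_chrom)
        for _ in range(max_tries):
            new_chrom, tss = rng.choice(all_tss)
            new_start = tss + offset - length // 2
            new_end = new_start + length
            # Redraw if the region would run off either end of the chromosome.
            if new_start >= 0 and new_end <= chrom_sizes[new_chrom]:
                background.append((new_chrom, new_start, new_end))
                break
        else:
            raise RuntimeError(f"could not place peak {chrom}:{start}-{end}")
    return background

# Hypothetical toy data (not real hg18 coordinates).
chrom_sizes = {"chr1": 10_000, "chr2": 8_000}
tss_by_chrom = {"chr1": [2000, 7000], "chr2": [3000]}
peaks = [("chr1", 1800, 2000), ("chr1", 7400, 7700), ("chr2", 500, 900)]

background = distance_matched_background(peaks, tss_by_chrom, chrom_sizes, seed=3)
for region in background:
    print(*region, sep="\t")
```

One caveat of this simplification: in a gene-dense genome the randomly chosen TSS may not be the nearest one to the new region, so the distance to the nearest gene is only approximately preserved.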

score 0 · Answer 3 · 2019-02-15

5.2 years ago

Andrewoods ▴ 110

You can try the seqbias R package, which provides a function called random.intervals.
Link: https://bioconductor.org/packages/release/bioc/html/seqbias.html

ADD COMMENT • link 5.2 years ago by Andrewoods ▴ 110