Question: Picking Random Genomic Positions
gravatar for PoGibas
4.9 years ago by
PoGibas4.5k wrote:

I do have a set of TF binding coordinates and want to see if there is any significant overlap with an open chromatin annotation.

Example of TF coord:
chr1 19280 19298
chr1 245920 245938
chr2 97290 97308
chr9 752910 752938

Example of open chrom. coord. (UCSC track):
chr2 33031543 33032779
chr3 2304169 2304825
chr5 330899 330940

I have checked the intersection with the Bedtools (open chrom. coord vs TF coord. -/+ 100bp) and now I want to check the intersection between random genomic coordinates and open chrom.

The idea is to:

  1. Pick random genomic position (from the same chromosome as TF coordinate);
  2. -/+9bp (binding site size);
  3. -/+ 100bp;
  4. Run this simulation for 1000 times (TF x 1000);
  5. Bedtools;

Any ideas how can I do this simulation to pick random genomic positions from the same chromosome? I know a little bit of bash and Perl, but won't be able to write the script by myself.
Is it possible to measure the length of every chromosome;
Pick TF chromosome and from it's length get a random number which would represent a genomic position?

Can someone help me with the simulation and the pipeline.

bash bedtools perl • 5.5k views
ADD COMMENTlink modified 3.1 years ago • written 4.9 years ago by PoGibas4.5k
gravatar for Aaronquinlan
4.9 years ago by
United States
Aaronquinlan9.7k wrote:

My sense is that you are trying to do a Monte Carlo simulation and as such, you actually want to use the shuffle command. Shuffling will choose a new location for each of your original intervals while preserving its size. The loop below will print out how many intersections were observed for each of 1000 shuffles. A P-value can be derived by counting the number of times that the number of shuffled intersections exceeds your observed intersections. If 0, then you can say that P is less than 0.001.

for i in {1..1000}
        OBS=`bedtools shuffle -i openchrom.bed -g hg19.chrom.sizes | \
            bedtools intersect -a tf.bed -b - | wc -l`
        echo $OBS 

If you really want a new set of randomly generated intervals, there is a random function in bedtools

bedtools random -n 1000 -l 118 -g hg19.chrom.sizes

If you are trying to measure the evidence for statistically significant relationship, the pybedtools package has nice tools for this. See the docs here as well as the tool.

Lastly, there is a great new package called GenometriCorr in R, as well as an associated paper describing some interesting new metrics for measuring relationships between sets of genomic intervals.

ADD COMMENTlink modified 4.9 years ago • written 4.9 years ago by Aaronquinlan9.7k

Very interesting paper - thanks for the link!

ADD REPLYlink written 4.9 years ago by Sebastian Kurscheid300

I love the new bedtools works with commands like samtools - I did not realize it up to know, the manuals still list the old scripts - I have been meaning to suggest it but did not want to look like I don't appreciate the existing tools already. Great link to the paper as well!

ADD REPLYlink written 4.9 years ago by Istvan Albert ♦♦ 71k

@Istvan - yeah sorry, the docs are out of date. in the process of hiring someone to take over all of that and we should have a new docs website up by Fall.

ADD REPLYlink written 4.9 years ago by Aaronquinlan9.7k

Where can I please download hg19.chrom.sizes?

ADD REPLYlink written 4.9 years ago by Biomonika (Noolean)2.9k

If you have mysql installed, you can do: mysql --user=genome -A -e \ "select chrom, size from hg19.chromInfo" > hg19.chrom.sizes

Otherwise, you can use the UCSC Table Browser. Group = "All tables" Table=chromInfo

ADD REPLYlink written 4.9 years ago by Aaronquinlan9.7k

Some time we need keep similar GC contents or some other metric, any idea to update the script?

ADD REPLYlink modified 6 weeks ago • written 6 weeks ago by Shicheng Guo4.3k
gravatar for Sebastian Kurscheid
4.9 years ago by
Australia, ACT, Canberra, ANU
Sebastian Kurscheid300 wrote:

As you are already using bedtools, you could perhaps work with the function "shufflebed"? It is well documented on the developer's website:

ADD COMMENTlink written 4.9 years ago by Sebastian Kurscheid300

I have just tryed it as well and it works just perfect.

ADD REPLYlink written 4.9 years ago by Biomonika (Noolean)2.9k
gravatar for Istvan Albert
4.9 years ago by
Istvan Albert ♦♦ 71k
University Park, USA
Istvan Albert ♦♦ 71k wrote:

A simple one liner in perl would look like this:

perl -e '$x=int(rand(1000)); printf( "chr1\t%s\t%s", $x, $x + 10)'

You just need to add this into a loop and tweak it to your needs.

ADD COMMENTlink written 4.9 years ago by Istvan Albert ♦♦ 71k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 429 users visited in the last hour