Picking Random Genomic Positions
3
9
Entering edit mode
11.8 years ago
PoGibas 5.1k

I do have a set of TF binding coordinates and want to see if there is any significant overlap with an open chromatin annotation.

Example of TF coord:
chr1 19280 19298
chr1 245920 245938
chr2 97290 97308
chr9 752910 752938
...

Example of open chrom. coord. (UCSC track):
chr2 33031543 33032779
chr3 2304169 2304825
chr5 330899 330940
...

I have checked the intersection with the Bedtools (open chrom. coord vs TF coord. -/+ 100bp) and now I want to check the intersection between random genomic coordinates and open chrom.

The idea is to:

  1. Pick random genomic position (from the same chromosome as TF coordinate);
  2. -/+9bp (binding site size);
  3. -/+ 100bp;
  4. Run this simulation for 1000 times (TF x 1000);
  5. Bedtools;

Any ideas how can I do this simulation to pick random genomic positions from the same chromosome? I know a little bit of bash and Perl, but won't be able to write the script by myself.
Is it possible to measure the length of every chromosome;
Pick TF chromosome and from it's length get a random number which would represent a genomic position?

Can someone help me with the simulation and the pipeline.

bash perl bedtools • 11k views
ADD COMMENT
17
Entering edit mode
11.8 years ago

My sense is that you are trying to do a Monte Carlo simulation and as such, you actually want to use the shuffle command. Shuffling will choose a new location for each of your original intervals while preserving its size. The loop below will print out how many intersections were observed for each of 1000 shuffles. A P-value can be derived by counting the number of times that the number of shuffled intersections exceeds your observed intersections. If 0, then you can say that P is less than 0.001.

#!/bin/bash
for i in {1..1000}
do
        OBS=`bedtools shuffle -i openchrom.bed -g hg19.chrom.sizes | \
            bedtools intersect -a tf.bed -b - | wc -l`
        echo $OBS 
done

If you really want a new set of randomly generated intervals, there is a random function in bedtools

bedtools random -n 1000 -l 118 -g hg19.chrom.sizes

If you are trying to measure the evidence for statistically significant relationship, the pybedtools package has nice tools for this. See the docs here as well as the intersection_matrix.py tool.

Lastly, there is a great new package called GenometriCorr in R, as well as an associated paper describing some interesting new metrics for measuring relationships between sets of genomic intervals.

ADD COMMENT
1
Entering edit mode

Some time we need keep similar GC contents or some other metric, any idea to update the script?

ADD REPLY
0
Entering edit mode

Very interesting paper - thanks for the link!

ADD REPLY
0
Entering edit mode

I love the new bedtools works with commands like samtools - I did not realize it up to know, the manuals still list the old scripts - I have been meaning to suggest it but did not want to look like I don't appreciate the existing tools already. Great link to the paper as well!

ADD REPLY
0
Entering edit mode

@Istvan - yeah sorry, the docs are out of date. in the process of hiring someone to take over all of that and we should have a new docs website up by Fall.

ADD REPLY
0
Entering edit mode

Where can I please download hg19.chrom.sizes?

ADD REPLY
1
Entering edit mode

If you have mysql installed, you can do: mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \ "select chrom, size from hg19.chromInfo" > hg19.chrom.sizes

Otherwise, you can use the UCSC Table Browser. http://genome.ucsc.edu/cgi-bin/hgTables?command=start Group = "All tables" Table=chromInfo

ADD REPLY
4
Entering edit mode
11.8 years ago

As you are already using bedtools, you could perhaps work with the function "shufflebed"? It is well documented on the developer's website: http://code.google.com/p/bedtools/wiki/Usage#shuffleBed

ADD COMMENT
1
Entering edit mode

I have just tryed it as well and it works just perfect.

ADD REPLY
3
Entering edit mode
11.8 years ago

A simple one liner in perl would look like this:

perl -e '$x=int(rand(1000)); printf( "chr1\t%s\t%s", $x, $x + 10)'

You just need to add this into a loop and tweak it to your needs.

ADD COMMENT

Login before adding your answer.

Traffic: 2076 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6