Quickly Plotting Distribution Of Points From Bed File?
3
3
Entering edit mode
8.1 years ago

I've got about 10e5-10e6 small regions in a bed file and I want to quickly inspect their genome distribution. I expect them to be evenly distributed in all chromosomes, so just plotting the results in something like a Manhattan plot would be useful. Is there anything like that I can use?

Something like:

plot_genomic_distribution --input myfile.bed --output myfile.png

Even something that just plots on a terminal would be good, like a stem and leaves plot vertically over the chromosomal arms.

bed • 7.3k views
ADD COMMENT
5
Entering edit mode
8.1 years ago
Irsan ★ 7.4k

No need for binning the bed-file to plot a distribution if you use famous ggplot2 R library. Have a look at the ggplot2-solution at Plotting Density Of Snps On Chromosomes for plotting a distribution of features (in the example SNPs) across all chromosomes

enter image description here

ADD COMMENT
3
Entering edit mode
8.1 years ago

One option is to map your regions to disjoint windows over a chromosome, counting the number of regions within each window.

To this end, you can use BEDOPS bedmap and its --count operator.

First, decide on chromosome bounds. For instance, for the hg19 build of the human genome, you could use the extents available here in hg19.extents.bed:

Second, make some 10 kb windows with awk and store them in a file called hg19.disjointWindows.bed (or adjust window size, depending on the granularity you want):

$ awk ' \
    BEGIN {\
         windowSize = 10000; \
    } \
    { \
         extentChromosome = $1; \
         extentStart = $2; \
         extentStop = $3; \
         for (windowStart = extentStart; windowStart < extentStop; windowStart += windowSize) { \
             windowStop = windowStart + windowSize; \
             print extentChromosome"\t"windowStart"\t"windowStop; \
         } \
    }' hg19.extents.bed \
    > hg19.disjointWindows.bed

This is guaranteed to be sorted, so we don't need to use any other tools (e.g., sort-bed) to sort this data. So we can use these windows with BEDOPS tools directly.

Third, we prepare your regions. Let's say your regions are in a BED file called unsortedRegions.bed, but we don't know its sort order. So we first sort this data with BEDOPS sort-bed:

$ sort-bed unsortedRegions.bed > regions.bed

Fourth, we count:

$ bedmap --echo-map --count regions.bed hg19.disjointWindows.bed > answer.bed

The file answer.bed will contain data in the following format:

[ window-1 ] | [ number-of-regions-overlapping-window-1 ]
[ window-2 ] | [ number-of-regions-overlapping-window-2 ]
. . .
[ window-n ] | [ number-of-regions-overlapping-window-n ]

Where each [ window-i ] is the i-th element in hg19.disjointWindows.bed, and [ number-of-regions-... ] is the count of overlapping regions over that window.

If you want to split this file further by chromosome, use BEDOPS bedextract. In a bash shell, this is a very fast one-liner:

$ for chr in `bedextract --list-chr answer.bed`; do bedextract $chr answer.bed > $chr.answer.bed; done

You can then bring answer.bed or its separated chromosome files into R or gnuplot, plotting a log-scaled histogram or visualizing it in any number of other ways.

ADD COMMENT
0
Entering edit mode

Hi Alex,

Thanks for posting this, wondering about the way you generated the extents file, I need to do essentially the same thing for mm10.

Rob.

ADD REPLY
1
Entering edit mode

Could use Kent utilities:

$ fetchChromSizes mm10 | awk '{ print $1"0\t"$2 }' > mm10.extents.bed
ADD REPLY
0
Entering edit mode

Yes that works well, weirdly though I get an empty mm10.disjointWindows.bed file after running the awk script.

ADD REPLY
0
Entering edit mode

Yeah, not sure why that is. Maybe do:

$ fetchChromSizes mm10 | head

and see what you get.

ADD REPLY
0
Entering edit mode

Ah, sorry, I misread. I'm not sure why you get an empty result. I'd break things down and use head to see what comes out at various stages.

ADD REPLY
1
Entering edit mode

Another option is to use bedops --chop to make the 10k windows, instead of awk:

$ bedops --chop 10000 mm10.extents.bed > mm10.disjointWindows.bed

This option was added after I wrote the initial answer.

ADD REPLY
1
Entering edit mode
8.1 years ago

You could load the bed file into a GRanges object and then just use plotGrandLinear from ggbio to get a manhattan plot.

ADD COMMENT

Login before adding your answer.

Traffic: 2057 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6