Question

Binning the genomic regions and counting reads

0

Entering edit mode

7.6 years ago

ChIP ▴ 600

Hi,

I have list of genomic regions:

chr14   83053041    83062424
chr4    9160067 9223997
chr2    144458153   144467912
chr10   86819472    86833402
chr12   39614733    39641285
chr4    68608137    68629925
chr17   36334147    36408831
chr12   19306607    19314016
chr4    97531083    97537086
chr2    88351361    88366115

I would like to extend them by 2 kb and then divide them in a window of 500 bps and count tags.

Extending the regions is quite straight forward:

awk '{ print $1 "\t" $2-2000 "\t" $3+2000}'<test.bed>ext.test.bed

I am bit lost as how to divide it in equal number of windows of 500 bps and count tag in them.

If it was just counting reads it is simple

coverageBed -abam test.bam -b ext.test.bed -counts > Reads.bed

Kindly guide or share your one or two liners.

Thank you

Bed tool bash • 1.8k views

ADD COMMENT • link updated 7.6 years ago by James Ashmore ★ 3.5k • written 7.6 years ago by ChIP ▴ 600

score 4 · Accepted Answer · 2017-03-24

4

Entering edit mode

7.6 years ago

James Ashmore ★ 3.5k

You should be able to do all of this using bedtools. Here's an example for you to follow:

bedtools slop -i regions.bed -g chrom.sizes -b 2000 > regions_2kb.bed
bedtools makewindows -b regions_2kb.bed -w 500 > windows_500bp.bed
bedtools coverage -counts -a windows_500bp.bed -b alignments.bam > coverage.txt

The chrom.sizes file simply lists the chromosome name and chromosome length for your given genome:

chr1    195471971
chr2    182113224
chr3    160039680
chr4    156508116
chr5    151834684
chr6    149736546
chr7    145441459
chr8    129401213
chr9    124595110
chr10   130694993

You can use the fetchChromSizes script from UCSC to download this file (http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/fetchChromSizes)

ADD COMMENT • link 7.6 years ago by James Ashmore ★ 3.5k

0

Entering edit mode

Hi,

Thank you for your answer, I have an additional question. How can I plot these numbers as an heatmap or average coverage profile.

Thank you

ADD REPLY • link 7.6 years ago by ChIP ▴ 600

0

Entering edit mode

In that case I would suggest you use deepTools. It's a great toolkit for counting and plotting read distribution/enrichment. For your case, have a look at the following example in their documentation: http://deeptools.readthedocs.io/en/latest/content/example_gallery.html#dnase-accessibility-at-enhancers-in-murine-es-cells

ADD REPLY • link 7.6 years ago by James Ashmore ★ 3.5k