Question

How to bin my count data for the entire genome?

0

4.9 years ago

fr ▴ 210

I'm trying to represent my ChIP-Seq counts, normalized or not, in specific genomic bins but don't know how to do so.

I have already processed my data and used findPeaks followed by pos2bed.pl to produce .bedGraph files that contain this information. However, I'd like to have counts summarized for each 10 kb bin throughout the genome (this resolution is fine for my purposes). My .bedGraph files contain some of this information, but not in equally sized 10 kb bins.

I was looking at HOMER's annotatePeaks.pl -hist <bin size>, which seems to summarize data in bins, but only around peaks, which is not what I want. I am interested in having the data represented in fixed genomic bins throughout the genome (i.e. not only those within a distance d of a peak). I'm sure there is a tool for this, but I don't know which one to use.

Could someone advise on how I could bin my data?

ChIP-Seq next-gen sequencing genome homer • 2.9k views

ADD COMMENT • link updated 4.9 years ago by Prakash ★ 2.2k • written 4.9 years ago by fr ▴ 210

score 5 · Accepted Answer · 2019-06-05

4.9 years ago

Prakash ★ 2.2k

You might be looking for bedtools makewindows. You can divide your genome into 10 kb bins and then calculate coverage over those bins using your BAM files. Alternatively, you may be able to use the tag directory from HOMER as well.
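To make the binning step concrete, here is a minimal Python sketch of what fixed-width windowing amounts to. It is not bedtools itself: the function name `make_windows` and the toy chromosome sizes are made up for illustration, and real sizes would come from a two-column chromosome-sizes file. Coordinates are assumed to be 0-based and half-open, as in BED, and the last window on each chromosome is cut off at the chromosome end.

```python
def make_windows(chrom_sizes, width):
    """Yield (chrom, start, end) windows of at most `width` bp.

    Coordinates are 0-based, half-open (BED style). The final window
    on each chromosome is truncated at the chromosome end.
    """
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, width):
            yield chrom, start, min(start + width, size)


# Hypothetical toy sizes, only for illustration.
toy_sizes = {"chr1": 25000, "chr2": 10000}

windows = list(make_windows(toy_sizes, 10000))
for chrom, start, end in windows:
    print(chrom, start, end, sep="\t")
```

Each printed line is one bin in BED-like form, which is the kind of file the coverage step then takes as its set of regions.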

0

@Prakash, thanks a lot for your suggestion. Just to make sure, you mean something like this:

bedtools makewindows -g mm10.txt -w 50000 > binned_genome.bed

bedtools coverage -a binned_genome.bed -b myfile.bed -sorted -g mm10.txt

My question is then: how are the summaries done? For instance, is each bin showing the mean of counts in that region? I couldn't find this information.

Thanks a lot

Edit: found this thread with some useful information

ADD REPLY • link 4.9 years ago by fr ▴ 210

2

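bedtools coverage -a binned_genome.bed -b myfile.bed -sorted -g mm10.txt -mean

With -mean, this reports the mean depth across each of your binned genomic regions. Without it, bedtools coverage reports, for each bin, the number of overlapping features, the number of bases covered, the bin length, and the fraction of the bin covered. You can also use genomeCoverageBed to get coverage normalized to reads per million. Note that -scale takes a numeric factor, so pass 1,000,000 divided by your number of mapped reads, and that the output is a genome-wide bedGraph rather than per-bin values.

genomeCoverageBed -ibam <your aligned bam file> -bg -scale <1000000 / number of mapped reads>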
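On the question of how the summaries are computed: as far as I know, bedtools coverage by default reports, for each bin, the number of overlapping features, the number of covered bases, the bin length, and the covered fraction, while -mean reports the mean per-base depth. The Python sketch below computes those quantities for one bin so the definitions are explicit. It is an illustration under those assumptions, not the bedtools implementation; `bin_summary` and `rpm_scale` are hypothetical helper names, and the intervals and read count are toy values. Depth here counts overlapping intervals and ignores any score column, so a bedGraph signal value would not be averaged this way.

```python
def bin_summary(bin_start, bin_end, intervals):
    """Summarize half-open (start, end) intervals against one bin.

    Returns (n_overlapping, covered_bases, bin_length,
             fraction_covered, mean_depth).
    """
    length = bin_end - bin_start
    depth = [0] * length  # per-base depth; small enough for a 10 kb bin
    n_overlapping = 0
    for start, end in intervals:
        # Clip the interval to the bin; skip it if nothing is left.
        lo, hi = max(start, bin_start), min(end, bin_end)
        if lo < hi:
            n_overlapping += 1
            for pos in range(lo - bin_start, hi - bin_start):
                depth[pos] += 1
    covered = sum(1 for d in depth if d > 0)
    return n_overlapping, covered, length, covered / length, sum(depth) / length


def rpm_scale(mapped_reads):
    """Factor that converts raw coverage to reads per million."""
    return 1_000_000 / mapped_reads


# Toy intervals on a single chromosome; the third lies outside the bin.
toy_intervals = [(0, 50), (25, 75), (150, 200)]
summary = bin_summary(0, 100, toy_intervals)
print(summary)
print(rpm_scale(20_000_000))
```

Multiplying the mean depth of a bin by `rpm_scale(mapped_reads)` gives a per-million normalized value, which is the same kind of factor you would pass to -scale.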

ADD REPLY • link 4.9 years ago by Prakash ★ 2.2k