Question: How to bin my count data for the entire genome?
fr wrote (17 months ago):

I'm trying to represent my ChIP-Seq counts, normalized or not, in specific genomic bins but don't know how to do so.

I have already processed my data, and have used findPeaks followed by pos2bed.pl to produce .bedGraph files that contain this info. However, I'd like to have counts summarized for each 10 kb bin throughout the genome (that resolution is fine for my purposes). My .bedGraph files contain some of this information, but not in equally sized 10 kb bins.

I was looking at HOMER's annotatePeaks.pl -hist <bin size>, which seems to summarize data in bins, but only around peaks, which is not what I want. I'd like the counts represented in fixed genomic bins throughout the genome (i.e. not only those within a distance d of a peak). I'm sure there is a tool that does this, but I'm not aware of which one to use.

Could someone advise on how I could bin my data?

Prakash wrote (17 months ago):

You might be looking for bedtools makewindows. You can divide your genome into 10 kb bins and then calculate coverage over those bins from your BAM files. Alternatively, you may be able to use a HOMER tag directory.
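For reference, the genome file passed with -g is a two-column, tab-separated list of chromosome names and lengths, and makewindows tiles each chromosome with fixed-width intervals. A minimal awk sketch of that tiling, assuming such a file (make_windows is a made-up helper name, not part of bedtools, and this is an illustration rather than bedtools' own implementation):

```shell
# make_windows GENOME_FILE WIDTH
# GENOME_FILE: tab-separated chrom<TAB>length, one chromosome per line.
# Prints BED intervals (0-based start, exclusive end) of WIDTH bp each;
# the last window of a chromosome is truncated at the chromosome end.
make_windows() {
  awk -v w="$2" 'BEGIN { OFS = "\t" }
    {
      for (start = 0; start < $2; start += w) {
        end = start + w
        if (end > $2) end = $2
        print $1, start, end
      }
    }' "$1"
}
```

Something like make_windows mm10.txt 10000 > binned_genome.bed should then give the same intervals as bedtools makewindows -g mm10.txt -w 10000.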

fr replied (17 months ago):

@Prakash, thanks a lot for your suggestion. Just to make sure, you mean something like this:

bedtools makewindows -g mm10.txt -w 50000 > binned_genome.bed

bedtools coverage -a binned_genome.bed -b myfile.bed -sorted -g mm10.txt

My question is then: how are the summaries done? For instance, is each bin showing the mean of counts in that region? I couldn't find this information.

Thanks a lot

Edit: found this thread with some useful information

Prakash replied (17 months ago):

bedtools coverage -a binned_genome.bed -b myfile.bed -sorted -g mm10.txt

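With default options this reports, for each bin, the number of features in myfile.bed that overlap it, the number of bin bases covered, the bin length, and the fraction of the bin covered. It does not report a mean: add -mean for the mean depth per bin, or -counts for just the number of overlaps. You can also use genomeCoverageBed to get coverage normalized to reads per million. Note that -scale takes a numeric factor (there is no RPM keyword), that -ibam and -i are alternative inputs, and that BAM input does not need a -g file. The output is a genome-wide bedGraph, not per-bin values.

genomeCoverageBed -ibam <your aligned bam file> -bg -scale <1000000 / number of mapped reads>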
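
Since the -scale option of genomeCoverageBed takes a numeric factor, reads-per-million scaling means passing 1,000,000 divided by the number of mapped reads, and the resulting bedGraph still has to be collapsed into fixed bins. A rough awk sketch of both steps, under some assumptions: rpm_scale and bin_bedgraph are made-up helper names, the input is a four-column bedGraph, bins with no signal are simply not printed, and the last bin of a chromosome is divided by the full bin width rather than its truncated length.

```shell
# rpm_scale N: print the factor that scales raw coverage to reads per
# million, given N mapped reads (hypothetical helper, not part of bedtools).
rpm_scale() {
  awk -v n="$1" 'BEGIN { printf "%.8f\n", 1000000 / n }'
}

# bin_bedgraph WIDTH [SCALE]: read a bedGraph (chrom, start, end, value) on
# stdin and print the mean per-base value of each WIDTH-bp bin with signal,
# multiplied by SCALE (default 1).
bin_bedgraph() {
  awk -v w="$1" -v s="${2:-1}" 'BEGIN { OFS = "\t" }
    {
      # Walk over every bin this interval touches and add value * overlap.
      for (b = int($2 / w) * w; b < $3; b += w) {
        lo = ($2 > b) ? $2 : b
        hi = ($3 < b + w) ? $3 : b + w
        sum[$1 OFS b] += (hi - lo) * $4
      }
    }
    END {
      for (k in sum) {
        split(k, f, OFS)
        printf "%s\t%d\t%d\t%.4f\n", f[1], f[2], f[2] + w, s * sum[k] / w
      }
    }' | sort -k1,1 -k2,2n
}

# Possible usage (file names are placeholders; assumes samtools is installed):
#   n=$(samtools view -c -F 260 reads.bam)   # mapped, non-secondary alignments
#   genomeCoverageBed -ibam reads.bam -bg | bin_bedgraph 10000 "$(rpm_scale "$n")"
```

Scaling after binning gives the same numbers as scaling with -scale first, because the per-bin mean is linear in the coverage values.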