Question: How to bin my count data for the entire genome?
13 days ago, fr wrote:

I'm trying to represent my ChIP-Seq counts, normalized or not, in specific genomic bins but don't know how to do so.

I have already processed my data, and have used findPeaks followed by pos2bed.pl to produce .bedGraph files that contain this info. However, I'd like to have counts summarized for each 10 kb bin throughout the genome (this resolution is OK for my purposes). My .bedGraph files contain some of this information, but not in equal-width 10 kb bins.

I was looking at HOMER's annotatePeaks.pl -hist <bin size>, which seems to summarize data in bins, but only around peaks, which is not what I want. I'd like the counts represented in fixed bins across the whole genome (i.e. not only those within a distance d of a peak). I'm sure there is a tool for this, but I don't know which one to use.

Could someone advise on how I could bin my data?

12 days ago, Prakash (India) wrote:

You might be looking for bedtools makewindows. You can divide your genome into 10 kb bins and then calculate coverage over those bins from your BAM files. Alternatively, you may be able to use a HOMER tag directory as well.
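To make the windowing step concrete, here is a minimal sketch in plain awk of what fixed-size binning of a genome file amounts to. It is not the bedtools implementation; `make_windows` and `toy_genome.txt` are hypothetical names, and the chromosome sizes are made up for illustration. It assumes a two-column genome file (chromosome name, size in bp).

```shell
#!/usr/bin/env bash
# make_windows: hypothetical helper that tiles each chromosome with fixed-size bins.
# $1 = genome file (chrom<TAB>size), $2 = window size in bp
make_windows() {
  awk -v w="$2" 'BEGIN { OFS = "\t" }
    {
      # walk the chromosome in steps of w; the last window is clipped to the chromosome end
      for (s = 0; s < $2; s += w) {
        e = (s + w < $2) ? s + w : $2
        print $1, s, e
      }
    }' "$1"
}

# toy genome file (made-up sizes, not real mm10 values)
printf 'chrA\t25000\nchrB\t10000\n' > toy_genome.txt

# prints BED-style windows: chrom, 0-based start, end
make_windows toy_genome.txt 10000
```

With real data you would normally let bedtools makewindows do this; the sketch only shows that the last bin on each chromosome can be shorter than the requested width.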


@Prakash, thanks a lot for your suggestion. Just to make sure, you mean something like this:

bedtools makewindows -g mm10.txt -w 10000 > binned_genome.bed

bedtools coverage -a binned_genome.bed -b myfile.bed -sorted -g mm10.txt

My question is then: how are the summaries done? For instance, is each bin showing the mean of counts in that region? I couldn't find this information.

Thanks a lot

Edit: found this thread with some useful information

Comment by fr, 10 days ago.

bedtools coverage -a binned_genome.bed -b myfile.bed -sorted -g mm10.txt

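By default this reports, for each bin, the number of features from -b that overlap it, the number of bases in the bin with non-zero coverage, the bin length, and the fraction of the bin covered. Add -mean if you want the mean depth per bin instead. You can also use genomeCoverageBed to get coverage normalized to reads per million. Its -scale option takes a numeric factor (there is no RPM keyword), so pass 1,000,000 divided by your number of mapped reads. Note that genomeCoverageBed does not take the bins file; with -bg it writes a genome-wide bedGraph, which you can then summarize over your bins.

genomeCoverageBed -ibam <your aligned bam file> -bg -scale <1000000 / number of mapped reads> > scaled.bedGraph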
Reply by Prakash, 10 days ago.
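On the question of how a per-bin summary is computed from a bedGraph, the following is a rough awk sketch of a length-weighted mean signal per fixed bin. It is an illustration under stated assumptions, not what any bedtools command does internally: `bin_bedgraph` and `toy.bedGraph` are hypothetical names, the input is assumed to be a four-column bedGraph without header lines, and every bin is divided by the full bin width, so a shorter final bin on a chromosome would be underestimated.

```shell
#!/usr/bin/env bash
# bin_bedgraph: hypothetical helper that averages bedGraph signal over fixed bins.
# $1 = bedGraph (chrom, start, end, value), $2 = bin size in bp
bin_bedgraph() {
  awk -v w="$2" 'BEGIN { OFS = "\t" }
    {
      # split each interval at bin boundaries and add value * overlap to that bin
      for (s = $2; s < $3; s = e) {
        b = int(s / w) * w                 # start of the bin containing s
        e = (b + w < $3) ? b + w : $3      # end of the piece inside this bin
        sum[$1 OFS b] += $4 * (e - s)
      }
    }
    END {
      # mean depth = summed signal / bin width; bins with no intervals are not printed
      for (k in sum) {
        split(k, p, OFS)
        print p[1], p[2], p[2] + w, sum[k] / w
      }
    }' "$1" | sort -k1,1 -k2,2n
}

# toy bedGraph with made-up values
printf 'chrA\t0\t5000\t2\nchrA\t5000\t15000\t4\nchrA\t15000\t20000\t0\n' > toy.bedGraph

bin_bedgraph toy.bedGraph 10000
```

In the toy input the first bin covers 5 kb at depth 2 and 5 kb at depth 4, so its mean is 3; the second bin covers 5 kb at depth 4 and 5 kb at depth 0, so its mean is 2.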