Question

Normalization factors from DiffBind for browser shots

9.9 years ago

mkareta ▴ 10

Hey Everyone,

I've been looking all over the internet and haven't found an answer to this question yet, so hopefully someone can help me out. I used the DiffBind package in R to identify regions with different histone marks between two experimental conditions in some ChIP-seq data. Everything worked great, but for publication purposes I would love to show some browser shots of candidate genes. I can load the raw reads into a browser, but I would prefer to have them scaled by the normalization factors that DiffBind used at each locus, to better show the differences in binding profiles. Even better would be to convert the normalized counts and plot the fold changes from one condition to the next at some regular interval. Any idea how to do this? In particular, how do I get the locus-specific normalization factors?

PS - I am a novice in R, and if it weren't for the well-written DiffBind vignette I would not have gotten this far, so please explain any answers to me in simple terms. Thanks!


Ram · Answer 1 · 2014-06-08

In my opinion, for visualization purposes it's accurate enough to convert the raw reads (BAM files) to TDF format and load them in IGV with the option "Normalize Coverage Data" enabled. This multiplies the raw coverage by 1,000,000 / total read count, thus correcting for different library sizes. This assumes you are using, or are willing to use, IGV; other browsers surely have similar options.
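To make the rescaling concrete, here is a minimal Python sketch of the same arithmetic (coverage multiplied by 1,000,000 / total read count). The function names and the example numbers are made up for illustration; this is not IGV's own code, just the calculation described above.

```python
def cpm_scale_factor(total_reads):
    """Return the factor that rescales raw coverage to reads per million."""
    if total_reads <= 0:
        raise ValueError("total_reads must be a positive number")
    return 1_000_000 / total_reads


def scale_coverage(coverage, total_reads):
    """Multiply each raw coverage value by 1,000,000 / total_reads."""
    factor = cpm_scale_factor(total_reads)
    return [value * factor for value in coverage]


# Hypothetical example: the same raw coverage in two libraries of different depth.
raw_coverage = [10, 20, 40]
control = scale_coverage(raw_coverage, total_reads=20_000_000)
treated = scale_coverage(raw_coverage, total_reads=40_000_000)

print("control:", control)
print("treated:", treated)
```

After scaling, the deeper library shows half the signal for the same raw coverage, which is the library-size correction the browser option applies.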

DiffBind (or rather the edgeR and DESeq packages underneath it) does not use the read profiles. All these packages use are the raw read counts in given intervals (ChIP-seq peaks, genes, whatever); the read profile within an interval doesn't enter the analysis at all.

If you want to show the log fold change, you could prepare a bedGraph file where each line has as coordinates the region tested (or maybe its midpoint, whichever looks better) and as value (4th column) the log fold change. Then load this file in IGV, with or without the raw reads file as above.
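As a rough sketch of the bedGraph idea, the Python below turns a list of tested regions into bedGraph lines. The region tuples and values are invented for illustration. The sketch assumes the input coordinates are 1-based and inclusive (as R genomic ranges usually are), so it subtracts 1 from the start to get the 0-based, half-open coordinates that bedGraph uses. Set `one_based=False` if your coordinates are already 0-based.

```python
def to_bedgraph_lines(regions, one_based=True):
    """Format (chrom, start, end, log_fold_change) tuples as bedGraph lines.

    bedGraph is tab-separated: chrom, start (0-based), end, value.
    """
    lines = []
    # Sort by chromosome and start so the file is in a browser-friendly order.
    for chrom, start, end, log_fc in sorted(regions, key=lambda r: (r[0], r[1])):
        bg_start = start - 1 if one_based else start
        lines.append(f"{chrom}\t{bg_start}\t{end}\t{log_fc:.4f}")
    return lines


def write_bedgraph(path, regions, one_based=True):
    """Write the formatted lines to a file that a genome browser can load."""
    with open(path, "w") as handle:
        for line in to_bedgraph_lines(regions, one_based=one_based):
            handle.write(line + "\n")


# Hypothetical regions and log fold changes, not real results.
example_regions = [
    ("chr1", 1001, 1500, 1.25),
    ("chr1", 101, 600, -0.5),
]
example_lines = to_bedgraph_lines(example_regions)
print("\n".join(example_lines))
```

Calling `write_bedgraph("fold_change.bedgraph", example_regions)` (the file name is arbitrary) would produce a file that can be loaded as its own track next to the coverage tracks.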