Hi everyone, first of all Iam a biologist so sorry for the (potentially) very basic question.
We performed ATAC-seq (treatment vs. control in duplicates) and processed the data through the Encode pipeline. The QC (especially the TSS enrichment) shows for both control samples a excellent result whereas both treatment samples are borderline ( but not so bad to trash it). Also the visual inspection of the peaks looks okay (very similar peak pattern between control and treatment). However we noticed that the peaks in the treatment samples are overall smaller compare to the control (bigwig TPM normalized) so we concluded that we have a potential enrichment bias. Since we do not knock down any global factor it is most likely a technical artefact. We continued to identify differential accessible regions using the csaw package ("trended" normalization). The resulting regions are highly enriched for binding motives of biological meaningful transcription factors (identified with HOMER) and also close to relevant genes.
So far so good... however now I have problems to visualise those regions. Since the TPM normalization of the bigwig files does not account for the global enrichment bias (correct ?) the plotted heatmaps of the differential accessible regions do not reflect the result from csaw (control has always higher signal compare to treatment). Please correct me when Iam wrong but in this case we could perform instead of the TPM normalization a quantile normalization right ? Is there a easy way to do this either with bam or bigwig files ? Otherwise I could get a count matrix from deeptools and feed it into the CQN package. However the cqn function requires a covariate and to be honest Iam not sure what this is in my data (sequencing depths ?). After the normalization I could convert the resulting file back into a bigwig file and us it for plotting.
Thanks for any suggestions.
Sorry I have to come back again with two question regarding the bin based normalization (I guess you mean the loess-based normalization right ?). Since I using also this method to get my differential peaks which works quite well I would also like to use this method to scale my bedgraph/bigwig files just to keep everything the same across the analysis (and to avoid any nasty reviewer comment ;) ). However I have two questions:
Before running normOffsets its recommended to filter windows which is just fine in order to identify differential peaks however for a signal track I want to scale all windows. Therefore should I perform normOffsets with all windows (so dont perform the filterWindows function) or should I calculated the scaling factors with the filtered file and set the scaling factor to "1" for all bins which got kicked out in filterWindows (although this seems to be a bit arbitrary) ?
After calculating the scaling factor which is the best way to use it ? Make a bedgraph file using deeptools with same bin sizes and then multiply each row with the scaling factor calculated by normOffsets ?
Thanks again, Flo
Sorry for the super late reply, I missed the comment.
I do not really understand 1). What I do (in the rare case I really use bin-based normalization) is simply to use the full count matrix based on the 10kb windows (generated with
bedtools makewindows
and thenfeatureCounts
and feed this into the above code snipped. I do not use the code provided in the csaw manual but create the countmatrix externally.For 2) you have different options. The simplest is to feed to norm. factors into deeptools
bamCoverage --scaleFactor
as described above. Alternatively you can make a bedGraph and divide the column $4 (so the coverages) by the norm. factor. Here is a Bioc thread where I asked thecsaw
developer about this to be sure I got it right: https://support.bioconductor.org/p/124180/Hello, I have got the scale factor according to your method, but I'm not sure which is right. can you help me to judge which is right? thank you! here is my code
I use the first approach.
you are right, thank you so much!
Hello, Thank you for your reply, it really helped me, I also have a question for the ChIP-seq data, I have two bam file (one is sample, the other is input), how can I normalize the data? how can I get the
scale factor
? May I get thebw file
throughbamCompare
first ? and then get therawcount
file and use your TMM method ? or other correct way?Input and IP samples are fundamentally different towards library composition and violate the assumptions of the normalization. I would therefore not try to process them in the described way. I have no hands-on experience with this as I never use inputs beyond the peak calling step. deeptools implements the SES normalization for your scenario afaik, maybe check their documentation.
OKļ¼thank you very much
Impressive results, @ATpoint! Can we combine this approach with ERCC spike-in? I'd like to normalise RNA-seq bigwigs. Many thanks.
I guess you can: How does edgeR do ERCC spike-in normalization?
This is great! How would I make a standard count matrix? Using featureCounts?
I use featureCounts indeed.
Thank you @ATpoint for the very helpful tutorial. For the last two tracks of the IGV plot, did you plot scaled signal-to-noise ratios using TMM scaling factors or scaled counts normalized by scaling factors? If it's the latter, how do we account for background noise?
Hi, thank you, glad you find it helpful. The last two is the raw read counts divided by the scaling factor that comes out when running the above code on a count matrix based on the peaks. So that is not a ratio, it is normalized counts, it would be "the latter". I think this takes good care of noise because if you have higher noise in sampleA rather than sampleB then at the same total read count the peaks in A would be generally lower, and the TMM on the peak counts then kind of lift up the counts of all peaks to be comparable between samples. There is a great chapter in Aaron Lun's csaw book on normalization in the ChIP-seq (which applies to ATAC-seq as well) context that discusses the various aspects to consider, please see at http://bioconductor.org/books/3.13/csawBook/chap-norm.html