My first question here.
As mentioned by Corces et al. (Science. 2018 Oct 26;362(6413)), differences in the quality of ATAC-seq experiments result in varying percentages of reads in peaks. When comparing samples using depth normalization only (i.e. CPM), "background" reads are weighted the same as reads falling within peaks, so samples with high background are artificially depressed.
I was wondering if we could use deepTools bamCoverage to normalize by reads in peaks and by depth at the same time. Could we achieve that by blacklisting the background regions, setting the effective genome size to the total length of the analysed peaks, and multiplying by a scale factor? Something like this:
bamCoverage -p 4 --bam input.bam -o output.bw --binSize 50 --scaleFactor 30 --blackListFileName Background.bed --normalizeUsing CPM --ignoreDuplicates --minMappingQuality 30 --effectiveGenomeSize 108645043 --ignoreForNormalization chrX chrY chrM --extendReads
(where 108645043 is the total length of the analysed peaks)
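If it helps, the total peak length passed to --effectiveGenomeSize can be derived by summing interval widths in the peaks file. A minimal sketch (the file name peaks.bed and the standard 3+ column BED layout are assumptions):

```python
def total_peak_length(bed_path):
    """Sum end - start over all intervals in a BED file."""
    total = 0
    with open(bed_path) as bed:
        for line in bed:
            # skip headers and blank lines
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split("\t")[:3]
            total += int(end) - int(start)
    return total

# e.g. total_peak_length("peaks.bed")
```

Note this assumes the peaks do not overlap each other; overlapping intervals would be double-counted, so merge them first (e.g. with bedtools merge) if needed.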
Moreover, is --scaleFactor applied before or after the scale factor from --normalizeUsing CPM? Are these two different scale factors?
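On the order question: as I understand the deepTools documentation, they are two separate factors, and the normalization-derived factor is multiplied by --scaleFactor, so "before or after" makes no difference (multiplication commutes). A toy sketch of the arithmetic (the read count below is invented):

```python
def combined_scale_factor(total_mapped_reads, user_scale_factor=1.0):
    """Sketch of how a CPM factor and --scaleFactor would combine,
    assuming deepTools multiplies the two (per its docs)."""
    cpm_factor = 1e6 / total_mapped_reads      # per-million depth factor
    return cpm_factor * user_scale_factor      # order is irrelevant

# e.g. 20 M mapped reads with --scaleFactor 30
print(combined_scale_factor(20_000_000, 30))
```

So with 20 M mapped reads the CPM factor is 0.05, and --scaleFactor 30 simply rescales that to 1.5.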
Thanks
Perfect!
I used TMM normalization with edgeR/limma for my analysis. What I was searching for is a way to visualize in a data track what was being compared. I will try some of the code examples that you linked. As for the bin size, I agree with you; I was just concerned about the size of the output files.
Thanks again
In that case you could directly use the norm.factors from the DGEList object. Be sure to feed its reciprocal into the --scaleFactor option, as suggested in the linked answer.
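To illustrate the reciprocal step (a sketch only; the norm.factors values come from edgeR's calcNormFactors in R, and the numbers and sample names below are invented):

```python
# Hypothetical TMM norm.factors exported from an edgeR DGEList
norm_factors = {"sample1": 1.12, "sample2": 0.89, "sample3": 1.01}

# Take the reciprocal to get the value for bamCoverage --scaleFactor
scale_factors = {s: 1.0 / f for s, f in norm_factors.items()}

for sample, sf in scale_factors.items():
    print(f"bamCoverage --bam {sample}.bam --scaleFactor {sf:.4f} ...")
```

The reciprocal is needed because edgeR's factors describe how much to divide a library by, whereas --scaleFactor multiplies the coverage.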