Hello everyone! I’m working on an ATAC-seq project with four samples (no replicates), each sequenced to a different depth (e.g. 200 M, 300 M, 400 M paired-end reads, etc.). After marking duplicates in the BAMs, I’m generating BigWig tracks with deepTools bamCoverage using RPGC normalization (as recommended here: https://groups.google.com/g/deeptools/c/th96gaftAXQ). When I run computeMatrix and plot TSS enrichment (cluster of 5 genes), I still see fluctuations that I haven’t been able to explain, rather than a smooth curve.
Could someone advise on:
- Whether RPGC is the best normalization strategy when you have no replicates but varying library sizes.
- How to calculate and apply the correct scale factors (e.g. using the --scaleFactor option) if RPGC alone isn’t sufficient.
Any tips on achieving truly comparable BigWig tracks (and hence TSS plots) across samples would be hugely appreciated!
Thank you!
For the next experiment, consider spending money on replicates rather than 10-fold excessive depth. We typically do 30 million reads per sample. At far more than 100 million reads, most will just be duplicates.
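On the scale-factor part of the question, here is a minimal sketch of one common approach: scale every library down to the smallest one. This is not a deepTools function, just plain arithmetic. The sample names are hypothetical, the counts are the example depths from the question, and in practice you would plug in the number of reads that bamCoverage actually counts after your filters (for instance, after excluding duplicates). I'm also assuming you would pass the result via --scaleFactor together with --normalizeUsing None, since the scale factor is multiplied with whatever normalization is already applied.

```python
def depth_scale_factors(read_counts):
    """Return per-sample factors that scale each library to the smallest one.

    read_counts maps a sample name to its number of usable reads.
    """
    if not read_counts:
        raise ValueError("read_counts is empty")
    if any(n <= 0 for n in read_counts.values()):
        raise ValueError("read counts must be positive")
    target = min(read_counts.values())  # smallest library is the reference
    return {sample: target / n for sample, n in read_counts.items()}


# Hypothetical sample names; counts are the example depths from the question.
counts = {
    "sample_A": 200_000_000,
    "sample_B": 300_000_000,
    "sample_C": 400_000_000,
}

factors = depth_scale_factors(counts)
for sample, factor in factors.items():
    # Value to pass to bamCoverage for this sample's BAM.
    print(f"{sample}: --scaleFactor {factor:.4f}")
```

With these example numbers the smallest library keeps a factor of 1 and the deeper ones are scaled down in proportion. Keep in mind this only corrects for depth; it does not account for differences in signal-to-noise between samples, which depth scaling alone cannot fix.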