Question: Upsampling BAMs, or downsampling by A LOT?
4 months ago by
science_lizard0 wrote:

I always downsample my ChIP-seq BAM files to match the file with the fewest reads before I do any peak calling. My question is: what happens when you want to compare your data to publicly available data with much, much lower coverage? I usually get about 60 million unique reads, and there's a dataset I'd really like to compare my data to (it's from a different cell line, and I want to see whether the distribution of peaks differs), but it only has about 17 million reads. I'm hesitant to downsample my own data by that much, but I imagine "upsampling" their data would only lead to a bunch of false positives. Does anyone know what the convention is for this kind of problem?
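
For concreteness, downsampling to a target depth amounts to keeping each read with probability target/total, which for the numbers above is 17/60, or roughly 28%. The snippet below is only a minimal sketch of that arithmetic. It assumes samtools is the tool used for subsampling and relies on the SEED.FRACTION form of `samtools view -s`. The function names, file names, and the seed of 42 are hypothetical placeholders, not anything from the datasets discussed here.

```python
def subsample_fraction(total_reads: int, target_reads: int) -> float:
    """Fraction of reads to keep so the expected depth matches target_reads."""
    if total_reads <= 0 or target_reads <= 0:
        raise ValueError("read counts must be positive")
    # Never exceed 1.0: reads cannot be created, only discarded.
    return min(1.0, target_reads / total_reads)


def samtools_subsample_cmd(in_bam: str, out_bam: str,
                           fraction: float, seed: int = 42) -> list:
    """Build (but do not run) a samtools command that keeps ~fraction of reads.

    `samtools view -s` takes SEED.FRACTION: the integer part is the random
    seed and the decimal part is the fraction of reads to keep.
    """
    frac_str = f"{fraction:.4f}"
    # After rounding, the value must still lie strictly between 0 and 1.
    if not frac_str.startswith("0.") or frac_str == "0.0000":
        raise ValueError("fraction must round to a value strictly between 0 and 1")
    digits = frac_str.split(".")[1]
    return ["samtools", "view", "-b", "-s", f"{seed}.{digits}",
            "-o", out_bam, in_bam]


# Hypothetical file names; read counts are the approximate ones quoted above.
frac = subsample_fraction(60_000_000, 17_000_000)
cmd = samtools_subsample_cmd("mine.bam", "mine.downsampled.bam", frac)
print(" ".join(cmd))
```

The command is only assembled and printed here; it could be handed to `subprocess.run` once the inputs are checked. Fixing the seed keeps the subsample reproducible between runs.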

Thanks in advance!


Comparisons across batches are problematic for a variety of reasons. What is the exact comparison you're trying to make? Hopefully you're not trying to use a published sample from someone else as a control for a comparison; that's a recipe for problems.

written 4 months ago by Devon Ryan (81k)
