Question: Upsampling BAMs, or downsampling by A LOT?
4 months ago by science_lizard wrote:

I always downsample my ChIP-seq BAM files to match the file with the fewest reads before I do any peak calling. My question is: what happens when you want to compare your data to publicly available data with much, much lower coverage? I usually get about 60 million unique reads, and there's a dataset I'd really like to compare my data to (it's in a different cell line, and I want to see whether the distribution of peaks is different), but it only has about 17 million reads. I'm hesitant to downsample my own data by that much, but I imagine "upsampling" their data would only produce a bunch of false positives... Does anyone know what the convention is for this kind of problem?

Thanks in advance!


Comparisons across batches are problematic for a variety of reasons. What is the exact comparison you're trying to make? Hopefully you're not trying to use a published sample from someone else as a control for a comparison; that's a recipe for problems.

written 4 months ago by Devon Ryan
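
For reference, the arithmetic behind the downsampling described in the question (about 60 million reads down to about 17 million) can be sketched as below. This is only a sketch and not a recommendation for or against downsampling. It assumes the subsampling is done with `samtools view -s`, which takes a value whose integer part is the random seed and whose decimal part is the fraction of reads to keep. The function names, the seed of 42, and the file names `mine.bam` and `mine.downsampled.bam` are hypothetical.

```python
def subsample_fraction(target_reads: int, total_reads: int) -> float:
    """Fraction of reads to keep so total_reads shrinks to about target_reads."""
    if total_reads <= 0 or target_reads <= 0:
        raise ValueError("read counts must be positive")
    # Subsampling cannot create reads, so the fraction is capped at 1.0.
    return min(1.0, target_reads / total_reads)


def samtools_s_argument(fraction: float, seed: int = 42) -> str:
    """Format the SEED.FRACTION value accepted by `samtools view -s`."""
    rounded = round(fraction, 4)
    if not 0.0 < rounded < 1.0:
        raise ValueError("fraction must round to a value strictly between 0 and 1")
    # Integer part is the random seed, decimal part is the fraction kept.
    decimals = f"{rounded:.4f}".split(".")[1]
    return f"{seed}.{decimals}"


# Read counts taken from the question; seed and file names are hypothetical.
fraction = subsample_fraction(17_000_000, 60_000_000)
arg = samtools_s_argument(fraction)
print(f"keep {fraction:.1%} of reads")
print(f"samtools view -b -s {arg} mine.bam > mine.downsampled.bam")
```

The printed command keeps roughly 28% of the reads. The subsampling is random, so the resulting read count will be close to the target, not exactly equal to it.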