Upsampling BAMs, or downsampling by A LOT?

6.1 years ago

science_lizard ▴ 20

I always downsample my ChIP-seq BAM files to the depth of the file with the fewest reads before I do any peak calling. My question is: what happens when you want to compare your data to publicly available data that has much, much lower coverage? I usually get about 60 million unique reads, and there's a dataset I'd really like to compare my data to (it's in a different cell line and I want to see if the distribution of peaks is different), but they only have about 17 million reads. I'm hesitant to downsample my own data by that much (to roughly 28% of its reads), but I imagine "upsampling" their data would only lead to a bunch of false-positive peaks... Does anyone know what the convention is for this kind of problem?
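For reference, the downsampling step described above can be sketched as follows. This is a minimal illustration, assuming the read counts for both files are already known (e.g. from `samtools view -c`) and that subsampling is done with `samtools view -s SEED.FRAC`, where the integer part is the random seed and the part after the decimal point is the fraction of templates to keep (mates are kept or dropped together). The helper names and the default seed of 42 are hypothetical choices for this sketch; the kept fraction is approximate, so the output will land near, not exactly at, the target count.

```python
import shlex


def subsample_fraction(target_reads: int, current_reads: int) -> float:
    """Fraction of reads to keep so current_reads shrinks to about target_reads."""
    if target_reads <= 0 or current_reads <= 0:
        raise ValueError("read counts must be positive")
    if target_reads >= current_reads:
        # Already at or below the target depth: nothing to remove.
        return 1.0
    return target_reads / current_reads


def samtools_subsample_cmd(in_bam: str, out_bam: str,
                           target_reads: int, current_reads: int,
                           seed: int = 42) -> list[str]:
    """Build a hypothetical `samtools view -s SEED.FRAC` downsampling command."""
    frac = subsample_fraction(target_reads, current_reads)
    if frac >= 1.0:
        raise ValueError("target is not below current depth; no downsampling needed")
    # Encode the fraction as six digits after the decimal point, clamped so
    # it never rounds up to 1.0 (which would become ".000000").
    n = min(round(frac * 1_000_000), 999_999)
    if n == 0:
        raise ValueError("fraction too small to express with six digits")
    # samtools uses the integer part of -s as the seed, the decimals as the fraction.
    return ["samtools", "view", "-b", "-s", f"{seed}.{n:06d}",
            "-o", out_bam, in_bam]


if __name__ == "__main__":
    # 60M reads downsampled toward the 17M-read public dataset (~28.3% kept).
    cmd = samtools_subsample_cmd("mine.bam", "mine.down.bam",
                                 target_reads=17_000_000,
                                 current_reads=60_000_000)
    print(shlex.join(cmd))
```

Fixing the seed makes the subsample reproducible; rerunning with a few different seeds is one way to check that peak calls on the downsampled file are stable rather than an artifact of one particular draw.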

Thanks in advance!

ChIP-Seq samtools bam downsampling upsampling

Comparisons across batches are problematic for a variety of reasons. What exact comparison are you trying to make? Hopefully you're not trying to use a published sample from someone else as a control in your comparison; that's a recipe for problems.

6.1 years ago by Devon Ryan 104k

Login before adding your answer.