Question

When to merge ChIP-Seq data?



3 months ago

Thomas • 0

Hello,

I am a bench scientist new to bioinformatics analysis. I generated a large ChIP-seq dataset for a cytokine-inducible transcription factor in cells, and I could use some advice on how to proceed with the analysis. The dataset contains two untreated samples, two treated samples, and an input DNA sample for each. Together, the eight files come to about 570 GB uncompressed.

So far, I've gotten familiar with the Unix environment, mapped the reads with STAR, called peaks with both MACS3 and HOMER, and run motif analysis. After visualizing the peaks in IGV, I can see that the peaks in the treated samples make sense, so the dataset looks good.

Now that I'm slightly more comfortable with these tools, I'd like to give my PI a more polished report on the data. This brings me to my question: at what point should I merge the pairs of biological replicates? I've seen a few different opinions on which stage is best:

merge the bam files after mapping to the genome?
merge the bed files after peak calling?
merge at a later point?
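
To make the second option concrete, here is a rough sketch of what I understand "merging the bed files" to mean: taking the union of peak intervals across replicates and collapsing any that overlap. The function name `merge_peaks` and the coordinates are made up for illustration; this is not what MACS3 or HOMER does internally, and in practice a dedicated tool would handle this step.

```python
from collections import defaultdict


def merge_peaks(*replicates):
    """Collapse overlapping or directly adjacent peaks from several replicates.

    Each replicate is a list of (chrom, start, end) tuples in BED-style
    0-based, half-open coordinates. Returns merged intervals sorted by
    chromosome name and start position.
    """
    # Pool all intervals per chromosome, regardless of which replicate they came from.
    by_chrom = defaultdict(list)
    for peaks in replicates:
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                # First interval on this chromosome.
                current_start, current_end = start, end
            elif start <= current_end:
                # Overlaps (or touches) the interval being built: extend it.
                current_end = max(current_end, end)
            else:
                # Gap found: emit the finished interval and start a new one.
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        merged.append((chrom, current_start, current_end))
    return merged


# Hypothetical peaks, for illustration only.
rep1 = [("chr1", 100, 200), ("chr1", 500, 600)]
rep2 = [("chr1", 150, 250), ("chr2", 10, 50)]
merged = merge_peaks(rep1, rep2)
```

Note that a plain union like this keeps peaks found in only one replicate; requiring a peak to appear in both replicates would be a different (stricter) choice.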

Ultimately, I'd like to have two merged datasets (untreated and treated) that I can use to run motif analysis and show differentially enriched genes. If you have further advice or resources to recommend, I'm happy to hear it.

Thank you!

HOMER ChIP-seq macs3 • 189 views

Updated 3 months ago by Ram • written 3 months ago by Thomas