Question: What is the best pipeline step to merge replicates?
2.6 years ago, phosphodiester_bond wrote:

Based on your experience, at which point would you recommend merging the reads from multiple ATAC-seq replicates (biological replicates, for the most part)? And most importantly, why?

I saw that ENCODE first processes the samples independently up to the BAM files, then merges all BAM files using samtools. What would you lose or gain by merging reads at the FASTQ level? Or would you rather ignore all of this and merge the results at the very end of the pipeline (say, after you've called peaks on each of these samples independently)? My intuition is to merge the files right after trimming reads, but I'm curious about the reasoning behind your choice of merge step.

Thanks for any feedback!


It is not clear what kind of experiment you are talking about: RNA-seq? ChIP-seq? ATAC-seq? And what kind of replicates: biological or technical? What is the research question?

written 2.6 years ago by Benn

Sorry about the extreme vagueness, I just edited my original post. This is ATAC-seq data that I'm pulling from GEO accessions, all uploaded by other labs. Most of these are biological replicates.

written 2.6 years ago by phosphodiester_bond
2.6 years ago, Friederike wrote:

"at which point would you recommend merging the reads from multiple replicates"

When I'm sure that the replicates are sufficiently similar and I have decided that I don't need the information that might be gleaned from treating the replicates independently.
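One rough way to check "sufficiently similar" is to correlate read counts between replicates over a shared set of regions (fixed genomic bins or a common peak set). The sketch below assumes those per-region counts have already been computed elsewhere; the function names are made up for illustration, and Spearman correlation is only one possible similarity measure. What counts as "similar enough" is a judgment call that depends on the experiment.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(counts_a: Sequence[float], counts_b: Sequence[float]) -> float:
    """Spearman correlation of per-region counts from two replicates."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both replicates must be counted over the same regions")
    return pearson(average_ranks(counts_a), average_ranks(counts_b))
```

Running `spearman` on every pair of replicates gives a small matrix; a replicate that correlates poorly with all the others is a candidate for a closer look before any merging.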

As b.nota's comment illustrates: there is not going to be a single answer to this. It depends on why you did the replicates in the first place, how much the replicates truly mimic each other, and what downstream analyses you're going to do.


Thank you, I see now how this is probably not something with a generalizable solution (which is why I kept my original post intentionally vague).

written 2.6 years ago by phosphodiester_bond
2.6 years ago, Friederike wrote:

"I saw at ENCODE they first process the samples independently up to the BAM files, and merge all BAM files using samtools. What would you lose/gain by merging reads at the fastq level?"

Maybe it's because you still want to know which sample failed the sequencing (if one or several did). That is easier to detect if you align them separately, since the alignment rate is a good proxy for how well your sequencing went. Plus, the additional QC checks for ATAC-seq, such as mitochondrial contamination, fragment size distribution, and genome coverage, are easier to do at the BAM level.
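As a rough illustration of per-replicate QC before merging, the sketch below parses the text output of `samtools idxstats` (tab-separated: reference name, length, mapped count, unmapped count) for each replicate and only builds a `samtools merge` command for the replicates that pass. The function names, the mitochondrial contig names, and the thresholds are all assumptions for the example, not recommended values, and the counts from idxstats are only an approximation of alignment rate.

```python
from typing import Dict, List, Sequence


def qc_from_idxstats(idxstats_text: str,
                     mito_names: Sequence[str] = ("chrM", "MT")) -> Dict[str, float]:
    """Summarize one replicate from `samtools idxstats` output."""
    mapped = unmapped = mito = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, n_mapped, n_unmapped = line.rstrip("\n").split("\t")
        mapped += int(n_mapped)
        unmapped += int(n_unmapped)
        if name in mito_names:
            mito += int(n_mapped)
    total = mapped + unmapped
    return {
        "mapped_fraction": mapped / total if total else 0.0,
        "mito_fraction": mito / mapped if mapped else 0.0,
    }


def passing_replicates(qc: Dict[str, Dict[str, float]],
                       min_mapped: float = 0.8,   # hypothetical threshold
                       max_mito: float = 0.3      # hypothetical threshold
                       ) -> List[str]:
    """Return the BAM paths whose QC summary passes both thresholds."""
    return [bam for bam, stats in qc.items()
            if stats["mapped_fraction"] >= min_mapped
            and stats["mito_fraction"] <= max_mito]


def merge_command(out_bam: str, in_bams: Sequence[str]) -> List[str]:
    """Build (but do not run) a `samtools merge out.bam in1.bam in2.bam ...` call."""
    if len(in_bams) < 2:
        raise ValueError("need at least two BAM files to merge")
    return ["samtools", "merge", out_bam, *in_bams]
```

The returned list can be passed to `subprocess.run(..., check=True)` once you are happy with which replicates made it through. If the FASTQ files had been concatenated up front, a failed replicate would be hidden inside the pooled numbers.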

"Or would you rather ignore all of this and merge the results at the very end of the pipeline?"

If these are biological replicates (and from different labs! That's a guaranteed batch effect right there!), why would you want to merge them at all? I would probably try to keep the samples separate for the most part. If you are going to do a differential accessibility analysis, csaw and DiffBind are both tools that can make use of replicates to gauge the biological (and technical!) variability.
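If a single peak set is still needed at the end, one option that keeps the replicates separate is to call peaks per replicate and keep only the regions supported by at least some minimum number of them. The sketch below is one simple way to do that with a sweep over interval boundaries; it assumes BED-style half-open intervals already loaded as tuples, and `consensus_regions` and `min_replicates` are names invented for this example rather than anything from csaw or DiffBind.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open like BED


def consensus_regions(replicate_peaks: List[List[Interval]],
                      min_replicates: int = 2) -> List[Interval]:
    """Return regions covered by peaks from at least `min_replicates` replicates."""
    events: Dict[str, List[Tuple[int, int]]] = {}
    for peaks in replicate_peaks:
        # Merge overlapping peaks within a replicate so that each replicate
        # contributes at most 1 to the depth at any position.
        by_chrom: Dict[str, List[Tuple[int, int]]] = {}
        for chrom, start, end in peaks:
            by_chrom.setdefault(chrom, []).append((start, end))
        for chrom, intervals in by_chrom.items():
            merged: List[List[int]] = []
            for start, end in sorted(intervals):
                if merged and start <= merged[-1][1]:
                    merged[-1][1] = max(merged[-1][1], end)
                else:
                    merged.append([start, end])
            for start, end in merged:
                events.setdefault(chrom, []).extend([(start, 1), (end, -1)])

    result: List[Interval] = []
    for chrom in sorted(events):
        depth = 0
        region_start = 0
        # Tuples sort by position first, then delta, so interval ends (-1)
        # are processed before starts (+1) at the same coordinate.
        for pos, delta in sorted(events[chrom]):
            previous = depth
            depth += delta
            if previous < min_replicates <= depth:
                region_start = pos
            elif previous >= min_replicates > depth:
                result.append((chrom, region_start, pos))
    return result
```

This only produces a shared set of regions to count reads in; the per-replicate counts over those regions are what a differential accessibility tool would then work with.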
