I am by no means an expert here, but if the questions are simply "what did ENCODE do?" and "what should I do?", I can probably take a shot at it :)
When ENCODE says they merged replicates to get a consolidated sample, they mean they merged the BAM files with samtools merge or similar. Given the data they produced, this was most likely fine: read depth wasn't super-high back then, and the variance between individual ChIP/DNase sequencing runs is much lower than it is for RNA-Seq, particularly at the read numbers they were mapping at.
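To make "merging BAM files" concrete: samtools merge takes coordinate-sorted inputs and produces one coordinate-sorted output, which is essentially a k-way merge. The toy sketch below assumes each alignment is reduced to a hypothetical (reference index, position, read name) tuple; it is not how samtools is implemented, just an illustration of the idea. Note that once merged, nothing in a record says which replicate it came from (here only the made-up read names hint at it).

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Toy alignment record: (reference index, 0-based position, read name).
# A hypothetical stand-in for a BAM record, which holds much more.
Record = Tuple[int, int, str]


def merge_sorted_replicates(replicates: List[Iterable[Record]]) -> Iterator[Record]:
    """K-way merge of coordinate-sorted inputs into one sorted stream.

    Tuples compare by reference index, then position, then read name
    (the name tie-break is a choice of this sketch), so the output stays
    sorted as long as every input is sorted.
    """
    return heapq.merge(*replicates)


rep1 = [(0, 100, "r1a"), (0, 250, "r1b"), (1, 40, "r1c")]
rep2 = [(0, 120, "r2a"), (1, 10, "r2b")]

merged = list(merge_sorted_replicates([rep1, rep2]))
print(merged)
```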
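For RNA-Seq, however, I can't think of any good reason to merge replicates at all, since differential expression tools such as DESeq2 and edgeR need the per-replicate counts to estimate biological variance.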
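A small illustration of what merging throws away for RNA-Seq, using made-up counts for a single gene: kept separate, the replicates give both a mean and a spread per condition; summed together, each condition becomes one number and the between-replicate variance can no longer be estimated. The numbers are hypothetical and this is not what any DE tool computes internally.

```python
from statistics import mean, variance

# Hypothetical read counts for one gene in three biological replicates
# per condition (made-up numbers for illustration only).
condition_a = [120, 95, 140]
condition_b = [310, 180, 260]

# Kept separate, replicates give a mean AND a sample variance, which is
# what a test needs to judge whether A and B really differ.
for name, counts in [("A", condition_a), ("B", condition_b)]:
    print(name, "mean:", round(mean(counts), 2), "variance:", round(variance(counts), 2))

# Merged, each condition collapses to a single total: the depth is still
# there, but there is nothing left to estimate variance from.
merged_a, merged_b = sum(condition_a), sum(condition_b)
print("merged A:", merged_a, "merged B:", merged_b)
```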
Regarding "what should I do?", that's a much more interesting question :-) The reason is that there aren't many tools (that I know of) that make use of ChIP/DNase replicate information. Most of the time, we end up merging everything together and treating all the reads the same. Of course, we still QC each replicate individually (looking at tracks individually in IGV or producing per-replicate heat maps), and in some scenarios we'll do input/GC-bias correction at the run level and then merge at a higher level (signal bin counts), but only because we can't merge at the read level in those situations.
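A rough sketch of "correct at the run level, then merge at the bin level", covering only the input part (no GC correction). It assumes hypothetical per-bin counts for two ChIP replicates, each with its own input, and uses a simple library-size-scaled log2(ChIP/input) with a pseudocount; real tools normalize more carefully, so treat this as an outline of the order of operations rather than a recommended method.

```python
import math
from typing import Dict, List

# Hypothetical per-bin read counts: two ChIP replicates, each sequenced
# with its own input control. Bins are fixed genomic windows.
chip = {"rep1": [50, 10, 200, 8], "rep2": [90, 22, 310, 15]}
inputs = {"rep1": [20, 12, 25, 10], "rep2": [30, 20, 40, 18]}


def log2_enrichment(chip_bins: List[int], input_bins: List[int], pseudo: float = 1.0) -> List[float]:
    """Library-size-scaled log2(ChIP / input) per bin for ONE replicate.

    Assumption: simple total-count scaling plus a pseudocount.
    """
    chip_total, input_total = sum(chip_bins), sum(input_bins)
    return [
        math.log2(((c + pseudo) / chip_total) / ((i + pseudo) / input_total))
        for c, i in zip(chip_bins, input_bins)
    ]


# 1) Correct each replicate against its own input (run-level correction).
per_rep: Dict[str, List[float]] = {rep: log2_enrichment(chip[rep], inputs[rep]) for rep in chip}

# 2) Only then combine, at the bin level, by averaging across replicates.
merged = [sum(vals) / len(vals) for vals in zip(*per_rep.values())]
print([round(x, 2) for x in merged])
```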
However, you never know what breakthrough is around the corner, so it would be very foolish of me to suggest that replicate information isn't important. Fortunately, you can still have your cake and eat it too: merge the reads into one BAM file, but tag each read with a read group ID (RGID) that is specific to its biological/technical replicate group. How useful this is depends on whether the software understands the RG field and does something useful with it. The only software I know of that does is GATK, which uses the read-group information to model sequencing quality per run (for example, in base quality score recalibration) as part of its SNP-calling workflow. Other than GATK, though, I don't know of any software that does anything useful with replicate information, but maybe others can chime in with examples :)
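A minimal sketch of the "one BAM, but tagged" idea, working on plain SAM text so it runs without any libraries: each replicate gets an @RG header line, and every alignment gets an RG:Z:<id> tag; the same tag is then enough to split the merged reads back out per replicate. The read-group IDs, sample name and reads are hypothetical, and a real header would also need @HD/@SQ lines. In practice you would let existing tools do this (for instance samtools merge's -r option, which infers the RG tag from the input file names, or Picard's AddOrReplaceReadGroups) rather than hand-editing SAM.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical per-replicate SAM alignment lines (the 11 mandatory
# tab-separated fields). Read-group IDs like "chip_rep1" are made up.
replicates: Dict[str, List[str]] = {
    "chip_rep1": ["readA\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"],
    "chip_rep2": ["readB\t16\tchr1\t150\t60\t4M\t*\t0\t0\tTTGA\tIIII"],
}


def merge_with_read_groups(reps: Dict[str, List[str]], sample: str) -> Tuple[List[str], List[str]]:
    """Merge replicates into one SAM body, tagging each read with RG:Z:<id>.

    Returns (@RG header lines, alignment lines). Sorting is not handled here.
    """
    header = [f"@RG\tID:{rg}\tSM:{sample}" for rg in reps]
    body = [f"{line}\tRG:Z:{rg}" for rg, lines in reps.items() for line in lines]
    return header, body


def split_by_read_group(body: List[str]) -> Dict[str, List[str]]:
    """Recover per-replicate reads from a merged body via the RG:Z tag."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in body:
        fields = line.split("\t")
        # Optional tags start at the 12th column (index 11).
        rg = next(f[len("RG:Z:"):] for f in fields[11:] if f.startswith("RG:Z:"))
        groups[rg].append(line)
    return dict(groups)


header, body = merge_with_read_groups(replicates, sample="my_chip")
print("\n".join(header + body))
per_rep = split_by_read_group(body)
print(sorted(per_rep))
```

Because the replicate identity travels with each read, per-replicate QC (or any future replicate-aware tool) can still get at it after the merge.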
3.7 years ago by John