I'm trying to process the raw data from this ChIP-seq project so that I can run a peak-calling analysis (basically reproducing the methodology described in the paper) and, eventually, a differential peak analysis to check whether H3K9ac enrichment differs between the treatment and control groups for a particular gene.

I'm confused, though, because some samples (like this one, and even some input samples) have two runs, while others have only one (like most of the H3K9me2 samples and one of the input samples). The Methodology doesn't explain why, at least not that I've seen, and the metadata for each run doesn't clarify it either. I don't believe the runs are meant to be biological replicates, since those are split into separate, properly named SRX accessions (e.g. the paper describes two replicates each for treatment and control for H3K9ac, and there are 4 different GEO samples for those, appropriately named).

How can I tell whether the multiple runs per SRX are technical replicates rather than biological replicates? And if they are technical replicates, can I just cat the FASTQ files together prior to aligning with bwa?
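To make the last question concrete, this is roughly the merge I have in mind, written as a minimal Python sketch rather than a plain cat. The function name `concat_runs` and any file names are hypothetical. It assumes the runs really are technical replicates of the same library, and it relies on the fact that gzip files appended back to back still form a valid gzip stream, so it should work for both .fastq and .fastq.gz files.

```python
import shutil
from pathlib import Path


def concat_runs(run_paths, out_path):
    """Append each run's FASTQ file to out_path, byte for byte, in the order given.

    Hypothetical helper: it does the same thing as `cat run1 run2 > merged`.
    It does not validate the FASTQ records or check that the runs belong together.
    """
    out_path = Path(out_path)
    with out_path.open("wb") as out:
        for run in run_paths:
            with Path(run).open("rb") as src:
                # Stream in chunks so large files are never loaded into memory.
                shutil.copyfileobj(src, out)
    return out_path
```

For paired-end data I would call it once for the forward reads and once for the reverse reads, listing the runs in the same order both times so that the mates stay in sync.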