I am new to NGS technologies but have recently been producing and analysing some RNA-seq and ChIP-seq data with a view to integrating the two datasets. I have found these forums very useful for tips and advice, so I was hoping that someone might be able to help me with some issues I have run into. Whilst the RNA-seq analysis has gone very well, the ChIP-seq is proving a lot more problematic, and I think it would be best to get a second opinion before I discard the data as junk.
My experimental outline is as follows. Libraries were prepared for two biological replicates (IP and input control for each) and 75 bp paired-end sequencing was performed on an Illumina HiSeq 4000 platform. Reads were aligned to the mm10 reference genome using BWA, and MACS2 was used to call peaks. From this I retrieved only a very small number of peaks (~250) for each sample. My major concern is that when I view the alignment files in IGV, the IP and input tracks look identical. I would have expected wide genomic coverage with a near-flat baseline for my input, and more sparsely distributed, distinct peaks for my IP samples. Instead, I have identical strong peaks in all 4 samples. MACS2 does report significant enrichment at some of these peaks in the IP samples, but I cannot regard them as true binding sites because there are matching peaks in the input controls. No peaks were identified in my IP samples that could not also be found in the input. I therefore have a few questions:
1) What might be the cause of these specific sharp peaks in both input and IP samples? Areas of open chromatin?
2) Is it likely that the antibody used (custom-made) is non-specifically pulling down sonicated DNA? Or could the antibody be failing to precipitate any DNA at all, so that for my IP samples I have only sequenced input DNA bound non-specifically to the agarose beads?
3) Finally, I have noticed that only 48-49% of the total reads (for all input and IP samples) were aligned by BWA. Could some of the true binding sites be hidden within these unaligned reads, and would it therefore be worth going back to try to improve the alignment using less stringent mismatch criteria or another alignment tool? Or is this simply clutching at straws and the data are just junk?
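So far I have only judged the similarity of the IP and input tracks by eye in IGV. To put a number on it, I was thinking of something along the lines of the sketch below: bin read start positions into fixed windows, then compute the Pearson correlation and a depth-scaled log2(IP/input) per bin. This is only a rough illustration, not part of my actual pipeline; the function names, the bin size, and the toy positions are all made up for the example, and real positions would have to be extracted from the BAM files first.

```python
from math import log2, sqrt


def bin_counts(positions, chrom_length, bin_size):
    """Count read start positions (0-based) in fixed-width bins."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in positions:
        if 0 <= pos < chrom_length:
            counts[pos // bin_size] += 1
    return counts


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation undefined for a flat track
    return cov / sqrt(var_x * var_y)


def log2_enrichment(ip, inp, pseudocount=1.0):
    """Per-bin log2(IP/input) after scaling IP to the input's total depth."""
    if sum(ip) == 0:
        raise ValueError("IP track has no reads")
    scale = sum(inp) / sum(ip)
    return [log2((a * scale + pseudocount) / (b + pseudocount))
            for a, b in zip(ip, inp)]


# Toy positions (hypothetical, NOT my data): identical IP and input.
toy_ip = [5, 15, 105, 110, 115, 120, 250]
toy_input = list(toy_ip)

ip_bins = bin_counts(toy_ip, chrom_length=300, bin_size=100)
input_bins = bin_counts(toy_input, chrom_length=300, bin_size=100)

print("IP bins:   ", ip_bins)
print("input bins:", input_bins)
print("Pearson r: ", pearson(ip_bins, input_bins))
print("log2 IP/input per bin:", log2_enrichment(ip_bins, input_bins))
```

If the genome-wide correlation between IP and input is close to 1 and the per-bin log2 ratios sit around 0, I assume that would support the idea that the IP has simply behaved like a second input.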
Any suggestions would be greatly appreciated.