I am doing a ChIP-Seq analysis of this GEO data set. I was the data analyst on the original publication for that data set, which was done on hg19. I have now re-aligned the samples to hg38, and I'm using the csaw Bioconductor package to analyze the data. (Edit: These are single-end sequencing samples.) I used the csaw function profileSites to generate an average profile of the coverage around windows that represent local maxima in coverage, and noticed an odd pattern in the resulting profile plots:
You can view the code that I used to generate these plots here. This code mostly follows the csaw user's guide with regard to usage of
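For readers unfamiliar with this kind of plot, the idea of an average coverage profile around local maxima can be sketched roughly as follows. This is a simplified Python illustration, not the linked R code and not how csaw implements profileSites (which works from reads in BAM files); the function name `average_profile`, the per-base `coverage` list, and the toy numbers are all assumptions made for the example.

```python
def average_profile(coverage, sites, flank):
    """Average per-base coverage in a window of +/- flank around each site.

    coverage: per-base read depth for one chromosome (hypothetical input).
    sites:    0-based positions of the local maxima.
    Returns a dict mapping relative offset -> mean coverage.
    Sites too close to either end for a complete window are skipped.
    """
    offsets = range(-flank, flank + 1)
    totals = [0.0] * (2 * flank + 1)
    used = 0
    for site in sites:
        if site - flank < 0 or site + flank >= len(coverage):
            continue  # incomplete window, skip this site
        for i, off in enumerate(offsets):
            totals[i] += coverage[site + off]
        used += 1
    if used == 0:
        raise ValueError("no site has a complete window")
    return {off: total / used for off, total in zip(offsets, totals)}


# Toy data: two peaks at positions 2 and 7.
toy_coverage = [0, 1, 5, 1, 0, 0, 1, 7, 1, 0]
toy_profile = average_profile(toy_coverage, sites=[2, 7], flank=2)
print(toy_profile)
```

In a profile built this way, a spike at offset 1000 means that coverage 1000 bp away from the maxima is, on average, elevated across many sites.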
In some, but not all, samples, including the input samples, there is a periodic spike in the coverage every 1000 bp. These spikes all appear roughly the same width as the main spike at 0. Since this is histone ChIP-Seq data, periodicity resulting from adjacent histones is not unexpected, but this does not explain a period of exactly 1000 bp. I'm not sure whether this artifact is the result of a software error or a problem with my data. The perfect regularity of the interval seems to suggest something software-related, but the inconsistency of the height of the spikes, and the fact that they are not present in all samples, suggest that the spikes are a property of the data.
When I generated the same plots based on the original alignment to hg19, I did not see these periodic spikes at all (see here). I did see some irregular spikes that were not consistent between samples, and most of these spikes disappeared when I filtered out reads overlapping the published excludable "blacklist" regions. These tracks are not available for hg38, so I used the liftOver tool to convert them to hg38 coordinates. I found that these were not sufficient after lifting over, so I supplemented them with my own "grey list" based on excessively high-coverage regions in the input samples from my data, using this code. The plots linked above were made after filtering out all reads in either the lifted-over blacklist or my own grey list.
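To make the grey-list step concrete, the general idea can be sketched as below. This is a simplified Python illustration under stated assumptions, not the linked code: the function names, the fixed-size bins, and the cutoff of 10 times the median nonzero bin count are all hypothetical choices made for the example.

```python
from statistics import median


def grey_list(bin_counts, bin_size, fold=10.0):
    """Flag bins with excessive input coverage and merge adjacent ones.

    bin_counts: read counts per fixed-size bin in an input sample.
    Returns half-open (start, end) intervals whose count exceeds
    fold * median of the nonzero bins (an assumed, illustrative cutoff).
    """
    nonzero = [c for c in bin_counts if c > 0]
    if not nonzero:
        return []
    cutoff = fold * median(nonzero)
    regions = []
    for i, count in enumerate(bin_counts):
        if count <= cutoff:
            continue
        start, end = i * bin_size, (i + 1) * bin_size
        if regions and regions[-1][1] == start:
            # Extend the previous region when bins are adjacent.
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions


def filter_reads(reads, regions):
    """Drop reads (half-open start, end) that overlap any listed region."""
    def overlaps(read):
        return any(read[0] < end and read[1] > start for start, end in regions)
    return [read for read in reads if not overlaps(read)]


# Toy data: 100 bp bins, three of which have very high input coverage.
toy_bins = [2, 3, 2, 50, 60, 2, 3, 40]
toy_regions = grey_list(toy_bins, bin_size=100)
toy_reads = [(100, 150), (290, 310), (500, 550), (750, 790)]
print(toy_regions)
print(filter_reads(toy_reads, toy_regions))
```

The same filter would be applied with the union of the lifted-over blacklist and the grey list as `regions`.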
So, does anyone have an explanation for this artifact, or has anyone encountered a similar result before?