Question

Weird 1kb-periodic artifact in ChIP-Seq data (including input)

1

Entering edit mode

7.5 years ago

Ryan Thompson ★ 3.6k

I am doing a ChIP-Seq analysis of this GEO data set. I was the data analyst on the original publication for that data set, which was done on hg19. I have now re-aligned the samples to hg38, and I'm using the csaw Bioconductor package to analyze the data. (Edit: These are single-end sequecing samples.) I've used the csaw functions findMaxima and profileSites to generate an average profile of the coverage around windows that represent local maxima in coverage, and noticed an odd pattern in the resulting profile plots:

https://www.dropbox.com/s/0fb7i7ata5d5dly/site-profile-plots.pdf?dl=0

You can view the code that I used to generate these plots here. This code mostly follows the csaw user's guide with regard to usage of findMaxima and profileSites.

In some, but not all, samples, including the input samples, there is a periodic spike in the coverage every 1000 bp. These spikes all appear roughly the same width as the main spike at 0. Since this is histone ChIP-Seq data, periodicity resulting from adjacent histones is not unexpected, but this does not explain a period of exactly 1000 bp. I'm not sure whether this artifact is the result of a software error or a problem with my data. The perfect regularity of the interval seems to suggest something software-related, but the inconsistency of the height of the spikes, and the fact that they are not present in all samples, suggest that the spikes are a property of the data.

When I generated the same plots based on the original alignment to hg19, I did not see these periodic spikes at all (see here). I did see some irregular spikes that were not consistent between samples, and most of these spikes disappeared when I filtered out reads overlapping the published excludable "blacklist" regions. These tracks are not available for hg38, so I used the liftOver tool to convert them to hg38 coordinates. I found that these were not sufficient after lifting over, so I supplemented them with my own "grey list" based on excessively high-coverage regions in the input samples from my data, using this code. The plots linked above were made after filtering out all reads in either the lifted-over blacklist or my own grey list.

So, does anyone have an explanation for this artifact, or has anyone encountered a similar result before?

ChIP-Seq • 2.2k views

ADD COMMENT • link 7.5 years ago by Ryan Thompson ★ 3.6k

1

Entering edit mode

Do you see this if you look at the raw data in IGV? My guess is that this is some weird edge-case bug in the code. If you write the maxima sites to a BED file then try computeMatrix followed by plotProfile from deepTools to see if a different tool shows the same thing.

ADD REPLY • link 7.5 years ago by Devon Ryan 104k

0

Entering edit mode

Are these pair-end reads?

ADD REPLY • link 7.5 years ago by Damian Kao 16k

1

Entering edit mode

These are single-end reads.

ADD REPLY • link 7.5 years ago by Ryan Thompson ★ 3.6k

1

Entering edit mode

Were the profiles generated strictly from the read mappings? Did you shift the reads according to mean fragment length or something?

ADD REPLY • link 7.5 years ago by Damian Kao 16k

0

Entering edit mode

The reads were counted as the 5-prime ends of 147-bp fragments (i.e. the length of 1 nucleosome worth of DNA).

ADD REPLY • link 7.5 years ago by Ryan Thompson ★ 3.6k

0

Entering edit mode

Having never used csaw, but looking at the findMaxima function, I see that there is a range parameter and it is set to "1000" for the call. It seems awfully conincidental that your peaks in the flanking regions are also spaced at 1Kb intervals... Have you tried varying the window and range parameters to see if the same artifact shows up but with different spacing?

As an aside, it is curious that your input has the same meta-profile as the IP's.

ADD REPLY • link 7.5 years ago by ejm32 ▴ 450

0

Entering edit mode

I'm running findMaxima with range=5000 and profileSites with range=10000.

ADD REPLY • link 7.5 years ago by Ryan Thompson ★ 3.6k