Chip-Seq Peak Caller Able To Find Peaks Split By Indels?
4
2
10.9 years ago

Are there any chip-seq peak callers able to find cases where the region under a peak has been split by an indel between the reference and the chipped sample?

For example, let's suppose one has done ChIP-seq on individual 1. If we were to resequence individual 1, assemble its genome, and map the ChIP-seq reads straight to that assembly, there would be a peak like this:

         ***
        *****
      **********
    **************
 ********************
-----------------------


Now the reference genome has an insertion with respect to the chipped sample, so when one maps chip-seq reads from individual 1 to the reference genome assembly, the peak looks like this:

         *                    **
        **                    ***
      ****                    ******
    ******                    ********
 *********                    ***********
----------====================-------------


Is there any peak caller able to identify this as a single peak, even if the indel is hundreds of bp long?

chip-seq peak-calling indel structural • 2.3k views
1

Do you expect a large portion of your peaks to have this problem or is this a question of a single region of interest?

1

Are you using a mapper which identifies/maps reads across these indels?

1

Peak callers generally treat reads as tags. They know nothing about indels in the reads themselves. I might be thinking about this wrong, but for a regulatory region disrupted (biologically) by an indel, I would think that the most likely result that you would see in your data is a lack of reads (no peak).

0

@Aaron Statham: I would like to use one that does that, even if it is after the mapping process.

0

@Aaron Statham: I would like to use a peak caller that identifies reads overlapping across indels, even if it is after a mapping process that doesn't take that into account.
@Sean Davis: any of these events is arguably interesting to study, since the regulatory region may have been disrupted by the indel but still be present in other individuals.

1
10.9 years ago

We created our own simplistic peak caller so that we could join neighbouring peaks together into a single region of interest. See my answer in this thread. Depending on the gap size in your individuals and whether you are interested in differentiating nearby peaks, a similar approach might work.

0

Do you take into account the sequence overlaps to the left and right of the indel?

0

Not specifically. We just define a peak as a region that is above a defined background level and between a minimum and maximum length, and then merge peaks that lie within a certain distance of each other.
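
For illustration, the approach described above could be sketched roughly as follows. This is not the poster's actual code: the function names, the per-base `coverage` list, and the parameters `background`, `min_len`, `max_len` and `max_gap` are all assumptions made for the example.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in reference coordinates


def call_regions(coverage: List[int], background: int,
                 min_len: int, max_len: int) -> List[Interval]:
    """Return runs of positions whose depth is above `background` and whose
    length lies between `min_len` and `max_len` (inclusive)."""
    regions = []
    start = None
    # The appended sentinel equals the background, so it closes a trailing run.
    for pos, depth in enumerate(coverage + [background]):
        if depth > background and start is None:
            start = pos
        elif depth <= background and start is not None:
            if min_len <= pos - start <= max_len:
                regions.append((start, pos))
            start = None
    return regions


def merge_nearby(regions: List[Interval], max_gap: int) -> List[Interval]:
    """Merge regions separated by at most `max_gap` bases."""
    merged: List[Interval] = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical toy coverage: two nearby enriched runs and one distant run.
coverage = [0] * 2 + [5] * 4 + [0] * 3 + [6] * 5 + [0] * 10 + [7] * 3
regions = call_regions(coverage, background=1, min_len=3, max_len=100)
merged = merge_nearby(regions, max_gap=3)
```

Note that with a `max_gap` in the hundreds of bases, such a merge cannot tell a peak split by an indel from two genuinely separate neighbouring peaks; it only reports them as one region.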

1
10.9 years ago
Aaron Statham ★ 1.1k

Unless your mapper aligns reads across the indel (similar to how RNA-seq reads are aligned across splice junctions), I don't know how you would differentiate a deletion in your sample from two separate peaks... A couple of ideas off the top of my head:

• Take all genomic regions where two (or more?) peaks are very close together, then use these regions as a reference against which you remap your unaligned reads with a more sensitive aligner, BLAT maybe?
• I would think that the distribution of + and - strand reads at such indel peaks would look different to a normal peak, perhaps something could be exploited there (only if the indel is larger than the ChIP fragment size I guess)
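
One possible way to start exploring the strand idea is to compare the fraction of forward-strand tags in two neighbouring peaks. The sketch below is only that: the `Read` tuple layout and the function names are assumptions, and whether a large skew really separates split peaks from ordinary neighbouring peaks is untested here (ordinary peaks also show a shift between + and - tags, so any cutoff would need calibrating against them).

```python
from typing import List, Optional, Tuple

Read = Tuple[int, str]      # (5' end position, strand), strand is "+" or "-"
Interval = Tuple[int, int]  # half-open [start, end)


def forward_fraction(reads: List[Read], region: Interval) -> Optional[float]:
    """Fraction of tags in `region` that are on the + strand (None if empty)."""
    start, end = region
    plus = minus = 0
    for pos, strand in reads:
        if start <= pos < end:
            if strand == "+":
                plus += 1
            else:
                minus += 1
    total = plus + minus
    return plus / total if total else None


def strand_skew(reads: List[Read], left: Interval,
                right: Interval) -> Optional[float]:
    """Forward fraction of the left peak minus that of the right peak."""
    left_frac = forward_fraction(reads, left)
    right_frac = forward_fraction(reads, right)
    if left_frac is None or right_frac is None:
        return None
    return left_frac - right_frac


# Hypothetical tags: the left peak is mostly "+", the right peak mostly "-".
reads = [(10, "+"), (12, "+"), (15, "+"), (18, "-"),
         (110, "-"), (112, "-"), (115, "+"), (118, "-")]
skew = strand_skew(reads, left=(0, 20), right=(100, 120))
```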

If you are serious about finding these kinds of events (this is a rather esoteric problem), I would create a simulated data set and see which strategy performs best when you know for certain that there is an indel (ROC curves and the like).
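
A very small simulation along these lines might look like the sketch below. Everything in it is an assumption made for illustration: fragments are drawn uniformly around a single binding site in the sample genome, each fragment yields one single-end tag, and tag positions are then shifted into reference coordinates across a known insertion. It ignores read length, mapping errors and background noise, so it is a starting point rather than a realistic simulator.

```python
import random
from typing import List, Tuple

Read = Tuple[int, str]  # (5' end position in reference coordinates, strand)


def sample_to_reference(pos: int, breakpoint: int, insertion_len: int) -> int:
    """Shift a sample-genome position past an insertion present in the reference."""
    return pos if pos < breakpoint else pos + insertion_len


def simulate_tags(site: int, n_fragments: int, fragment_len: int,
                  breakpoint: int, insertion_len: int, seed: int = 0) -> List[Read]:
    """Simulate single-end tags from fragments that cover `site` in the sample."""
    rng = random.Random(seed)
    tags = []
    for _ in range(n_fragments):
        # Choose the fragment so that it always contains the binding site.
        start = site - rng.randrange(fragment_len)
        end = start + fragment_len - 1
        if rng.random() < 0.5:
            # Forward tag: 5' end is the left end of the fragment.
            tags.append((sample_to_reference(start, breakpoint, insertion_len), "+"))
        else:
            # Reverse tag: 5' end is the right end of the fragment.
            tags.append((sample_to_reference(end, breakpoint, insertion_len), "-"))
    return tags


# Known ground truth: a 300 bp insertion in the reference right at the site.
tags = simulate_tags(site=1000, n_fragments=200, fragment_len=150,
                     breakpoint=1000, insertion_len=300, seed=1)
```

Because the breakpoint and insertion length are known, any candidate strategy can be scored on whether it reports the two flanking pile-ups as one peak.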

1
10.9 years ago

For small indels such as those picked up by aligners, I don't think that most ChIP-seq software will even produce a split peak (because each read is treated as a simple tag). I think that indels large enough to create two peaks where there was only one are probably relatively uncommon, at least in humans (although indel variation, in general, is a significant source of variation). But if you are really interested in finding these things as an academic exercise, I would think that the best bet is to use paired-end sequencing for your ChIP-seq data generation. Your ChIP-seq peak finding can then take advantage of both ends of the fragments (I'm not sure which ChIP-seq software can use paired ends, but I don't think it would be hard to patch something together). You can also apply typical structural variant software and ideas to find putative regions of structural variation. Layer your peak calls on your structural variation findings and you have your answers.
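
As a rough sketch of how paired ends could be "patched together" with peak calls, one could count read pairs whose mapped span is much longer than the expected fragment length and whose two ends land in two neighbouring peaks. The `Pair` layout, the function names and the `expected_len` and `tolerance` parameters are assumptions for the example; real structural variant callers do considerably more than this.

```python
from typing import List, Tuple

Pair = Tuple[int, int]      # (leftmost, rightmost) mapped position of a read pair
Interval = Tuple[int, int]  # half-open [start, end)


def stretched_pairs(pairs: List[Pair], expected_len: int,
                    tolerance: int) -> List[Pair]:
    """Pairs whose mapped span exceeds the expected fragment length plus tolerance."""
    return [(s, e) for s, e in pairs if e - s > expected_len + tolerance]


def bridging_pairs(pairs: List[Pair], left: Interval,
                   right: Interval) -> List[Pair]:
    """Pairs with the left end inside `left` and the right end inside `right`."""
    return [(s, e) for s, e in pairs
            if left[0] <= s < left[1] and right[0] <= e < right[1]]


def bridging_evidence(pairs: List[Pair], left: Interval, right: Interval,
                      expected_len: int, tolerance: int) -> int:
    """Number of stretched pairs linking the two peaks."""
    return len(bridging_pairs(stretched_pairs(pairs, expected_len, tolerance),
                              left, right))


# Hypothetical pairs: two of them span from the left peak to the right peak.
pairs = [(100, 300), (150, 360), (180, 900), (190, 950), (700, 920)]
evidence = bridging_evidence(pairs, left=(90, 200), right=(880, 1000),
                             expected_len=200, tolerance=100)
```

Several such pairs linking two adjacent peaks would suggest that the sequence between them is absent from the sample, i.e. that the two peaks may be one.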

0
10.9 years ago
Malcolm.Cook ★ 1.3k

just thinking...

Use your favorite tools to find the indels or other structural variations between the reference genome and the subject genome, given your reads

use this result to edit the reference genome into a putative subject genome

remap the reads onto this subject genome

and use your favorite indel agnostic peak caller on the subject genome
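
The "edit the reference genome" step above could be sketched as follows, assuming the variants are already available as non-overlapping `(position, reference allele, sample allele)` tuples with 0-based positions. The tuple layout and the function name are made up for the example; for real genomes you would more likely start from a VCF and use an existing consensus-building tool.

```python
from typing import List, Tuple

# (0-based position in the reference, reference allele, sample allele)
Variant = Tuple[int, str, str]


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Build a putative subject sequence by applying non-overlapping variants."""
    seq = reference
    # Work from right to left so that upstream coordinates remain valid.
    for pos, ref_allele, alt_allele in sorted(variants, reverse=True):
        if seq[pos:pos + len(ref_allele)] != ref_allele:
            raise ValueError(f"reference allele mismatch at position {pos}")
        seq = seq[:pos] + alt_allele + seq[pos + len(ref_allele):]
    return seq


# Hypothetical example: a 3 bp deletion and a 2 bp insertion in the sample.
reference = "ACGTACGTAC"
variants = [(2, "GTAC", "G"), (8, "A", "ATT")]
subject = apply_variants(reference, variants)
```

Peaks called on the subject sequence are then in subject coordinates, so comparing them with peaks from other individuals requires mapping the coordinates back to the reference.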