Question

SNP calling on ChIP-seq data

0

9.2 years ago

a11msp ▴ 120

I am trying to call SNPs from ChIP-seq data and am currently using samtools mpileup for this. I am wondering, however, whether mpileup's SNP calling is affected by the assumption that the input is genomic data from a single individual, so that any deviation of the allelic read counts from 1:1 (heterozygous) or 0:1/1:0 (homozygous) can only arise from technical bias. Could anyone comment?
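To make the concern concrete, here is a minimal sketch of a plain diploid genotype model. This is a toy, not mpileup's actual implementation (which also uses base and mapping qualities); the function names and the 1% error rate are assumptions chosen for illustration. It only shows that when the expected alternative-allele fraction is fixed at roughly 0, 0.5 or 1, a strongly skewed heterozygous site looks homozygous.

```python
import math

def genotype_log_likelihoods(ref_count, alt_count, error_rate=0.01):
    """Log-likelihoods of RR, RA and AA under a simple binomial diploid model.

    The binomial coefficient is omitted because it is the same for all
    three genotypes and does not change which one wins.
    """
    # Expected fraction of alt reads for each genotype (assumed values).
    alt_fraction = {"RR": error_rate, "RA": 0.5, "AA": 1.0 - error_rate}
    return {
        gt: alt_count * math.log(p) + ref_count * math.log(1.0 - p)
        for gt, p in alt_fraction.items()
    }

def call_genotype(ref_count, alt_count, error_rate=0.01):
    """Return the genotype with the highest log-likelihood."""
    ll = genotype_log_likelihoods(ref_count, alt_count, error_rate)
    return max(ll, key=ll.get)

print(call_genotype(10, 10))  # balanced counts -> "RA"
print(call_genotype(1, 19))   # skewed counts   -> "AA"
```

In the second call a truly heterozygous site with 1 reference read and 19 alternative reads would be reported as homozygous alternative, because the model has no way to attribute the skew to anything but sequencing error.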

I know GATK has an RNA-seq SNP-calling module which should in theory account for this. Has anyone tried using it with ChIP-seq?

Many thanks!

SNP ChIP-Seq ngs • 5.3k views

ADD COMMENT • link updated 23 months ago by Ram 43k • written 9.2 years ago by a11msp ▴ 120

Ram · Answer 1 · 2015-03-12

5

9.1 years ago

inesdesantiago ▴ 260

Hi. I tried that at some point and compared the calls to published genotypes (from ENCODE). It works OK as long as you don't have cancer genomes (for those, one needs SNP-calling pipelines designed to deal with copy-number aberrations and tumor/normal mixtures).

But for normal genomes, I think it works to some extent. The biggest error rate I got (5% or so) was on heterozygous sites miscalled as homozygous because of allele-specific TF binding: only one allele is present in the ChIP-seq reads, so GATK calls the SNP homozygous when it is in fact heterozygous.
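For anyone wanting to run the same kind of comparison, a rough sketch of how the het-to-hom miscall rate could be computed is below. The data structures (dicts keyed by chromosome and position, VCF-style genotype strings) and the toy genotypes are assumptions for illustration, not the actual ENCODE data or the pipeline used above.

```python
def is_het(genotype):
    """True if a VCF-style genotype such as '0/1' or '1|0' is heterozygous."""
    a, b = genotype.replace("|", "/").split("/")
    return a != b

def het_to_hom_miscall_rate(truth, calls):
    """Fraction of truly heterozygous sites that were called homozygous.

    truth, calls: dicts mapping (chrom, pos) to a genotype string.
    Only sites present in both dicts are compared.
    """
    het_sites = [s for s, gt in truth.items() if is_het(gt) and s in calls]
    if not het_sites:
        return float("nan")
    miscalled = sum(1 for s in het_sites if not is_het(calls[s]))
    return miscalled / len(het_sites)

# Made-up example genotypes, purely illustrative.
truth = {("chr1", 100): "0/1", ("chr1", 200): "0/1",
         ("chr1", 300): "1/1", ("chr2", 50): "0/1", ("chr2", 80): "0/1"}
calls = {("chr1", 100): "0/1", ("chr1", 200): "1/1",
         ("chr1", 300): "1/1", ("chr2", 50): "0/1", ("chr2", 80): "1/0"}

print(het_to_hom_miscall_rate(truth, calls))  # 1 of 4 het sites -> 0.25
```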

Mostly I followed this paper which I think is really good:

"Ni et al 2012: Simultaneous SNP identification and assessment of allele-specific bias from ChIP-seq data"

http://www.biomedcentral.com/1471-2156/13/46

ADD COMMENT • link updated 23 months ago by Ram 43k • written 9.1 years ago by inesdesantiago ▴ 260

0

Thanks! I was inspired by this paper as well. Good point about allele-specific binding, which is why I focused only on those cases where the imbalance is in favour of the alternative allele.

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by a11msp ▴ 120

Ram · Answer 2 · 2015-03-13

0

9.1 years ago

Friederike 8.9k

I haven't read the paper mentioned (and I'm not an expert on SNP calling), but I'd be interested to understand why you'd like to use ChIP-seq data for SNP calling (as opposed to exome-seq). Given the ChIP-seq data I've worked with, I'd be worried that the sequencing depth is too low to detect SNPs with high confidence. Plus, ChIP-seq data is inherently biased towards the enriched binding sites, and that bias can be amplified further if your factor binds GC-rich regions. Is that something SNP-calling algorithms can deal with?

ADD COMMENT • link 9.1 years ago by Friederike 8.9k

1

First answer: to detect non-coding SNPs at regions that exome-seq does not enrich for.

Second answer: to look for allelic imbalance in TF binding in the absence of per-haplotype reference genomes.
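As an illustration of the second point, allelic imbalance at a called heterozygous site is often assessed with a binomial test on the reference and alternative read counts. The sketch below is a generic exact two-sided test using only the standard library; it is not the method from the Ni et al. paper, and the function name and the default expected reference fraction of 0.5 are assumptions (reference mapping bias can shift that expectation in real data).

```python
from math import comb

def allelic_imbalance_p_value(ref_count, alt_count, expected_ref_fraction=0.5):
    """Exact two-sided binomial p-value for ref vs. alt read counts.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under the expected reference fraction.
    """
    n = ref_count + alt_count
    p = expected_ref_fraction
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    observed = pmf[ref_count]
    # Small relative tolerance guards against floating-point ties.
    tail = sum(prob for prob in pmf if prob <= observed * (1 + 1e-9))
    return min(1.0, tail)

print(allelic_imbalance_p_value(10, 10))  # balanced site, p close to 1
print(allelic_imbalance_p_value(0, 10))   # fully skewed site, p = 2/1024
```

With many candidate sites, the resulting p-values would still need multiple-testing correction before any site is taken forward for experimental verification.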

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by a11msp ▴ 120

0

Again, given most of the ChIP-seq data I have seen, I would be extremely hesitant to call SNPs at the low coverage you generally achieve. In contrast to whole-genome sequencing, ChIP-seq usually doesn't even aim for high, uniform coverage, because what you're interested in are strong enrichments. If your antibody worked well and targeted a factor that binds many sites in the genome, a considerable fraction of your reads should fall within those peak regions, leaving even fewer reads for the remainder of the genome. That said, I imagine one might have many reads within the peak regions, but disentangling the presence of a SNP from allele-specific binding sounds like a daunting task. How does the paper mentioned above approach these concerns?
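To put rough numbers on the coverage worry, the sketch below computes the chance of sampling both alleles of a heterozygous site at a given depth. It assumes reads are drawn independently with a fixed alternative-allele fraction and ignores sequencing error; the function name and the threshold of two reads per allele are arbitrary choices for illustration.

```python
from math import comb

def prob_both_alleles_seen(depth, min_reads=2, alt_fraction=0.5):
    """Probability of at least `min_reads` reads of each allele at a het site."""
    total = 0.0
    # Sum binomial probabilities over alt counts that leave enough ref reads.
    for alt in range(min_reads, depth - min_reads + 1):
        total += (comb(depth, alt)
                  * alt_fraction**alt
                  * (1 - alt_fraction)**(depth - alt))
    return total

for depth in (4, 10, 20):
    balanced = prob_both_alleles_seen(depth)
    skewed = prob_both_alleles_seen(depth, alt_fraction=0.1)
    print(depth, round(balanced, 3), round(skewed, 3))
```

Lowering `alt_fraction` mimics allele-specific binding: for the same depth, the chance of seeing enough reads from the weaker allele drops, which is the confounding between SNP presence and binding bias described above.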

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by Friederike 8.9k

0

This task is daunting, but doable. The coverage at ChIP peaks for good datasets is actually quite reasonable.

I suggest you have a look at that paper. What they propose is still less robust than aligning to two haplotype-specific reference genomes and quantifying allelic imbalance directly. However, in the absence of such genomes it may give some clues that can then be verified experimentally.

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by a11msp ▴ 120

0

I'd be very interested to hear how many SNPs you eventually identified!
I agree that the clues may well be there, but I would never tell a biologist to expect high-confidence SNP identifications from a ChIP-seq experiment.

ADD REPLY • link 9.1 years ago by Friederike 8.9k