Question: SNP calling on ChIP-seq data
5.4 years ago by
a11msp110
European Union
a11msp110 wrote:

I am trying to call SNPs from ChIP-seq data, and I'm currently using samtools mpileup for this. I am wondering, however, whether mpileup's SNP calling is affected by the assumption that we have individual-level genomic data, so that any deviation of the allelic read counts from the expected 1:1 (heterozygous) or 0:1/1:0 (homozygous) ratios can only arise from technical bias. Could anyone comment?

I know GATK has an RNA-seq SNP-calling module which should in theory account for this. Has anyone tried using it with ChIP-seq?

Many thanks!

snp chip-seq ngs • 3.4k views
modified 5.4 years ago • written 5.4 years ago by a11msp110
5.4 years ago by
United Kingdom
inesdesantiago170 wrote:

Hi. I did try that at some point, and compared the calls to published genotypes (from ENCODE). It works OK as long as you don't have cancer genomes (for those, one needs specific SNP-calling pipelines that deal with copy-number aberrations and tumor/normal mixtures).

But for normal genomes, I think it works to some extent. I got the biggest error rate (5% or so) on heterozygous sites that are miscalled as homozygous due to allele-specific TF binding (only one allele is present in the ChIP-seq reads, so GATK calls that SNP homozygous when in fact it is heterozygous).
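
To make the het-to-hom miscall concrete, here is a minimal sketch of a diploid genotype likelihood in Python. It is a simplified binomial model with an assumed 1% per-read error rate, not the model that samtools or GATK actually implement, and the function names are made up for illustration. It only shows why a strongly imbalanced site looks homozygous to a caller that assumes reads are drawn 1:1 from the two alleles.

```python
from math import comb

def genotype_likelihoods(ref_reads, alt_reads, error_rate=0.01):
    """Binomial likelihood of the read counts under each diploid genotype.

    Simplified illustration: every read is assumed independent, with the
    same (hypothetical) error rate.
    """
    n = ref_reads + alt_reads
    likelihoods = {}
    # Expected fraction of alt-allele reads under each genotype, before errors.
    for name, alt_fraction in (("hom_ref", 0.0), ("het", 0.5), ("hom_alt", 1.0)):
        # Probability that a single read shows the alt allele, allowing for errors.
        p_alt = alt_fraction * (1 - error_rate) + (1 - alt_fraction) * error_rate
        likelihoods[name] = (
            comb(n, alt_reads) * p_alt**alt_reads * (1 - p_alt)**ref_reads
        )
    return likelihoods

def most_likely_genotype(ref_reads, alt_reads, error_rate=0.01):
    """Return the genotype with the highest likelihood."""
    lik = genotype_likelihoods(ref_reads, alt_reads, error_rate)
    return max(lik, key=lik.get)
```

With 1 reference and 19 alternative reads, `most_likely_genotype(1, 19)` picks `hom_alt`, even if the individual is truly heterozygous and the imbalance comes from allele-specific binding.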

Mostly I followed this paper which I think is really good:

"Ni et al 2012: Simultaneous SNP identification and assessment of allele-specific bias from ChIP-seq data"

http://www.biomedcentral.com/1471-2156/13/46

written 5.4 years ago by inesdesantiago170

Thanks! I was inspired by this paper as well. Good point about allele-specific binding, which is why I focused only on those cases where the imbalance is in favour of the alternative allele. 

written 5.4 years ago by a11msp110
5.4 years ago by
Friederike5.9k
United States
Friederike5.9k wrote:

I didn't read the mentioned paper (and I'm not an expert on SNP calling), but I'd be interested to understand why you'd like to use ChIP-seq data for SNP calling (as opposed to exome-seq). Given the ChIP-seq data I've worked with, I'd be worried about the lack of sequencing depth to detect SNPs with high confidence. Plus, ChIP-seq data is inherently biased (towards the enriched binding sites), and that bias can be even stronger if you have a factor that binds GC-rich regions. Is that something SNP-calling algorithms can deal with?

written 5.4 years ago by Friederike5.9k

First answer: to detect non-coding SNPs at regions that exome-seq does not enrich for.

Second answer: to look for allelic imbalance in TF binding in the absence of per-haplotype reference genomes.
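
As a rough illustration of what testing for allelic imbalance at a single heterozygous SNP could look like, here is a sketch of an exact two-sided binomial test in Python. The function name and the expected reference fraction of 0.5 are assumptions for the example; it ignores reference mapping bias and overdispersion, and it is not the method of the paper mentioned above.

```python
from math import comb

def allelic_imbalance_pvalue(ref_reads, alt_reads, expected_ref_fraction=0.5):
    """Exact two-sided binomial test on allele-specific read counts.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under the null of balanced binding.
    """
    n = ref_reads + alt_reads
    p = expected_ref_fraction
    # Probability of seeing k reference reads out of n, for every k.
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance so ties are not lost to floating-point noise.
    p_value = sum(x for x in pmf if x <= observed * (1 + 1e-9))
    return min(1.0, p_value)

# Example: 2 reference reads versus 18 alternative reads at one site.
print(allelic_imbalance_pvalue(2, 18))
```

A small p-value here only says the counts are unlikely under balanced sampling; whether that reflects biology, a wrong genotype call, or mapping bias still has to be checked separately.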

modified 5.4 years ago • written 5.4 years ago by a11msp110

Again, given most of the ChIP-seq data I have seen, I would be extremely hesitant to call SNPs with the low coverage you generally achieve. In contrast to whole-genome sequencing, ChIP-seq usually doesn't even aim for high, uniform coverage, because what you're interested in are strong enrichments. If your antibody worked well and was targeted against a factor that binds to many sites in the genome, then a considerable fraction of your reads should fall within those peak regions, leaving even fewer reads for the remainder of the genome. That being said, I imagine that one might have many reads within those peak regions, but disentangling the presence of a SNP from allele-specific binding sounds like a daunting task. How does the mentioned paper approach these concerns?
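
The coverage concern can be put into numbers with a small Python sketch. It assumes reads at a truly heterozygous site are sampled 1:1 from both alleles and asks how often at least some minimum number of alternative-allele reads would be seen at a given depth. The function name and the threshold of 3 supporting reads are hypothetical choices for illustration, not a rule from any particular caller.

```python
from math import comb

def prob_enough_alt_reads(depth, min_alt_reads, alt_fraction=0.5):
    """Probability of observing at least `min_alt_reads` alt-allele reads
    at a heterozygous site covered by `depth` independent reads."""
    return sum(
        comb(depth, k) * alt_fraction**k * (1 - alt_fraction)**(depth - k)
        for k in range(min_alt_reads, depth + 1)
    )

# Compare shallow background coverage with depths more typical of peaks.
for depth in (4, 8, 15, 30):
    print(depth, round(prob_enough_alt_reads(depth, 3), 3))
```

Lowering `alt_fraction` below 0.5 mimics allele-specific binding against the alternative allele, which pushes the required depth up further.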

written 5.4 years ago by Friederike5.9k

This task is daunting, but doable. The coverage at ChIP peaks for good datasets is actually quite reasonable.

I suggest you have a look at that paper. What they propose is still less robust than aligning to two haplotype-specific reference genomes and quantifying allelic imbalance directly. However, in the absence of such genomes it may give some clues that can then be verified experimentally.

written 5.4 years ago by a11msp110

I'd be very interested to hear how many SNPs you eventually identified!
I agree that the clues are probably there, but I would never tell a biologist to expect that a ChIP-seq experiment will give them high-confidence SNP identifications.

written 5.4 years ago by Friederike5.9k