What are allele-specific binding, allele-specific expression, and allelic imbalance?
6.9 years ago

After a few days of searching, I have not found a clear explanation of these three concepts or of how to study them. I have a partial understanding, but I feel I'm missing a lot. After seeing this post (Allele Specific Events), it seems the word "allele" can mean different things depending on context, which makes it harder for someone with a non-bio background to follow. Here is what I've figured out so far:

What data is required to look at allele-specific binding and allele-specific expression?

ChIP-seq is used to find protein binding sites, so allele-specific binding analyses are done with ChIP-seq data. RNA-seq is used to study gene expression, so allele-specific expression analyses are done with RNA-seq data.

How to study allele-specific binding?

Suppose I have a reference genome and a ChIP-seq dataset. After aligning the reads to the reference genome, I find all SNPs. As I understand it, only heterozygous SNPs are informative for allele-specific binding, because only there can reads from the two alleles be told apart. How are allele-specific binding sites identified using a list of heterozygous SNPs?

Allelic Imbalance

Nathan Sheffield posted a good explanation of allelic imbalance here.

6.9 years ago

For allele-specific binding, you usually first need to call variants with a non-ChIP-seq dataset. You then have known heterozygous sites and can look for enrichment of ChIPed alignments covering those sites. The same applies to allele-specific expression (though there you're often really looking at haplotype-specific expression). The reason you don't want to call variants in the same dataset that you're using to look for allele-specific signal is that an allele-specific event will typically appear as a homozygous variant, because nearly all reads at that site carry one allele, leading you to ignore the site.
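
To make that last point concrete, here is a toy sketch. The function `naive_genotype` and its 0.2 threshold are hypothetical and only for illustration; real variant callers use likelihood models rather than a fixed allele-fraction cutoff, but the failure mode is similar.

```python
def naive_genotype(ref_count, alt_count, min_minor_fraction=0.2):
    """Toy genotype call from allele counts (hypothetical, illustration only).

    A site is called heterozygous only if the alternate-allele fraction
    lies between min_minor_fraction and 1 - min_minor_fraction.
    """
    total = ref_count + alt_count
    if total == 0:
        return "no_call"
    alt_fraction = alt_count / total
    if alt_fraction < min_minor_fraction:
        return "hom_ref"
    if alt_fraction > 1 - min_minor_fraction:
        return "hom_alt"
    return "het"


# Balanced coverage, as expected from whole-genome sequencing of a het site
print(naive_genotype(20, 18))
# Strongly allele-specific ChIP signal at the same site
print(naive_genotype(38, 2))
```

With counts like 38 versus 2, the site is called homozygous and drops out of the heterozygous list, even though it is exactly the kind of site you are looking for.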

I appreciate your response. That cleared some things up. I am using a dataset of high-confidence variant calls from the NIST Genome in a Bottle project as the source of my "known" heterozygous sites. I'm working with a small ChIP-seq dataset from here to start off with. To look for "enrichment of ChIPed alignments covering those sites", are you saying I need to count the number of aligned ChIPed reads that overlap a NIST heterozygous variant? This next step may add a layer of unnecessary complexity, but should I first identify binding sites from the data (using MACS, for example) and then count reads that overlap both a binding site and a NIST heterozygous variant?

Yes, you'll want to first call peaks with MACS or a similar tool and then only look at reads spanning heterozygous positions within those peaks. This will likely reduce the search space drastically. You'll then count the number of reads spanning each variant that carry each of the two alleles.
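
A minimal sketch of those two steps, assuming the peaks and the per-allele read counts have already been loaded into plain Python structures. The function names `sites_in_peaks` and `binomial_two_sided_p` and the example coordinates are hypothetical. The exact two-sided binomial test against a 50:50 expectation is one common choice for judging imbalance, not the only one, and this sketch does not correct for reference mapping bias.

```python
from bisect import bisect_right
from math import comb


def sites_in_peaks(sites, peaks):
    """Keep heterozygous sites that fall inside a peak.

    sites: list of (chrom, pos) with 1-based positions, as in VCF.
    peaks: list of (chrom, start, end), 0-based half-open, as in BED.
    Peaks are assumed not to overlap each other.
    """
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()

    kept = []
    for chrom, pos in sites:
        intervals = by_chrom.get(chrom, [])
        zero_based = pos - 1
        # Index of the last peak whose start is <= the site position
        i = bisect_right(intervals, (zero_based, float("inf"))) - 1
        if i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]:
            kept.append((chrom, pos))
    return kept


def binomial_two_sided_p(ref_count, alt_count, p=0.5):
    """Exact two-sided binomial p-value for the observed allele counts.

    Sums the probability of every outcome that is no more likely than
    the observed one. Intended for modest read depths.
    """
    n = ref_count + alt_count
    if n == 0:
        return 1.0
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_count]
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-7)))


peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
het_sites = [("chr1", 150), ("chr1", 201), ("chr2", 150)]
print(sites_in_peaks(het_sites, peaks))
print(binomial_two_sided_p(10, 0))
```

When many sites are tested, the resulting p-values would also need a multiple-testing correction.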

Hi, can you suggest a software tool to "count the number of reads"? Thanks.

Maybe BCFtools mpileup? See its documentation here.
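
If you go that route, the per-allele counts end up in the VCF output. A rough sketch of pulling them out, assuming the VCF was generated with the per-sample allelic depth (`AD`) annotation requested (check the documentation for the exact option in your version). The helper `allele_depths` and the example line are made up for illustration.

```python
def allele_depths(vcf_line, sample_index=0):
    """Extract per-allele read depths from one VCF data line.

    Assumes the FORMAT column contains an AD field listing depths for the
    reference allele first, then each ALT allele in order.
    Returns (chrom, pos, {allele: depth}), or None if AD is absent.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
    format_keys = fields[8].split(":")
    if "AD" not in format_keys:
        return None
    sample_values = fields[9 + sample_index].split(":")
    depths = sample_values[format_keys.index("AD")].split(",")
    alleles = [ref] + alt.split(",")
    # Skip missing values, which VCF writes as "."
    counts = {a: int(d) for a, d in zip(alleles, depths) if d != "."}
    return chrom, pos, counts


# Made-up example line; "<*>" is a symbolic allele for any other base
line = "chr1\t12345\t.\tA\tG,<*>\t0\t.\tDP=40\tPL:AD\t255,0,255,255,255,255:22,18,0"
print(allele_depths(line))
```

The reference and alternate counts at each known heterozygous site can then be compared to see whether one allele is over-represented.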