What is allele-specific binding, allele-specific expression, and allelic imbalance?
9.2 years ago
jgbradley1

After a few days of searching, I still haven't found a clear explanation of these three concepts and how to study them. I have a half-baked idea of what they mean, but I feel I'm missing a lot. After seeing this post, it seems the word "allele" can mean different things depending on context, which makes it harder for someone with a non-bio background to figure out. Here is what I've figured out so far:

What data is required to look at allele-specific binding and allele-specific expression?

ChIP-seq is used to look at protein binding sites, so allele-specific binding analyses are done with ChIP-seq data. RNA-seq is used to study gene expression, so allele-specific expression analyses are done with RNA-seq data.

How do you study allele-specific binding?

Suppose I have a reference genome and a ChIP-seq dataset. After aligning the reads to the reference genome, I find all SNPs. Studying allele-specific binding means looking only at the heterozygous SNPs. How are allele-specific binding sites identified using a list of heterozygous SNPs?

Allelic Imbalance

Nathan Sheffield posted a good explanation of allelic imbalance here.

9.2 years ago

Regarding allele-specific binding, you often first need to call variants from a non-ChIP-seq dataset. You then have known heterozygous sites and can look for enrichment of ChIPed alignments covering those sites, i.e. more reads supporting one allele than the other. This is the same for allele-specific expression (though there you're often really looking at haplotype-specific expression). The reason you don't want to call variants in the same dataset you're using to look for allele-specific effects is that an allele-specific event will typically look homozygous in that dataset, leading you to ignore the site.
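As a minimal sketch of the first step, assuming the independent call set is a standard single-sample VCF with a GT field, the known heterozygous SNVs can be pulled out with plain Python. `HetSite`, `open_vcf`, `is_het` and `het_snvs` are hypothetical helpers, not part of any library; `bcftools view` can do the same filtering on the command line.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple


class HetSite(NamedTuple):
    """A biallelic heterozygous SNV (hypothetical helper type)."""
    chrom: str
    pos: int  # 1-based, as in the VCF POS column
    ref: str
    alt: str


def open_vcf(path: str):
    """Open a plain or bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def is_het(gt: str) -> bool:
    """True for a called diploid genotype with two different alleles, e.g. 0/1 or 0|1."""
    alleles = gt.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def het_snvs(lines: Iterable[str], sample_index: int = 0,
             pass_only: bool = True) -> Iterator[HetSite]:
    """Yield biallelic heterozygous SNVs for one sample from VCF text lines."""
    bases = {"A", "C", "G", "T"}
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, filt = fields[0], fields[1], fields[3], fields[4], fields[6]
        if pass_only and filt not in ("PASS", "."):
            continue
        if ref not in bases or alt not in bases:
            continue  # single-base, biallelic SNVs only (no indels, no multi-allelic ALT)
        sample = dict(zip(fields[8].split(":"), fields[9 + sample_index].split(":")))
        if is_het(sample.get("GT", ".")):
            yield HetSite(chrom, int(pos), ref, alt)
```

Typical use would be `with open_vcf("calls.vcf.gz") as fh: sites = list(het_snvs(fh))`, where the file name is a placeholder. Reads from the ChIP-seq sample are never used here, which is the point of the advice above.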


I appreciate your response. That cleared some things up. I am using a dataset of high-confidence variant calls from the NIST Genome in a Bottle project as the source of my "known" heterozygous sites. To start with, I'm working with a small ChIP-seq dataset from here. To look for "enrichment of ChIPed alignments covering those sites", are you saying I need to count the number of aligned ChIPed reads that overlap a NIST heterozygous variant? This next step may add an unnecessary layer of complexity, but should I first identify binding sites from the data (using MACS, for example) and then count reads that overlap both a binding site and a NIST heterozygous variant?


Yes, you'll want to first call peaks with MACS or a similar tool and then look only at reads spanning heterozygous positions within those peaks. This will likely shrink the search space drastically. You'll then count, for each variant, the number of spanning reads that carry each of the two alleles.
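A minimal sketch of this step, assuming peaks in a BED-like file (such as MACS narrowPeak output, whose first three columns are chrom, 0-based start and end) and per-site read counts already collected into a hypothetical `{(chrom, pos): (ref_count, alt_count)}` dict. Each heterozygous site inside a peak is scored with an exact two-sided binomial test against the 50:50 ratio expected when both alleles are bound equally. The binomial test is one common choice rather than the only one, and all function names are illustrative.

```python
import bisect
from math import comb
from typing import Dict, Iterable, List, Tuple

Peaks = Dict[str, List[Tuple[int, int]]]


def load_peaks(lines: Iterable[str]) -> Peaks:
    """Read BED-like lines (chrom, 0-based start, end, ...) into merged, sorted intervals per chromosome."""
    raw: Dict[str, List[Tuple[int, int]]] = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        raw.setdefault(chrom, []).append((int(start), int(end)))
    peaks: Peaks = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        merged = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s <= merged[-1][1]:  # overlapping or touching: extend the current interval
                merged[-1][1] = max(merged[-1][1], e)
            else:
                merged.append([s, e])
        peaks[chrom] = [(s, e) for s, e in merged]
    return peaks


def in_peak(peaks: Peaks, chrom: str, pos: int) -> bool:
    """True if the 1-based VCF position pos lies inside a peak (BED intervals are half-open)."""
    intervals = peaks.get(chrom, [])
    x = pos - 1  # 1-based -> 0-based coordinate
    i = bisect.bisect_right(intervals, (x, float("inf"))) - 1  # last peak starting at or before x
    return i >= 0 and intervals[i][0] <= x < intervals[i][1]


def binom_two_sided(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k of n reads on one allele, against p = 0.5."""
    if n == 0:
        return 1.0
    # With p = 0.5 the distribution is symmetric, so both tails have equal mass.
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2**n
    return min(1.0, 2 * tail)


def score_sites(sites, peaks: Peaks, counts: Dict[Tuple[str, int], Tuple[int, int]]):
    """Return (chrom, pos, ref_n, alt_n, p_value) for heterozygous sites in peaks that have reads."""
    results = []
    for chrom, pos in sites:
        if not in_peak(peaks, chrom, pos):
            continue
        ref_n, alt_n = counts.get((chrom, pos), (0, 0))
        if ref_n + alt_n == 0:
            continue  # no informative reads over this site
        results.append((chrom, pos, ref_n, alt_n, binom_two_sided(ref_n, ref_n + alt_n)))
    return results
```

In real data, reads carrying the non-reference allele can align slightly less often (reference mapping bias), and testing many sites calls for a multiple-testing correction, so the raw p-values are only a starting point.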


Hi, can you suggest a software tool to "count the number of reads"? Thanks.


Maybe BCFtools mpileup? See its documentation here.
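Assuming a command along the lines of `bcftools mpileup -f ref.fa -R het_sites.vcf.gz -a FORMAT/AD -O v chip.bam > pileup.vcf` (reference FASTA, an indexed VCF of the heterozygous sites, and one indexed ChIP-seq BAM; file names are placeholders), each output record carries per-allele read depths in the `AD` FORMAT field. The sketch below turns those depths into reference/alternate counts for the known alternate allele at each site. mpileup lists the symbolic `<*>` allele in ALT and may emit separate indel records at the same position, which the code skips. `ref_alt_counts` is a hypothetical helper, not part of bcftools.

```python
from typing import Dict, Iterable, Tuple

Site = Tuple[str, int]  # (chrom, 1-based pos)


def ref_alt_counts(lines: Iterable[str],
                   known_alts: Dict[Site, str]) -> Dict[Site, Tuple[int, int]]:
    """Collect (ref_count, alt_count) per known het site from a single-sample mpileup VCF with FORMAT/AD."""
    counts: Dict[Site, Tuple[int, int]] = {}
    for line in lines:
        if line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        key = (f[0], int(f[1]))
        if key not in known_alts or "INDEL" in f[7].split(";"):
            continue  # not a target site, or an indel record at the same position
        alts = f[4].split(",")  # e.g. ["G", "<*>"] or just ["<*>"]
        sample = dict(zip(f[8].split(":"), f[9].split(":")))
        ad = sample.get("AD", ".").split(",")
        if "." in ad:
            continue  # depths missing for this record
        depths = [int(x) for x in ad]  # REF depth first, then one depth per ALT in order
        alt = known_alts[key]
        alt_n = depths[1 + alts.index(alt)] if alt in alts else 0
        counts[key] = (depths[0], alt_n)
    return counts
```

If the expected alternate base never appears in the reads, it is absent from ALT and the site is recorded with an alternate count of zero, which is exactly the fully one-sided case worth looking at.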
