Question: Allele-Specific Gene Expression in NGS (454) Data
Question by Eric Normandeau (Quebec, Canada), 10.2 years ago:

One of our lab's projects aims to uncover genes whose expression is linked to SNP allele variants. For example, having a "T" instead of an "A" at position X in gene Y is associated with a 2-fold difference in gene expression.
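If the SNP (or a variant linked to it) acts in cis, a 2-fold difference translates into an expected allelic ratio within each heterozygous fish. A worked example, assuming the T allele is the more highly expressed one (chosen for illustration) and that both alleles are sequenced and mapped with equal efficiency:

```latex
\frac{E_T}{E_A} = 2
\quad\Rightarrow\quad
p_T = \frac{E_T}{E_T + E_A} = \frac{2}{2 + 1} = \frac{2}{3} \approx 0.67,
\qquad
H_0:\ p_T = \frac{1}{2}
```

So the test of interest is whether the fraction of reads carrying T in heterozygotes departs from 1/2.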

For this, I have about one million 454 sequences (around 300 bp) of cDNA, and I plan to do the following (most of it is already done):

  • Assemble all sequences de novo into contigs (representing genes)
  • Save consensus sequences
  • Reassemble using the contigs as a reference
  • Detect SNPs (export SNP table)
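Once the SNP table is exported, the per-fish allele counts at each SNP are what any downstream test needs. A minimal Python sketch of that tally, assuming a hypothetical table with one row per read observation (contig, position, fish tag, base); the real export format depends on the SNP caller used and would need adapting:

```python
from collections import Counter, defaultdict

# Hypothetical rows from an exported SNP table: (contig, position, fish_tag, base).
# Each row is one read covering the SNP in one tagged fish.
rows = [
    ("contig_1", 152, "fish01", "A"),
    ("contig_1", 152, "fish01", "A"),
    ("contig_1", 152, "fish01", "T"),
    ("contig_1", 152, "fish02", "A"),
    ("contig_1", 152, "fish02", "A"),
]


def tally_alleles(rows):
    """Return {(contig, position): {fish_tag: Counter(base -> read count)}}."""
    counts = defaultdict(lambda: defaultdict(Counter))
    for contig, position, fish, base in rows:
        counts[(contig, position)][fish][base] += 1
    return counts


counts = tally_alleles(rows)
```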

Now for the tricky part, for which I would appreciate your suggestions. I need to statistically test for allele-specific gene expression across the 16 individually tagged fish. For that, I will use only the fish that are heterozygous at a given SNP.
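A minimal sketch of flagging heterozygous fish from per-fish base counts at one SNP. The thresholds (`MIN_DEPTH`, `MIN_MINOR_COUNT`) are hypothetical values for illustration, meant to keep sequencing errors from being called as a second allele:

```python
from collections import Counter

# Hypothetical thresholds, chosen for illustration only.
MIN_DEPTH = 10        # minimum reads covering the SNP in this fish
MIN_MINOR_COUNT = 2   # minimum reads supporting the second most common base


def is_heterozygous(base_counts, min_depth=MIN_DEPTH, min_minor=MIN_MINOR_COUNT):
    """Call a fish heterozygous if it has enough depth and two well-supported bases."""
    depth = sum(base_counts.values())
    top_two = base_counts.most_common(2)
    if depth < min_depth or len(top_two) < 2:
        return False
    return top_two[1][1] >= min_minor
```

One caveat with calling genotypes from cDNA alone: a heterozygote with strong allelic imbalance can look homozygous and be dropped, which biases against exactly the genes of interest. Genotypes from genomic DNA, if available, avoid this.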

The goal is to end up with a p-value indicating whether a given gene shows allele-specific expression differences at a SNP.
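Since this yields one p-value per gene (or per SNP) across many contigs, some multiple-testing correction is usually needed. One common choice, shown here as a sketch rather than a prescription, is the Benjamini-Hochberg false discovery rate procedure:

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices sorted by increasing p-value; rank 1 is the smallest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Genes whose adjusted value falls below the chosen FDR (e.g. 0.05) would be reported as showing allele-specific expression.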

(NOTE: see added biological information in comment below)

How would you proceed?

I added a bounty to this question. The accepted answer will give +100 reputation points to its author :)

Tags: gene, next-gen, allele, snp, sequencing

Can you add some information about the genome architecture of your fish (ploidy, sex chromosomes, etc.)? If your fish is a haploid, low-recombination species with genetic sex determination, the answer will be pretty straightforward.

(Comment by Jarretinha, 10.2 years ago)

Here are some more details. The fish is pseudo-diploid, with a duplication event about 50k to 100k years ago. Sex chromosomes are unknown in most fish species, including this one. The samples come from two backcross strains, one of whose ancestors underwent an artificial selection program. We have 8 individuals per strain. Cheers.

(Reply by Eric Normandeau, 10.2 years ago)

@Eric Normandeau: I found a paper that might interest you (see the edit in my answer).

(Reply by Phis, 10.1 years ago)
Answer by Phis, 10.2 years ago (edited 10.1 years ago):

This may be a very naive suggestion, but it may be worth a try. You have a total of N transcripts (reads) per contig, which in each individual you can decompose (assuming diploidy) into N = N1 + N2, where Nx is the number of transcripts for allele x (a diploid has two alleles). If there is no allele-specific expression difference, you would expect a 50:50 ratio of N1:N2, and you could test for deviations from it with, say, a chi-squared test or a G-test, either of which gives you a p-value.
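A minimal stdlib-only sketch of the per-individual test described above, for one fish at one SNP (`n1`, `n2` are that fish's read counts for the two alleles). It relies on the fact that for 1 degree of freedom the chi-squared upper tail is erfc(sqrt(x/2)):

```python
import math


def g_test_5050(n1, n2):
    """G-test of a 50:50 allele ratio; returns (G, p) with 1 degree of freedom."""
    n = n1 + n2
    if n == 0:
        raise ValueError("no reads for this fish at this SNP")
    expected = n / 2
    # 0 * log(0) is taken as 0, so empty classes are skipped.
    g = 2 * sum(obs * math.log(obs / expected) for obs in (n1, n2) if obs > 0)
    # Chi-squared upper tail for df = 1: P(X > g) = erfc(sqrt(g / 2)).
    return g, math.erfc(math.sqrt(max(g, 0.0) / 2))


def chi2_test_5050(n1, n2):
    """Pearson chi-squared test of a 50:50 allele ratio; returns (X2, p), df = 1."""
    n = n1 + n2
    if n == 0:
        raise ValueError("no reads for this fish at this SNP")
    expected = n / 2
    x2 = ((n1 - expected) ** 2 + (n2 - expected) ** 2) / expected
    return x2, math.erfc(math.sqrt(x2 / 2))
```

For example, `chi2_test_5050(60, 40)` gives X2 = 4.0 and p of about 0.046. Both statistics are large-sample approximations; at low read depth an exact binomial test of n1 out of n1 + n2 against 0.5 is the safer choice.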

The same approach could of course be extended to more individuals and more alleles, because you should be able to calculate the expected ratio of allele transcripts in every case (provided you normalise, e.g. to 100% expression per contig per individual), which lets you test overall expected versus observed ratios as before.
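To extend this to several heterozygous fish without simply pooling, one standard option is the replicated goodness-of-fit G-test: the per-fish G values sum to a total G, which splits into a pooled G (overall departure from 50:50) and a heterogeneity G (do the fish differ in their ratios?). A stdlib-only sketch, assuming "allele 1" refers to the same base in every fish:

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X > x) of the chi-squared distribution for a positive integer df."""
    y = x / 2
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i < df/2} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{j=1}^{(df-1)/2} y^(j-1/2) / Gamma(j+1/2)
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def g_stat(n1, n2):
    """G statistic against a 50:50 expectation."""
    expected = (n1 + n2) / 2
    return 2 * sum(o * math.log(o / expected) for o in (n1, n2) if o > 0)


def replicated_g_test(counts):
    """counts: list of (allele1_reads, allele2_reads), one pair per heterozygous fish."""
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two fish for a heterogeneity test")
    total_g = sum(g_stat(a, b) for a, b in counts)
    pooled_g = g_stat(sum(a for a, _ in counts), sum(b for _, b in counts))
    hetero_g = max(0.0, total_g - pooled_g)  # clamp tiny negative rounding error
    return {
        "pooled": (pooled_g, chi2_sf(pooled_g, 1)),
        "heterogeneity": (hetero_g, chi2_sf(hetero_g, k - 1)),
        "total": (total_g, chi2_sf(total_g, k)),
    }
```

A large pooled G with a small heterogeneity G suggests consistent allelic imbalance across fish; a large heterogeneity G means the fish disagree, and the pooled result should be read with care.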

--EDIT-- I just came across this paper (Fontanillas et al. 2010, Mol. Ecol.) which you might find interesting. (It is considerably less naive than what I proposed.)


@PhiS Chi-squared tests are indeed the first thing that comes to mind, but this seems a bit too simple. The reason is that we have individually tagged the biological replicates in each group. Pooling all the allele counts would therefore discard precious replication, which I would like to be the basis of the statistical test. Thanks for your answer!
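One simple way to keep each fish as a replicate, offered as an illustration rather than the method used in this thread, is an exact sign test: count how many heterozygous fish favour allele 1 versus allele 2 and ask whether that split departs from 50:50. A stdlib sketch:

```python
import math


def sign_test(counts):
    """Two-sided exact sign test of whether one allele is consistently favoured.

    counts: list of (allele1_reads, allele2_reads), one pair per heterozygous fish.
    Fish with exactly balanced counts are dropped, as is usual for a sign test.
    """
    up = sum(1 for a, b in counts if a > b)
    down = sum(1 for a, b in counts if a < b)
    n = up + down
    if n == 0:
        return 1.0
    k = min(up, down)
    # P(X <= k) under Binomial(n, 0.5), doubled for two sides and capped at 1.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With 8 heterozygous fish the smallest attainable two-sided p-value is 2/256 (about 0.0078), so power is limited; models that use the read counts and allow for between-fish variation (for example beta-binomial models) are a common next step.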

(Reply by Eric Normandeau, 10.2 years ago)

If your goal is only to detect which genes have ASE, then I tend to think that the simple way will work just fine. The individuals in your population give you the genetic diversity you need, but unless you are looking at epistatic effects, I don't think you need them for your analysis. Also, I have a strong inkling that a paper on exactly this topic will be in press in Genome Biology in the next month or so, fyi...

(Reply by Andrew Su, 10.2 years ago)

Thanks for the new paper, @PhiS! I'll take a look :)

(Reply by Eric Normandeau, 10.1 years ago)

Hehe, apparently, this is exactly the article I am currently basing my approach on ;) Thanks!

(Reply by Eric Normandeau, 10.1 years ago)
Answer by Giovanni M Dall'Olio (London, UK), 10.2 years ago:

Have a look at this paper:

You may also have a look at the concept of expression QTLs (eQTLs) and at the PubMed literature on how to map them. eQTLs are genomic loci (markers) whose genotype is associated with variation in gene expression levels.
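A minimal sketch of the basic eQTL idea for one gene and one marker: regress a normalised expression value on allele dosage (0, 1 or 2 copies of one allele) across fish. The inputs are hypothetical, and this returns only the least-squares fit, not a p-value:

```python
def dosage_regression(dosages, expression):
    """Ordinary least-squares fit of expression ~ intercept + slope * dosage.

    dosages: copies of the chosen allele per fish (0, 1 or 2).
    expression: a normalised expression value per fish for the same gene.
    Returns (slope, intercept, r_squared).
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    syy = sum((y - mean_y) ** 2 for y in expression)
    if sxx == 0:
        raise ValueError("all fish share one genotype; no association can be estimated")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared
```

Unlike the within-heterozygote allelic ratio, this between-fish association picks up both cis and trans effects, so the two approaches are complementary.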
