Allele-Specific Gene Expression In NGS (454) Data
11.5 years ago

One of the projects in our lab aims to uncover genes whose expression is linked to SNP allele variants. For example, having a "T" instead of an "A" at position X in gene Y is linked with a 2-fold difference in gene expression.

For this, I have about one million 454 cDNA sequences (around 300 bp each), and I plan to do the following (most of it is already done):

  • Assemble all sequences de novo into contigs (representing genes)
  • Save consensus sequences
  • Reassemble using the contigs as a reference
  • Detect SNPs (export SNP table)

Now for the tricky part, on which I would appreciate your suggestions. I need to test statistically for allele-specific gene expression across the 16 individually tagged fish, using only those fish that are heterozygous.

The goal is to end up with a p-value that tells us whether a given gene shows SNP allele-specific expression differences.
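A minimal Python sketch of the "heterozygous fish only" filter, assuming the exported SNP table has already been turned into per-fish read counts for the two alleles at one SNP. The dictionary layout, the fish tags, and the thresholds `min_total` and `min_reads_per_allele` are hypothetical choices, not part of the pipeline above. Calling heterozygotes from cDNA reads is biased against exactly the genes of interest (a fish with strong allele-specific expression can look homozygous), so genotypes from genomic DNA would be safer if available; and if the genome carries recent duplicates, paralogous copies collapsed into one contig can also look like heterozygous SNPs.

```python
from typing import Dict, Tuple

# Hypothetical layout: fish tag -> (reads carrying allele 1, reads carrying allele 2) at one SNP.
AlleleCounts = Dict[str, Tuple[int, int]]


def heterozygous_fish(
    counts: AlleleCounts,
    min_total: int = 10,
    min_reads_per_allele: int = 2,
) -> AlleleCounts:
    """Return only the fish that look heterozygous at this SNP.

    Assumed (hypothetical) rule: at least `min_total` reads cover the SNP and
    each allele is seen in at least `min_reads_per_allele` reads.
    """
    return {
        fish: (n1, n2)
        for fish, (n1, n2) in counts.items()
        if n1 + n2 >= min_total and min(n1, n2) >= min_reads_per_allele
    }
```

For example, with `snp = {"fish01": (12, 9), "fish02": (20, 0), "fish03": (3, 4), "fish04": (15, 1)}`, the default thresholds keep only `fish01`.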

(NOTE: see added biological information in comment below)

How would you proceed?

I added a bounty to this question. The accepted answer will give +100 reputation points to its author :)

next-gen sequencing snp allele gene • 3.9k views

Can you add some info about the genome architecture of your fish (ploidy, sex chromosomes, etc.)? If your fish is a haploid, low-recombination, gene-determined species, the answer will be pretty straightforward.


Here are some more details. The fish is pseudo-diploid, with a duplication event about 50k to 100k years ago. Sex chromosomes are unknown in most fish species, including this one. The samples come from two backcross strains, one of whose ancestors underwent an artificial selection program. We have 8 individuals per strain. Cheers.


@Eric Normandeau - I found a paper that might interest you (see the edit in my answer).

11.5 years ago
Phis ★ 1.1k

This may be a very naive suggestion, but it may be worth a try. For each contig, each individual has a total of N transcripts (reads), which (assuming diploidy) you can decompose into N = N1 + N2, where Nx is the number of transcripts from allele x. If there is no allele-specific expression difference, you would expect a 50:50 ratio of N1:N2, and you could test for deviations from it with, say, a chi-squared test or a G-test, either of which gives you a p-value.
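The per-individual test described above can be sketched in plain Python as follows, assuming you already have the two allele counts `n1` and `n2` for one heterozygous fish at one SNP. Both statistics have 1 degree of freedom here, so the p-value uses the closed form P(X > x) = erfc(sqrt(x/2)) for a chi-squared variable with 1 df instead of a statistics library. The function names are hypothetical. Read counts are usually overdispersed (PCR duplicates, library effects) and mapping to a consensus can favour the consensus allele, so these p-values are likely anticonservative; with thousands of contigs you would also want a multiple-testing correction.

```python
import math
from typing import Tuple


def chi2_sf_1df(x: float) -> float:
    """P(X > x) for a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def allele_balance_tests(
    n1: int, n2: int, expected_frac1: float = 0.5
) -> Tuple[float, float, float, float]:
    """Chi-squared and G goodness-of-fit tests of (n1, n2) against an expected allele ratio.

    Returns (chi2, chi2_p, g, g_p); both tests have 1 degree of freedom.
    """
    n = n1 + n2
    if n == 0:
        raise ValueError("no reads cover this SNP")
    observed = (n1, n2)
    expected = (n * expected_frac1, n * (1.0 - expected_frac1))
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # G = 2 * sum(O * ln(O / E)); a cell with O == 0 contributes 0 by convention.
    g = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    return chi2, chi2_sf_1df(chi2), g, chi2_sf_1df(g)
```

For example, `allele_balance_tests(30, 10)` gives a chi-squared statistic of 10.0 with p ≈ 0.0016; for low counts an exact binomial test would be the more cautious choice.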

The same approach can of course be extended to more individuals and more alleles, because you should be able to calculate the expected ratio of allele transcripts in every case (provided you normalise, e.g. to 100% expression per contig per individual), which lets you test overall expected vs. observed ratios as before.
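One established way to combine several individuals while still keeping them separate is the replicated G-test of goodness of fit; this is a swapped-in technique, not necessarily what the answer above had in mind. It splits the summed per-fish G statistic (total, k df) into a pooled G for the overall allele ratio (1 df) and a heterogeneity G (k - 1 df) that asks whether the fish disagree with each other. The sketch below assumes a 50:50 expectation for every fish and computes chi-squared tail probabilities for arbitrary df from the series expansion of the regularized incomplete gamma function; `chi2_sf`, `g_stat` and `replicated_g_test` are hypothetical helper names.

```python
import math
from typing import Dict, List, Tuple


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-squared variable with `df` degrees of freedom.

    Uses the power series of the regularized lower incomplete gamma function
    P(df/2, x/2). Fine for a sketch; very small p-values lose precision.
    """
    if df < 1:
        raise ValueError("df must be >= 1")
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-16 and n < 100000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a) + math.log(total))
    return min(1.0, max(0.0, 1.0 - lower))


def g_stat(n1: int, n2: int, p1: float = 0.5) -> float:
    """G = 2 * sum(O * ln(O / E)) for two allele counts against expected fraction p1."""
    n = n1 + n2
    expected = (n * p1, n * (1.0 - p1))
    return 2.0 * sum(o * math.log(o / e) for o, e in zip((n1, n2), expected) if o > 0)


def replicated_g_test(
    per_fish: List[Tuple[int, int]], p1: float = 0.5
) -> Dict[str, Tuple[float, float]]:
    """Replicated G-test of goodness of fit: {name: (G, p)} for pooled, heterogeneity, total."""
    k = len(per_fish)
    if k < 2:
        raise ValueError("need at least two heterozygous fish")
    g_total = sum(g_stat(n1, n2, p1) for n1, n2 in per_fish)  # df = k
    pooled1 = sum(n1 for n1, _ in per_fish)
    pooled2 = sum(n2 for _, n2 in per_fish)
    g_pooled = g_stat(pooled1, pooled2, p1)  # df = 1
    g_het = max(0.0, g_total - g_pooled)  # df = k - 1
    return {
        "pooled": (g_pooled, chi2_sf(g_pooled, 1)),
        "heterogeneity": (g_het, chi2_sf(g_het, k - 1)),
        "total": (g_total, chi2_sf(g_total, k)),
    }
```

A significant heterogeneity G suggests the fish differ in their allele ratios, in which case the pooled result alone should be interpreted with care.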

--EDIT-- I just came across this paper (Fontanillas et al. 2010, Mol. Ecol.), which you might find interesting. (It is considerably less naive than what I proposed.)


@PhiS A chi-squared test is indeed the first thing that comes to mind, but it seems a bit too simple. We have individually tagged the biological replicates in each group, so pooling all the allele counts would throw away precious replication, which I would like to form the basis of the statistical test. Thanks for your answer!
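If the fish themselves should be the unit of replication, one option is to summarise each heterozygous fish by a single log2 allele ratio and test whether the mean across fish differs from 0. The sketch below uses an exact sign-flip (permutation) test, a swapped-in technique rather than the method of any specific paper. It assumes allele 1 is the same nucleotide in every fish, that without allele-specific expression each fish's log ratio is symmetric around 0, and a pseudocount of 0.5 (a hypothetical choice); function names are hypothetical too. It ignores differences in read depth between fish, and the smallest attainable p-value is 2/2^k (2/256 ≈ 0.0078 for 8 fish).

```python
import itertools
import math
from typing import List, Tuple


def log2_allele_ratio(n1: int, n2: int, pseudocount: float = 0.5) -> float:
    """log2((n1 + c) / (n2 + c)); the pseudocount c avoids division by zero."""
    return math.log2((n1 + pseudocount) / (n2 + pseudocount))


def sign_flip_test(per_fish: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Exact two-sided sign-flip test of H0: mean log2 allele ratio across fish is 0.

    Under H0 each fish's log ratio is equally likely to be positive or negative,
    so all 2**k sign patterns are enumerated (practical up to roughly 20 fish).
    Returns (|mean log2 ratio|, p-value).
    """
    if not per_fish:
        raise ValueError("need at least one heterozygous fish")
    x = [log2_allele_ratio(n1, n2) for n1, n2 in per_fish]
    k = len(x)
    observed = abs(sum(x)) / k
    hits = 0
    for signs in itertools.product((1, -1), repeat=k):
        stat = abs(sum(s * v for s, v in zip(signs, x))) / k
        if stat >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    return observed, hits / 2 ** k
```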


If your goal is only to detect which genes have ASE, then I tend to think that the simple way will work just fine. The individuals in your population give you the genetic diversity you need, but unless you are looking at epistatic effects, I don't think you need them for your analysis. Also, I have a strong inkling that a paper on exactly this topic will be in press in Genome Biology in the next month or so, fyi...


Thanks for the new paper @PhiS ! I'll take a look :)


Hehe, apparently, this is exactly the article I am currently basing my approach on ;) Thanks!

11.4 years ago

Have a look at this paper:

You may also look into the concept of expression QTL (eQTL) markers and search PubMed for the literature on how to derive them. eQTLs are genomic loci whose genotype is associated with variation in gene expression levels.
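eQTL mapping is commonly done by relating each individual's expression level of a gene to its genotype at a marker. A minimal sketch for a single SNP-gene pair, assuming genotypes coded as the number of copies of one allele (in a backcross, typically only 0 or 1) and expression already normalised per fish (e.g. reads per million for the contig). It uses a permutation test on Pearson's r instead of the linear models used by dedicated eQTL tools, to avoid distributional assumptions with only 16 fish; the function names and the `n_perm` and `seed` defaults are hypothetical.

```python
import math
import random
from typing import List, Tuple


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression has no variance")
    return sxy / math.sqrt(sxx * syy)


def eqtl_permutation_test(
    genotypes: List[float], expression: List[float], n_perm: int = 10000, seed: int = 1
) -> Tuple[float, float]:
    """Return (r, two-sided permutation p-value) for genotype vs. expression."""
    r_obs = pearson_r(genotypes, expression)
    rng = random.Random(seed)  # fixed seed so the p-value is reproducible
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-expression link
        if abs(pearson_r(genotypes, shuffled)) >= abs(r_obs) - 1e-12:
            hits += 1
    # Add-one correction so the permutation p-value is never exactly 0.
    return r_obs, (hits + 1) / (n_perm + 1)
```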
