One of the projects in our lab aims to uncover genes whose expression is linked to SNP allele variants. For example, having a "T" instead of an "A" at position X in gene Y might be linked with a two-fold difference in the expression of that gene.
For this, I have about one million 454 cDNA reads (around 300 bp each), and I plan to do the following (most of it is already done):
- Assemble all sequences de novo into contigs (representing genes)
- Save consensus sequences
- Reassemble using the contigs as a reference
- Detect SNPs (export SNP table)
Now for the tricky part, for which I would appreciate your suggestions. I need to statistically test for allele-specific gene expression across the 16 individually tagged fish. For that, I will use only those fish that are heterozygous at the SNP in question.
The goal is to end up with a p-value per gene that tells us whether this gene shows SNP allele-specific expression differences.
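To make the question more concrete, here is a minimal sketch of one possible way to frame the test. It is not a settled method, only an illustration under several assumptions: the SNP table can be reduced to per-fish read counts for the two alleles; a fish is treated as heterozygous when both alleles are seen in its reads; each heterozygous fish is tested against a 50:50 expectation with an exact binomial test; and the per-fish p-values are combined with Fisher's method. The function names and the `read_counts` layout are hypothetical.

```python
from math import comb, exp, factorial, log

def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so ties with the observed outcome are included.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))

def fisher_combined_p(pvalues):
    """Fisher's method: -2 * sum(ln p) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of
    freedom the survival function has a closed form, used here."""
    k = len(pvalues)
    half_stat = -sum(log(p) for p in pvalues)  # equals statistic / 2
    return exp(-half_stat) * sum(half_stat**i / factorial(i) for i in range(k))

def allele_specific_expression_p(read_counts, min_reads=1):
    """read_counts (hypothetical layout): fish_id -> (reads_allele_1, reads_allele_2)
    for one SNP. Only fish with at least min_reads of BOTH alleles are used,
    as a crude stand-in for 'heterozygous'."""
    per_fish = {}
    for fish, (a, b) in read_counts.items():
        if a >= min_reads and b >= min_reads:
            per_fish[fish] = binom_two_sided_p(a, a + b)
    if not per_fish:
        return None, per_fish
    return fisher_combined_p(list(per_fish.values())), per_fish

# Made-up counts for one SNP in three fish (illustration only, not real data)
example = {"fish01": (14, 3), "fish02": (9, 2), "fish03": (20, 0)}
combined_p, per_fish_p = allele_specific_expression_p(example)
print(per_fish_p, combined_p)
```

Two caveats with this sketch: calling heterozygotes from the cDNA reads themselves would miss fish in which one allele is almost silent, and Fisher's method ignores whether the same allele is over-expressed in every fish. I would be glad to hear whether a different framing handles these better.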
(NOTE: see added biological information in comment below)
How would you proceed?
I added a bounty to this question. The accepted answer will give +100 reputation points to its author :)