Question: Best statistical test for genotype-specific gene expression analysis
19 months ago
I'd be very grateful for your advice. I have performed an RNA-Seq experiment on a particular tissue in about 100 individuals. I have separately genotyped these individuals at the SNPs (from a GWAS) that I am interested in. I want to test whether the SNPs are functional by comparing genotype-specific expression of candidate genes at each locus.

Say, for example, that I want to compare the expression levels of gene X in individuals with genotypes AA, AG, and GG. What is the best statistical test to perform here? The easy option is to do pairwise t-tests: AA vs AG, AG vs GG, and AA vs GG.

Alternatively, would you recommend a one-way ANOVA or a Kruskal-Wallis test? (The latter might be better because the group sizes for AA / AG / GG are very different, and so are the variances.) Would it also be worth testing recessive and dominant models, i.e. AA vs (AG + GG) and (AA + AG) vs GG? And what sort of multiple-testing correction would you apply? Say I do the three t-tests: would a p-value threshold of 0.05/3 be reasonable?
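To make the alternatives concrete, here is a minimal Python sketch of how each genotype could be coded as a numeric predictor under the additive, dominant, and recessive models, plus the Bonferroni per-test threshold mentioned above. The genotype list and the function names (`code_genotype`, `bonferroni_threshold`) are hypothetical, and "dominant" here means dominant for the G allele.

```python
# Hypothetical genotype calls for a handful of individuals.
GENOTYPES = ["AA", "AG", "GG", "AG", "AA", "GG"]


def code_genotype(genotype: str, model: str) -> int:
    """Return a numeric code for a biallelic genotype under a given model.

    additive:  number of G alleles (AA=0, AG=1, GG=2)
    dominant:  1 if at least one G allele, i.e. AA vs (AG + GG)
    recessive: 1 only for GG, i.e. (AA + AG) vs GG
    """
    g_count = genotype.count("G")
    if model == "additive":
        return g_count
    if model == "dominant":
        return int(g_count >= 1)
    if model == "recessive":
        return int(g_count == 2)
    raise ValueError(f"unknown model: {model}")


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


for model in ("additive", "dominant", "recessive"):
    print(model, [code_genotype(g, model) for g in GENOTYPES])
# additive [0, 1, 2, 1, 0, 2]
# dominant [0, 1, 1, 1, 0, 1]
# recessive [0, 0, 1, 0, 0, 1]

print(bonferroni_threshold(0.05, 3))  # three tests -> 0.05 / 3
```

Each coding turns the question into a single comparison (or a single slope), so the choice of model and the number of models tried both feed into the multiple-testing count.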

I'd love to hear your opinions on how best to approach this.

19 months ago
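I would fit a linear trend: summary(glm()) gives the test for the slope, which with a single predictor is equivalent to an ANOVA F-test of the correlation. Envision a scatter plot where x is the number of G alleles (0, 1, or 2) and y is your log(expression).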

Statistical power is your problem: if you're looking at lots of genes or lots of SNPs, you will get false-positive results somewhere. If you have only one or two genes and one or two SNPs, then any test you can justify will be fine.
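One common way to control false positives across many gene/SNP pairs is the Benjamini-Hochberg false discovery rate procedure, shown here alongside Bonferroni for comparison. This is a swapped-in illustration, not something the answer above prescribes; the p-values are invented and the function names are hypothetical.

```python
# Invented p-values, e.g. one per gene/SNP pair (hypothetical).
P_VALUES = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]


def bonferroni(p_values, alpha=0.05):
    """Indices of tests significant at family-wise level alpha."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def benjamini_hochberg(p_values, q=0.05):
    """Indices of tests rejected at FDR level q (Benjamini-Hochberg step-up)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Largest rank k whose sorted p-value is <= k * q / m.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


print(bonferroni(P_VALUES))          # [0]
print(benjamini_hochberg(P_VALUES))  # [0, 1]
```

Bonferroni controls the chance of any false positive and gets strict as the number of tests grows; Benjamini-Hochberg instead bounds the expected fraction of false positives among the hits, which is usually the more practical target for exploratory screens.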

For genome-wide or exploratory analyses, look for more complete tools like

Dear Karl, that sounds very sensible. Thank you. B

