Parametric Or Non-Parametric - That Is The Question?
12.5 years ago

Given measures of relative RNA abundance from microarray experiments, even after preprocessing, normalising, etc., there always seem to be some genes that refuse to behave normally (i.e., their expression values are not normally distributed).

When conducting subsequent analyses on these genes (e.g., eQTL analysis or differential expression), do you go parametric or non-parametric?

What are your thoughts?


I have worked with microarrays for 4 years, and as far as I know most researchers use ANOVA; I do as well.

12.5 years ago

Much of the time I use a parametric test to compute the observed statistic, but a non-parametric (permutation) procedure to establish a significance threshold for it. This is a fairly common approach. For example, I think it's safe to say that the vast majority of differential expression analyses are performed with some variation of the t test or linear regression. SAM, for instance, uses a modified t test and estimates the FDR through permutation testing.
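A minimal sketch of the "parametric statistic, permutation threshold" idea, in plain Python. It computes a Welch-style t statistic per gene, then permutes the sample labels (the same permutation for every gene in a round, so gene-gene correlation is preserved) and estimates the FDR at a fixed |t| cutoff as the average number of permuted calls divided by the observed number of calls. This is a simplified illustration, not SAM itself: SAM's modified t adds a small constant (s0) to the denominator and uses its own thresholding and FDR procedure. The function names, the cutoff, and the simulated data are hypothetical.

```python
import math
import random
from statistics import mean, variance


def t_stat(values, labels):
    """Welch-style t statistic for one gene: group 1 mean minus group 0 mean."""
    a = [v for v, g in zip(values, labels) if g == 1]
    b = [v for v, g in zip(values, labels) if g == 0]
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def permutation_fdr(genes, labels, cutoff, n_perm=200, seed=1):
    """Estimate the FDR of calling every gene with |t| >= cutoff.

    The same label permutation is applied to all genes in each round,
    so the null distribution keeps the correlation between genes.
    """
    rng = random.Random(seed)
    observed = [t_stat(g, labels) for g in genes]
    n_called = sum(abs(t) >= cutoff for t in observed)
    if n_called == 0:
        return observed, 0, 0.0
    perm = list(labels)
    null_calls = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        null_calls += sum(abs(t_stat(g, perm)) >= cutoff for g in genes)
    expected_false = null_calls / n_perm  # average false calls per permutation
    return observed, n_called, min(1.0, expected_false / n_called)


# Illustrative simulated data: 50 null genes plus 5 genes shifted by +3 in group 1.
rng = random.Random(0)
labels = [0] * 5 + [1] * 5
genes = [[rng.gauss(0, 1) for _ in labels] for _ in range(50)]
genes += [[rng.gauss(3 * g, 1) for g in labels] for _ in range(5)]
observed, n_called, fdr = permutation_fdr(genes, labels, cutoff=4.0)
print(f"{n_called} genes called at |t| >= 4, estimated FDR = {fdr:.3f}")
```

Scanning several cutoffs gives a rough FDR curve. With very few arrays the number of distinct label permutations is small (only 252 ways to split 10 samples into two groups of 5), which limits how fine-grained the permutation null can be.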

This is a common approach for eQTL analysis as well: typically one tests candidate alleles using a linear model but establishes significance by permutation testing. Non-parametric testing is particularly important for eQTL analysis because, in my experience, eQTL results are especially susceptible to outliers that hyper-inflate the test statistic when looking for trans-eQTLs. This can happen when rare homozygous genotypes coincide with rare high or low expression values; by testing the whole genome, you inevitably identify these cases, which are likely (though not certain) to be spurious associations.
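The linear-model-plus-permutation scheme can be sketched as follows, assuming an additive coding of each SNP as allele dosage (0, 1, 2) and a single expression trait. Each SNP gets the t statistic of its regression slope; the expression values are then shuffled across individuals (all SNPs keep their genotypes, so linkage disequilibrium is preserved), and the maximum |t| over all SNPs is recorded in each permutation. The (1 - alpha) quantile of those maxima serves as a genome-wide threshold. All names and the simulated data are hypothetical, and a real analysis would also handle covariates and missing genotypes.

```python
import math
import random


def regression_t(genotype, expression):
    """t statistic for the slope of expression ~ genotype dosage (0, 1, 2)."""
    n = len(genotype)
    mx = sum(genotype) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in genotype)
    syy = sum((y - my) ** 2 for y in expression)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or constant expression: nothing to test
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotype, expression))
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1:
        return math.copysign(math.inf, r)  # perfect linear fit
    # For simple linear regression, slope t = r * sqrt((n - 2) / (1 - r^2)).
    return r * math.sqrt((n - 2) / (1 - r * r))


def permutation_threshold(genotypes, expression, n_perm=1000, alpha=0.05, seed=2):
    """Genome-wide |t| threshold: (1 - alpha) quantile of the per-permutation max |t|.

    Expression values are shuffled across individuals while every SNP keeps
    its genotypes, so linkage disequilibrium between SNPs is preserved.
    """
    rng = random.Random(seed)
    perm = list(expression)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(perm)
        maxima.append(max(abs(regression_t(g, perm)) for g in genotypes))
    maxima.sort()
    k = min(n_perm - 1, max(0, math.ceil((1 - alpha) * n_perm) - 1))
    return maxima[k]


# Illustrative simulated data: 100 individuals, 20 SNPs, SNP 0 has a true effect.
rng = random.Random(3)
genotypes = [[(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(100)]
             for _ in range(20)]
expression = [g0 + rng.gauss(0, 1) for g0 in genotypes[0]]
threshold = permutation_threshold(genotypes, expression, n_perm=200)
hits = [i for i, g in enumerate(genotypes) if abs(regression_t(g, expression)) > threshold]
print(f"threshold |t| = {threshold:.2f}, SNPs above it: {hits}")
```

Using the maximum over all SNPs controls the genome-wide (family-wise) error rate rather than a per-SNP rate. Note that the permutation only fixes the threshold: an observed statistic inflated by a single rare homozygote with an extreme expression value is still computed by the parametric model, so inspecting the genotype groups behind the top hits remains worthwhile.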

When I perform genome-wide correlation analysis, I take a different approach: I use Spearman rank correlation rather than Pearson correlation. In general, I've found it to be nearly as powerful, and I feel more comfortable with a non-parametric statistic in this case. For genome-wide analysis I also use a permutation method (the Genome-Wide Error Rate method, Churchill, Genetics 1994) to establish a significance threshold.
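A small plain-Python sketch of why a rank statistic is reassuring here: Spearman's rho is the Pearson correlation computed on ranks (tied values get their average rank), so a single sample that is extreme in both genes cannot dominate it. The two expression vectors are made up for illustration; in practice a library routine such as `scipy.stats.spearmanr` would be the usual choice. A genome-wide threshold in the spirit of the permutation method cited above would be obtained by permuting samples, recording the largest |rho| across all tested pairs in each permutation, and taking an upper quantile of those maxima.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up example: no real relationship except one sample high in both genes.
gene_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
gene_b = [5, 3, 8, 1, 9, 2, 7, 4, 6, 60]
print(f"Pearson r = {pearson(gene_a, gene_b):.2f}, "
      f"Spearman rho = {spearman(gene_a, gene_b):.2f}")
```

Here Pearson's r is about 0.98, driven almost entirely by the last sample, while Spearman's rho is about 0.35.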
