QQ-plot for microarray t-test?
2.6 years ago
sig93618 • 0

Hello,

We submitted a paper in which we used a t-test and fold change to identify genes differentially expressed between two sample sets (the sets are unequal in size: 12 vs 8 in one case, 12 vs 10 in another). A referee is asking us for a Q-Q plot for the t-tests. I do not understand what he intends: the distribution of one set versus the other, or the distribution of genes in all samples versus a normal distribution? And what is the simplest way to do it?

expression microarray t-test qq-plot • 1.2k views

Did you analyze microarray data with non-standard tools or even homemade statistics instead of something like limma?


It is commercial software; I would not know whether it can be called "non-standard".

2.6 years ago

The reviewer might suspect that the assumptions of the t-test are violated. A quantile-quantile plot (Q-Q plot) is a good way to compare two distributions, in this case the theoretical distribution and the empirical one. Ideally, the two would be equal, and the points would fall on a straight line. Often, though, empirical distributions have heavier tails, that is, more extreme values are observed than expected, and the points bend away from the line at the ends of the Q-Q plot. You were lucky, though: the reviewer could have requested more advanced methods such as limma or CyberT. With your sample sizes, a t-test may well be fine.

Now, the question remains which distributions to compare. It could be debated whether the whole expression data should follow a single normal distribution, or whether that should only apply to an individual transcript and its measurement error. For a t-test we assume that the values for each transcript are sampled from normal distributions with the same or different means. Because each single t-test 'sees' only the data from a single transcript, the latter should suffice, and one does not need to assume normality of all gene-expression values, or of their differences, taken together.

A t-test rests on the assumption that its t-statistic follows a Student's t-distribution under the null hypothesis. Therefore, instead of plotting all the expression data, I would make a Q-Q plot of the test statistics against a theoretical Student's t-distribution with the same degrees of freedom (determined by the sample sizes; for the classical equal-variance two-sample test this is n1 + n2 - 2, i.e. 18 for 12 vs 8 samples).

This can be done easily with the functions qqplot and qt in R.
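
A minimal sketch of what that could look like, in case it helps. The data here are simulated purely for illustration, and the names `expr`, `group`, `t_stat` and `theo` are made up; you would substitute your own normalized expression matrix (genes in rows, samples in columns) and group labels. It assumes the classical pooled-variance t-test, so that every gene shares the same degrees of freedom.

```r
# Simulated stand-in for a real expression matrix (hypothetical data):
# genes in rows, samples in columns, 12 samples in group A and 8 in group B.
set.seed(1)
n_genes <- 2000
n1 <- 12
n2 <- 8
expr  <- matrix(rnorm(n_genes * (n1 + n2)), nrow = n_genes)
group <- factor(rep(c("A", "B"), times = c(n1, n2)))

# One pooled-variance (var.equal = TRUE) t-statistic per gene.
t_stat <- apply(expr, 1, function(x) {
  t.test(x[group == "A"], x[group == "B"], var.equal = TRUE)$statistic
})

# Degrees of freedom of the pooled-variance two-sample t-test.
df <- n1 + n2 - 2

# Theoretical quantiles of the t-distribution, one per gene.
theo <- qt(ppoints(length(t_stat)), df = df)

# Q-Q plot: observed t-statistics against theoretical t quantiles.
qqplot(theo, t_stat,
       xlab = "Theoretical t quantiles",
       ylab = "Observed t-statistics")
abline(0, 1, col = "red")  # identity line: where the points fall if the two agree
```

If the software used Welch's unequal-variance test instead, the degrees of freedom differ from gene to gene, so a single reference t-distribution would no longer be exact. Also keep in mind that with real data some departure from the line in the tails is expected if a subset of genes is truly differentially expressed; it is mainly the bulk of the statistics that should follow the line.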

2.5 years ago
sig93618 • 0