I have a feeling you're actually after the answer to a slightly different question, which could be along the lines of: "Why do we need a statistical package at all to process RNA-seq count data?"

As genomax and Devon have correctly pointed out, a t-test can be used for DE analysis, but you should absolutely never apply it to the raw counts, no matter how many samples you have, because raw counts are never absolute measures of a gene's expression within a given sample. The number of reads per gene depends on the efficiency of the library prep (including RNA extraction and cDNA synthesis), on the amount of contamination from non-coding transcripts (e.g. rRNA, tRNA), and, of course, on the sequencing depth, i.e. the number of reads per sample. All of these issues need to be taken into account **before any statistical test**, and this is where the packages have contributed a lot, in addition to establishing ways of estimating variances from as few as 2-3 replicates per condition.
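To make the sequencing-depth point concrete, here is a minimal sketch that scales raw counts to counts per million (CPM). The function name and the toy counts are made up for illustration, and this only corrects for depth; the dedicated packages use more robust scaling factors that also handle differences in library composition.

```python
def counts_per_million(counts, library_size):
    """Scale raw per-gene counts by the total number of reads in the sample."""
    return [c * 1_000_000 / library_size for c in counts]


# Hypothetical toy data: identical relative expression, but sample B was
# sequenced twice as deeply as sample A.
sample_a = [100, 900, 999_000]      # 1,000,000 reads in total
sample_b = [200, 1_800, 1_998_000]  # 2,000,000 reads in total

cpm_a = counts_per_million(sample_a, sum(sample_a))
cpm_b = counts_per_million(sample_b, sum(sample_b))

# The raw counts for the first gene differ twofold (100 vs. 200),
# but the depth-scaled values agree.
print(cpm_a[0], cpm_b[0])
```

A test on the raw values would treat the twofold gap in the first gene as a real difference, even though it comes entirely from sequencing depth.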

modified 13 months ago • written 13 months ago by Friederike ♦ **5.6k**

My understanding from a presentation I saw (I am not a statistician) is that you could use a t-test IF you have a large number of samples (think tens or more). I recall the `n` being something like 20.

By "if you have a large number of samples", do you mean that gene expression follows a non-normal distribution (although a similar one), and that only with a large number of samples would the expression statistics converge to a normal distribution (by the central limit theorem)? Am I understanding this correctly?

Counts themselves will never approach a normal distribution, since they're integers and bounded at 0. They can be transformed to be "close enough", though, which is part of what voom() does.
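As a rough illustration of that kind of transformation, the sketch below computes log2 counts per million with small offsets, which is how I understand the log-CPM step in voom to work. The function name is hypothetical, and voom additionally estimates precision weights from the mean-variance trend, which is not shown here.

```python
import math


def log2_cpm(counts, library_size):
    """log2 counts per million with small offsets so zero counts stay finite."""
    return [
        math.log2((c + 0.5) / (library_size + 1.0) * 1e6)
        for c in counts
    ]


# Hypothetical counts for four genes in one sample with 999,999 reads in total.
values = log2_cpm([0, 10, 100, 1000], library_size=999_999)
print(values)
```

The offsets keep zero counts finite, and the log scale compresses the long right tail of the counts, which is what brings the values closer to something a normal-theory test can handle.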

Even though counts don't approach a normal distribution, the central limit theorem still allows me to use a t-test on normalized counts if we have a sufficient sample size (although most likely we don't), right?
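For reference, the kind of per-gene test being discussed here could be sketched as a Welch t statistic on normalized, log-transformed values. The helper name and the numbers are made up for illustration, and no p-value is computed because that needs the t distribution, which the Python standard library does not provide.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of the mean of x
    vy = variance(y) / len(y)  # squared standard error of the mean of y
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical log-CPM values for a single gene in two conditions.
control = [5.1, 4.9, 5.0, 5.2, 4.8]
treated = [6.1, 5.9, 6.0, 6.2, 5.8]

t_stat, dof = welch_t(control, treated)
print(t_stat, dof)
```

With only a handful of replicates the two variance estimates are very noisy, which is the problem the DE packages address by sharing information across genes.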

Do you have any reference for this n > 20?

Out of interest, is this purely academic, or do you have data that do not behave as expected with standard tools, so that you are now trying to tweak parameters?

It is purely academic interest. I want to get a rough picture of how DE is done.

Have a look at the studies by Gierlinski et al., particularly this, this, and this
