Question: Probability Of Expression Changes 2-, 5-, ...100-Fold
Israel Barrantes (Germany) wrote, 8.8 years ago:

In RNA-seq and other gene expression approaches, you usually calculate the probability of obtaining a value Y (measured in sample B) given X (measured in sample A), as in the case discussed by Audic and Claverie (Genome Res. 1997 Oct;7:986).

Now the case is the following: given the read counts of two samples (X and Y, for each available transcript), we would like to obtain a list of all transcript IDs whose true expression level differs between the two samples by at least 5-fold, with 95% confidence. Which statistical test could help in this case?

Ideally, we would like to choose between 95% and 99% confidence and between arbitrary fold-change cut-offs, receiving e.g. a list of all transcripts that are 2-fold, 20-fold or 100-fold overexpressed at the chosen error probability (p < a given value).


In common with the 2 answers so far, I don't understand the question. Could you add some additional information or consider re-wording it, because I'm not sure it's answerable in its current form.

Here is the question, posed in a different way:

We have the read counts of two samples, X and Y, for each available transcript.

The question is now the following: give me a list of all transcript names whose true expression level differs between the two samples by at least 5-fold, with 95% confidence.

Ideally, we would like to choose between 95% and 99% confidence and between arbitrary fold-change cut-offs, receiving e.g. a list of all transcripts that are 2-fold, 20-fold or 100-fold overexpressed at the chosen error probability (p < a given value).

Ah, I see. This is something I have been asking myself for a long time, but I don't have a solution. I will keep watching this thread...

I edited the question accordingly.

Lyco (Germany) wrote, 8.8 years ago:

I am not entirely sure what your question is. You can calculate the probability of finding a 2x or 5x enrichment with the Audic & Claverie statistics, but of course that probability depends on the actual counts, not only on the fold factor. There is an online server for performing the calculation and, according to their webpage, a 'unix version' of the program can be downloaded from http://www.igs.cnrs-mrs.fr/SpipInternet/spip.php?article168
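For reference, the Audic & Claverie probability of seeing y reads in sample B given x reads in sample A, with N1 and N2 total reads, is p(y|x) = (N2/N1)^y (x+y)! / (x! y! (1 + N2/N1)^(x+y+1)). The Python sketch below is my own illustration, not the authors' program (the function names are hypothetical). It evaluates the formula in log space and sums the upper tail, which makes the point about counts concrete: the same 5-fold ratio is far more convincing at 20 vs 100 reads than at 2 vs 10.

```python
import math


def audic_claverie_p(x: int, y: int, n1: int, n2: int) -> float:
    """p(y | x) from Audic & Claverie (1997).

    x: count in sample A, y: count in sample B,
    n1, n2: total reads in samples A and B.
    Computed in log space so large counts do not overflow.
    """
    r = n2 / n1
    log_p = (
        y * math.log(r)
        + math.lgamma(x + y + 1)
        - math.lgamma(x + 1)
        - math.lgamma(y + 1)
        - (x + y + 1) * math.log1p(r)
    )
    return math.exp(log_p)


def upper_tail(x: int, y: int, n1: int, n2: int, tol: float = 1e-15) -> float:
    """P(Y >= y | x): chance of seeing y or more reads in B given x in A.

    Sums p(k | x) upward from k = y; past the peak (near k = x * n2 / n1)
    the terms keep shrinking, so we stop once they are negligible.
    """
    peak = x * n2 / n1
    total = 0.0
    k = y
    while True:
        term = audic_claverie_p(x, k, n1, n2)
        total += term
        if k > peak and term <= tol * total:
            return total
        k += 1


if __name__ == "__main__":
    n = 1_000_000  # equal library sizes (illustrative)
    # The same 5-fold ratio at two different depths:
    print(upper_tail(2, 10, n, n))    # about 0.019
    print(upper_tail(20, 100, n, n))  # many orders of magnitude smaller
```

With equal library sizes the first tail is exactly 79/4096 (about 0.019), so 2 vs 10 reads would pass p < 0.05 but not p < 0.01. Note that this tests departure from equal expression, not a 5-fold threshold.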

Chris Evelo (Maastricht, The Netherlands) wrote, 8.8 years ago:

I am not sure I completely understand the question, but the fold changes you find will really depend on what your samples are. If, for instance, you compare a knockout strain with a wild-type strain, fold changes will be very high (or infinite, if you assume the knockout really gives zero expression). The same holds for null alleles. A hundred-fold change would almost certainly be something like that. Copy number variations also tend to cause large fold changes in expression.

On the other hand, we often search for the effects of a treatment in two samples that are otherwise as comparable as possible, e.g. the same individual before and after treatment. In nutritional interventions, for instance, we hardly ever find large fold changes; two-fold would already be very high. But what do you expect? You normally don't get blond hair all of a sudden from eating candies (although some of these might give you blue hair).

Marcin Cieslik wrote, 8.8 years ago:

(I write from memory as I do not have access to the paper, so this might not be accurate)

Given two read counts X and Y for a transcript and the total numbers of sequenced reads (A and B), the Poisson margin test (introduced here http://www.ncbi.nlm.nih.gov/pubmed/21385042) gives the probability of observing a count difference at least as large as D = Y - X purely by chance, assuming the Poisson processes that generated X and Y have the same (but unknown) rate. In other words, a low probability allows one to reject the hypothesis that there is no fold change.
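The question asks for more than "no change": it needs the null hypothesis "fold change <= f". A standard way to get that from two Poisson counts is to condition on n = X + Y. Then Y is Binomial(n, p) with p = B*rho / (A + B*rho), where rho is the library-size-normalized fold change, and since D = 2Y - n a tail in D is a tail in Y. I have not checked that this matches the cited paper's test exactly; the sketch below is a conditional binomial test with hypothetical function names. It assumes pure Poisson noise (no biological replicates or overdispersion) and applies no multiple-testing correction.

```python
import math


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(K >= k) for K ~ Binomial(n, p), summed in log space."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    logs = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * log_p + (n - i) * log_q
        for i in range(k, n + 1)
    ]
    top = max(logs)
    return min(1.0, math.exp(top) * sum(math.exp(v - top) for v in logs))


def fold_change_pvalue(x: int, y: int, a: int, b: int, fold: float) -> float:
    """One-sided p-value for H0: (rate in B) / (rate in A) <= fold.

    x, y: read counts in samples A and B; a, b: library sizes.
    Given n = x + y, y is Binomial(n, p) with p = b*rho / (a + b*rho).
    The tail grows with rho, so testing at rho = fold covers the whole null.
    """
    p0 = b * fold / (a + b * fold)
    return binom_upper_tail(y, x + y, p0)


def transcripts_above_fold(counts, a, b, fold, alpha=0.05):
    """IDs whose B/A fold change exceeds `fold` at level alpha.

    counts maps transcript ID -> (x, y). No multiple-testing
    correction is applied here.
    """
    return [tid for tid, (x, y) in counts.items()
            if fold_change_pvalue(x, y, a, b, fold) < alpha]
```

For example, with hypothetical counts {"t1": (0, 10), "t2": (10, 200)} and equal library sizes, both transcripts reject "no change" (fold = 1), but only t2 supports "more than 5-fold": for t1 the p-value is (5/6)^10, about 0.16. For down-regulation, swap the roles of the two samples; for a whole transcriptome, adjust the p-values (e.g. Benjamini-Hochberg) before applying the cut-off.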

A different approach is to (somehow) estimate the rates of the generating processes and then calculate the p-value exactly, using the negative binomial distribution (http://precedings.nature.com/documents/4282/version/1/files/npre20104282-1.pdf) or the negative binomial difference distribution (http://smithlab.cmb.usc.edu/histone/rseg/rseg-supp.pdf).
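A rough sketch of the first idea, loosely in the spirit of the negative binomial preprint linked above (which I take to be the DESeq preprint by Anders and Huber; this is not its code). It conditions on n = x + y and sums the probabilities of all splits (i, n - i) that are no more likely than the observed one. Assumptions: the dispersion `alpha` is supplied by hand (in practice it has to be estimated from replicates), the pooled expression is a simple average of the normalized counts, and the names are hypothetical. It tests "no change" only; the negative binomial difference approach of the second link is not implemented here.

```python
import math


def nb_logpmf(k: int, mu: float, alpha: float) -> float:
    """log P(K = k) for a negative binomial with mean mu and
    variance mu + alpha * mu**2; alpha = 0 gives the Poisson."""
    if alpha == 0:
        return k * math.log(mu) - mu - math.lgamma(k + 1)
    r = 1.0 / alpha
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


def nb_exact_test(x: int, y: int, size_a: float, size_b: float,
                  alpha: float) -> float:
    """Two-sided p-value for 'no change' between counts x and y.

    size_a, size_b: normalization (size) factors of the two samples.
    alpha: dispersion, assumed known here (normally estimated from
    replicates).
    """
    n = x + y
    if n == 0:
        return 1.0
    q0 = (x / size_a + y / size_b) / 2  # pooled expression under H0
    mu_a, mu_b = size_a * q0, size_b * q0
    # Log-probability of every split (i, n - i) of the total count.
    logs = [nb_logpmf(i, mu_a, alpha) + nb_logpmf(n - i, mu_b, alpha)
            for i in range(n + 1)]
    observed = logs[x]
    top = max(logs)
    total = sum(math.exp(v - top) for v in logs)
    # Splits no more likely than the observed one (small float tolerance).
    extreme = sum(math.exp(v - top) for v in logs if v <= observed + 1e-9)
    return min(1.0, extreme / total)
```

With alpha = 0 this reduces to the two-sided conditional binomial test: 0 vs 20 reads gives 2^-19, roughly 1.9e-6. With alpha = 0.1 the same counts give roughly 3e-4, which shows how allowing for biological variability makes the test more conservative.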