Question

What statistical test to apply for DE after CibersortX deconvolution

0

12 months ago

Aspire ▴ 300

I am running CibersortX in high-resolution mode (which yields per-sample estimates of gene expression).

After that, I want to perform DE between two conditions on the resulting gene estimates.

What statistical test would one need to apply to perform DE after high-resolution mode de-convolution?
If I use the output of DE for ranking purposes only (in pre-ranked GSEA), does it make sense to use a simple t-test?

The data is in TPM (CibersortX received TPMs as its input). I transform to log2 scale before the t-test. There are only 7 samples available (4 control, 3 treatment).

Here are some genes which get the smallest p-values after the t-test (but do not pass FDR):

ETFBKMT    0.08727244  0.2549699  0.1277664  0.000000  0.597810  0.4984627  0.4672444
AFF3       3.28551359  2.9969676  2.8793547  3.013827  2.483018  2.5070328  2.3119873
PDLIM1     4.26118568  4.0109785  3.9223976  3.856899  3.388990  3.5155023  3.2931998
MYOF       6.17754486  5.9286025  5.7452668  5.845756  5.276884  5.3981318  5.1251924
LINC02210  3.22922278  3.1785117  3.5162067  3.307364  2.736807  2.9091521  2.8052582
CD274      5.20308488  5.0063801  4.8715090  4.922224  4.440247  4.6192961  4.5082217

And these are the values before the log2 transform:

ETFBKMT    1.062360   1.193311   1.092601   1.000000   1.513417   1.412707   1.382466
AFF3       9.750753   7.983202   7.358209   8.077041   5.590659   5.684498   4.965666
PDLIM1     19.175412  16.122220  15.162099  14.489126  10.475814  11.435934  9.802840
MYOF       72.381287  60.909803  53.641094  57.510614  38.770421  42.169610  34.900902
LINC02210  9.377626   9.053727   11.441519  9.899558   6.665934   7.511766   6.989834
CD274      36.837031  32.141830  29.273209  30.320557  21.709387  24.578008  22.756735
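
For concreteness, the log2 transform and per-gene t statistic described above can be sketched on the six genes listed. This is only an illustration: it assumes the first four columns are the controls and the last three are the treated samples (the column order is not stated in the post), it uses a Welch (unequal-variance) t statistic as one possible choice, and it computes the statistic only, which is what a pre-ranked GSEA input would need. The names `values`, `welch_t` and `rank_scores` are made up for this sketch.

```python
import math
from statistics import mean, variance

# Pre-log values copied from the table above.
# ASSUMPTION: columns 1-4 are control, columns 5-7 are treatment.
values = {
    "ETFBKMT":   [1.062360, 1.193311, 1.092601, 1.000000, 1.513417, 1.412707, 1.382466],
    "AFF3":      [9.750753, 7.983202, 7.358209, 8.077041, 5.590659, 5.684498, 4.965666],
    "PDLIM1":    [19.175412, 16.122220, 15.162099, 14.489126, 10.475814, 11.435934, 9.802840],
    "MYOF":      [72.381287, 60.909803, 53.641094, 57.510614, 38.770421, 42.169610, 34.900902],
    "LINC02210": [9.377626, 9.053727, 11.441519, 9.899558, 6.665934, 7.511766, 6.989834],
    "CD274":     [36.837031, 32.141830, 29.273209, 30.320557, 21.709387, 24.578008, 22.756735],
}
N_CONTROL = 4


def welch_t(treatment, control):
    """Welch t statistic; positive means higher in treatment."""
    se = math.sqrt(variance(treatment) / len(treatment)
                   + variance(control) / len(control))
    return (mean(treatment) - mean(control)) / se


def rank_scores(table, n_control):
    """log2-transform each row, then score treatment vs control."""
    scores = {}
    for gene, row in table.items():
        logged = [math.log2(v) for v in row]
        scores[gene] = welch_t(logged[n_control:], logged[:n_control])
    return scores


scores = rank_scores(values, N_CONTROL)
# Sort from most up in treatment to most down, as a pre-ranked list would be.
ranked = sorted(scores, key=scores.get, reverse=True)
for gene in ranked:
    print(f"{gene}\t{scores[gene]:.3f}")
```

With only 3 and 4 samples per group, the per-gene variance estimates behind these scores are very noisy, so the ordering should be read with that in mind.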

And this is the distribution of the expression values (before log2) after deconvolution, for one of the samples:

 Min.   :    1.00   
 1st Qu.:    5.01 
 Median :   18.03
 Mean   :   89.08 
 3rd Qu.:   47.82 
 Max.   :34406.87

cibersortx differential-expression t-test deconvolution • 1.5k views

ADD COMMENT • link 7 months ago by Aspire ▴ 300

1

What sort of data do you have right now? Please make a proper question and add an overview of the data (for example a head()) and the format it is in. Generally, once you have count data for RNA-seq with genes in rows and samples in columns, you use standard tools such as DESeq2, limma-voom or edgeR for any sort of DE testing. t-tests do not help: they are underpowered, they do not respect overdispersion, and you need said tools for decent normalization anyway.

Edit: I read on StackExchange, where you cross-posted, that one member there does not get tired of promoting Wilcoxon tests for RNA-seq. Unless you have many samples, that is never going to work due to a massive lack of power, and again, you need the mentioned tools for normalization anyway.

ADD REPLY • link 11 months ago by ATpoint 82k

0

Added - thanks.

ADD REPLY • link 11 months ago by Aspire ▴ 300

0

With 4 vs 3 you have no choice other than using methods such as the ones I mentioned to get reliable stats in the presence of limited replication/n. A Wilcoxon test will never yield anything significant at that sample size.
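
The sample-size point can be checked by brute force. The sketch below enumerates every way of assigning ranks 1..7 to a group of 4 and a group of 3 and reports the smallest two-sided p-value an exact rank-sum test can return; it assumes no tied values, and the function name `min_two_sided_rank_sum_p` is made up for this illustration. For 4 vs 3 it comes out at 2/35, about 0.057, so no gene can reach p < 0.05 even before any multiple-testing correction.

```python
from itertools import combinations


def min_two_sided_rank_sum_p(n_a, n_b):
    """Smallest attainable two-sided exact rank-sum p-value (no ties assumed)."""
    n = n_a + n_b
    # Rank sum of group A under every possible assignment of ranks.
    sums = [sum(c) for c in combinations(range(1, n + 1), n_a)]
    expected = n_a * (n + 1) / 2
    # The most extreme outcome is complete separation of the two groups.
    most_extreme = max(abs(s - expected) for s in sums)
    hits = sum(1 for s in sums if abs(s - expected) >= most_extreme)
    return hits / len(sums)


p_min = min_two_sided_rank_sum_p(4, 3)
print(f"minimum two-sided p for 4 vs 3: {p_min:.4f}")
```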

ADD REPLY • link 11 months ago by ATpoint 82k

0

The signature matrix I am using with CibersortX is in TPMs. Hence, the mixture file needs to be in TPMs as well. I think this means the output will also be in TPMs, which means I cannot use any of the standard tools...

ADD REPLY • link 7 months ago by Aspire ▴ 300

0

A key issue is the data set, and which species is under investigation. Agreed on the t-test.

The key issue is the inability to assess the rate of false positives resulting from a misfit between the data set and the distributions assumed by DESeq2 and edgeR.

Producing a lot of results at the published false-positive rate of 20% (minimum) is risky, and there is no method for assessing how close the data set is to the assumed distribution: the tests needed to do this were never developed, although this would have been quite easy to do. After 2014 there was simply a shift from parametric to non-parametric statistics.

If it's humans (DESeq2 is right?) and you're happy with an unknown error of 20%, fine. If it's not humans, that error rate could be anything.

ADD REPLY • link 11 months ago by M__ ▴ 200