Question: DE Analysis starting from TPM matrix? Also, no replicates.
basch wrote, 4 months ago:

I have a matrix of TPM values from 27 different tissue types that I obtained from a database (thus, I don't have the read counts). The data comes from an RNA-seq experiment.

I want to perform a differential expression analysis to find a set of genes that is specific to one of those tissue types.

Is this possible? I've used DESeq2 before, but it starts from read counts, not TPM values. Since I am working with pig, there are not many databases available from which I can extract these marker genes.

Thank you in advance,

ATpoint (Germany) wrote, 4 months ago:

As you mention, most tools expect raw counts. Without replicates, any analysis will be exploratory rather than statistically sound. You can compute log2 fold changes to get an idea of which genes might be involved; applying a reasonable TPM cutoff probably makes sense, to avoid large fold changes driven by small values (the "mean-variance relationship"). Still, any result will be unreliable, so be careful about building downstream experiments on such an analysis.
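As a rough illustration of that exploratory approach, here is a minimal Python sketch. It assumes the TPM matrix has been loaded into a nested dict (gene -> tissue -> TPM), compares one tissue against the mean of all the others, and uses a cutoff of 1 TPM and a pseudocount of 1; both values are arbitrary, hypothetical choices rather than recommendations from the answer.

```python
import math

def log2_fold_changes(tpm, target, min_tpm=1.0, pseudocount=1.0):
    """Exploratory log2 fold change of `target` vs the mean of all other tissues.

    tpm: dict mapping gene -> dict mapping tissue -> TPM (hypothetical layout).
    Genes below `min_tpm` in every tissue are skipped, because fold changes
    between tiny values are dominated by noise.
    """
    result = {}
    for gene, by_tissue in tpm.items():
        if max(by_tissue.values()) < min_tpm:
            continue  # too lowly expressed everywhere to trust a ratio
        others = [v for t, v in by_tissue.items() if t != target]
        mean_others = sum(others) / len(others)
        # The pseudocount keeps the ratio finite when a value is zero.
        result[gene] = math.log2((by_tissue[target] + pseudocount) / (mean_others + pseudocount))
    return result

# Toy data for illustration only (gene -> tissue -> TPM).
tpm = {
    "GENE_A": {"liver": 31.0, "heart": 1.0, "lung": 1.0},
    "GENE_B": {"liver": 0.2, "heart": 0.1, "lung": 0.3},
}
lfc = log2_fold_changes(tpm, "liver")
ranked = sorted(lfc.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)  # GENE_B is dropped by the TPM cutoff
```

Sorting by fold change gives a ranked candidate list, which without replicates is only a starting point for exploration.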

i.sudbery (Sheffield, UK) wrote, 4 months ago:

Given the data you have, I don't see much chance of doing a DE analysis. DESeq and edgeR both require counts. It might be possible to do something using limma with TPMs, but without replicates you are going to struggle to get anything meaningful.

Instead, I think you'd be better off not thinking about your problem as a differential expression problem.

There are various approaches to identifying tissue-specific genes (in fact, I believe I saw something on bioRxiv recently), but a simple approach might be an outlier analysis.

First, normalise your data. An obvious approach might be rlog or vst.
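Note that DESeq2's rlog and vst transformations operate on counts, which are not available here. A simple stand-in, assumed for this sketch rather than proposed in the answer, is log2(TPM + 1), which compresses the range of highly expressed genes and keeps zeros finite. The nested-dict layout is again hypothetical.

```python
import math

def log2_tpm(tpm, pseudocount=1.0):
    """Return log2(TPM + pseudocount) for each gene/tissue entry.

    tpm: dict gene -> dict tissue -> TPM (hypothetical layout).
    The pseudocount keeps zero TPM values finite (log2(0 + 1) = 0).
    """
    return {
        gene: {tissue: math.log2(value + pseudocount) for tissue, value in by_tissue.items()}
        for gene, by_tissue in tpm.items()
    }

logged = log2_tpm({"GENE_A": {"liver": 15.0, "heart": 0.0}})
print(logged)  # {'GENE_A': {'liver': 4.0, 'heart': 0.0}}
```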

Then, for each gene, calculate the mean and standard deviation across all tissues except your tissue of interest, and use these to calculate a Z-score for the gene's expression in the tissue of interest. Convert this Z-score to a p-value using the normal distribution. You'll need to correct for multiple testing: ideally you'd use some sort of empirical FDR, but I can't quite think how right now. You might get away with just doing a BH correction on the p-values.
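The outlier analysis could look roughly like the following Python sketch. It works on normalised (log-scale) values in the same hypothetical gene -> tissue dict layout, and it fills in choices the answer leaves open: the test is one-sided (only genes higher in the tissue of interest count as tissue-specific), genes with zero spread across the other tissues are skipped, and the correction is a plain Benjamini-Hochberg adjustment rather than an empirical FDR.

```python
import math
import statistics

def outlier_pvalues(expr, target):
    """One-sided outlier p-values for genes unusually high in `target`.

    expr: dict gene -> dict tissue -> normalised (log-scale) expression.
    Mean and SD come from all tissues except `target`; the target value is
    turned into a Z-score and then an upper-tail normal p-value.
    """
    pvals = {}
    for gene, by_tissue in expr.items():
        others = [v for t, v in by_tissue.items() if t != target]
        sd = statistics.stdev(others)  # sample SD; needs at least 2 other tissues
        if sd == 0:
            continue  # constant background: Z-score undefined, skip the gene
        z = (by_tissue[target] - statistics.mean(others)) / sd
        pvals[gene] = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for standard normal
    return pvals

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values for a dict gene -> p-value."""
    genes = sorted(pvals, key=pvals.get)  # ascending p-values
    m = len(genes)
    adjusted = {}
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        gene = genes[rank - 1]
        running_min = min(running_min, pvals[gene] * m / rank)
        adjusted[gene] = running_min
    return adjusted

# Toy, already log-transformed values for illustration only.
expr = {
    "GENE_A": {"liver": 10.0, "heart": 1.0, "lung": 2.0, "kidney": 3.0},
    "GENE_B": {"liver": 2.0, "heart": 1.0, "lung": 2.0, "kidney": 3.0},
    "GENE_C": {"liver": 7.0, "heart": 5.0, "lung": 5.0, "kidney": 5.0},
}
pvals = outlier_pvalues(expr, "liver")
qvals = benjamini_hochberg(pvals)
print(sorted(qvals.items(), key=lambda kv: kv[1]))
```

Because the background mean and SD are themselves estimated from only 26 tissues, the normal approximation is rough, so the adjusted values are probably best read as a ranking rather than exact error rates.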

Michael Dondrup (Bergen, Norway) wrote, 4 months ago:

Your first priority should be getting the raw data. You write that you got it from "a database". If the data extracted from that database is based on published data, then it should be possible to get the raw data too, and normally it will include replicates. For example, when retrieving summarized tissue expression from Expression Atlas, there is always a link to the original datasets and publications.
