Question: DE Analysis starting from TPM matrix? Also, no replicates.
15 months ago by
basch0 wrote:

I have a matrix of TPM values from 27 different tissue types that I obtained from a database (thus, I don't have the read counts). The data comes from an RNA-seq experiment.

I want to perform a differential expression analysis, with the aim of finding a set of genes that is specific to one of those tissue types.

Is this possible? I've used DESeq2 before, but that starts from read counts, not TPM values. Since I am working with pig, there are not many databases available from which I can extract such marker genes.

Thank you in advance,

rna-seq tpm • 781 views
15 months ago by
ATpoint29k wrote:

Most tools expect raw counts, as you mention. Without replicates, any analysis will be exploratory rather than statistically sound. You can look at log2 fold changes to get an idea of which genes might be involved; a decent TPM cutoff probably makes sense, to avoid large fold changes that arise only from very small values (the mean-variance relationship). Still, any result will be unreliable, so be careful about building downstream experiments on such an analysis.
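
As a rough illustration of the fold-change idea, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than an established method: the function name `log2_fold_changes`, the nested-dict input layout, the comparison against the median of the other tissues, the cutoff of 1 TPM and the pseudocount of 1.

```python
import math
from statistics import median


def log2_fold_changes(tpm, tissue, min_tpm=1.0, pseudocount=1.0):
    """Exploratory log2 fold change of one tissue versus the others.

    tpm    : dict mapping gene -> {tissue name: TPM value}
    tissue : the tissue of interest
    Genes whose highest TPM across all tissues is below `min_tpm` are
    skipped, because ratios of very small values are mostly noise.
    """
    result = {}
    for gene, values in tpm.items():
        target = values[tissue]
        others = [v for name, v in values.items() if name != tissue]
        # TPM cutoff (assumed threshold): ignore genes that are low everywhere
        if max(target, max(others)) < min_tpm:
            continue
        # The pseudocount keeps the ratio finite when a value is zero
        ratio = (target + pseudocount) / (median(others) + pseudocount)
        result[gene] = math.log2(ratio)
    return result
```

Sorting the returned values in decreasing order gives a candidate list to look at, nothing more; without replicates there is no way to attach a meaningful significance to these numbers.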

15 months ago by
Sheffield, UK
i.sudbery6.9k wrote:

Given the data you have, I don't see much chance of doing a DE analysis. DESeq and edgeR both require counts. It might be possible to do something using limma with TPMs, but without replicates, you are going to struggle to get anything meaningful.

Instead, I think you'd be better off not thinking about your problem as a differential expression problem.

There are various approaches to identifying tissue-specific genes (in fact, I believe I saw something on bioRxiv recently), but a simple approach might be an outlier analysis.

First normalise your data. An obvious approach might be rlog or vst.

Then, for each gene, calculate the mean and standard deviation across all the tissues except your tissue of interest. Use these to calculate a Z score for the expression of the gene in the tissue of interest, and convert this to a p-value using the normal distribution. You'll need to correct for multiple testing. Ideally you'd do some sort of empirical FDR, but I can't quite think how right now. You might get away with just doing a BH correction on the p-values.
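
The steps above could be sketched in plain Python roughly as follows. This is only an illustration under stated assumptions: it uses log2(TPM + 1) as a stand-in transform because only TPMs are available (rlog and vst in DESeq2 work on counts), the p-value is one-sided (higher in the tissue of interest), and the names `benjamini_hochberg` and `tissue_specificity` are made up for this example.

```python
import math
from statistics import NormalDist, mean, stdev


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def tissue_specificity(expr, tissue):
    """Outlier test for one tissue against all the others.

    expr : dict mapping gene -> {tissue name: TPM value}
    Returns dict mapping gene -> (z, p, padj).
    """
    genes, zs, ps = [], [], []
    for gene, values in expr.items():
        # Assumed transform: log2(TPM + 1), not rlog/vst
        logged = {name: math.log2(v + 1.0) for name, v in values.items()}
        others = [v for name, v in logged.items() if name != tissue]
        sd = stdev(others)
        if sd == 0:
            continue  # no variation in the background, Z score undefined
        z = (logged[tissue] - mean(others)) / sd
        # One-sided upper-tail p-value from the standard normal
        p = NormalDist().cdf(-z)
        genes.append(gene)
        zs.append(z)
        ps.append(p)
    padj = benjamini_hochberg(ps)
    return {g: (z, p, q) for g, z, p, q in zip(genes, zs, ps, padj)}
```

With 27 tissues the background mean and standard deviation come from 26 values per gene, and expression across tissues is unlikely to be normally distributed, so the resulting p-values are better treated as a ranking than as calibrated probabilities.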

15 months ago by
Bergen, Norway
Michael Dondrup47k wrote:

Your first priority should be getting the raw data. You write that you got it from 'a database'. If the data extracted from that database is based on published data, then it should be possible to get the raw data as well, and normally it will include replicates. For example, when retrieving summarized tissue expression from Expression Atlas, there is always a link to the original datasets and publications.
