Question: DE Analysis starting from TPM matrix? Also, no replicates.
basch wrote (2.0 years ago):

I have a matrix of TPM values from 27 different tissue types that I obtained from a database (thus, I don't have the read counts). The data comes from an RNA-seq experiment.

I want to do a differential expression analysis to find a set of genes that are specific to one of these tissue types.

Is this possible? I've used DESeq2 before, but it starts from read counts, not TPM values. Since I am working with pig, there are not many databases from which I can extract these marker genes.

Thank you in advance,

Tags: rna-seq, tpm
ATpoint (Germany) wrote (2.0 years ago):

As you mention, most tools expect raw counts. Without replicates, any analysis will be exploratory rather than statistically sound. You can compute log2 fold changes to get an idea of which genes might be involved; applying a reasonable TPM cutoff probably makes sense, to avoid large fold changes driven by small values (the mean-variance relationship). Still, any result will be unreliable, so be careful about building downstream experiments on such an analysis.
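A minimal sketch of this in Python, assuming a genes x tissues NumPy array of TPM values. The comparison (tissue of interest vs. the mean of the remaining tissues), the pseudocount of 1 and the 5 TPM cutoff are illustrative assumptions, not something prescribed in the answer; `log2_fold_changes` is a hypothetical helper name.

```python
import numpy as np

def log2_fold_changes(tpm, target, min_tpm=5.0, pseudocount=1.0):
    """Log2 fold change of one tissue vs. the mean of all other tissues.

    tpm: genes x tissues array of TPM values (no replicates).
    target: column index of the tissue of interest.
    min_tpm, pseudocount: hypothetical defaults; tune them for your data.
    Returns an array of log2FC; genes below the cutoff get NaN.
    """
    tpm = np.asarray(tpm, dtype=float)
    others = np.delete(tpm, target, axis=1)   # all tissues except the target
    target_expr = tpm[:, target]
    rest_mean = others.mean(axis=1)
    # Pseudocount avoids log2(0) for genes not expressed anywhere else.
    lfc = np.log2(target_expr + pseudocount) - np.log2(rest_mean + pseudocount)
    # Mask genes that are lowly expressed in the target tissue: small values
    # give large but noisy ratios (mean-variance relationship).
    lfc[target_expr < min_tpm] = np.nan
    return lfc
```

Sorting genes by the resulting values gives a ranked list to explore; as the answer says, no statistical significance can be attached to it without replicates.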

i.sudbery (Sheffield, UK) wrote (2.0 years ago):

Given the data you have, I don't see much chance of doing a DE analysis. DESeq and edgeR both require counts. It might be possible to do something using limma with TPMs, but without replicates you are going to struggle to get anything meaningful.

Instead, I think you'd be better off not thinking about your problem as a differential expression problem.

There are various approaches to identifying tissue-specific genes (in fact, I believe I saw something on bioRxiv recently), but a simple approach might be an outlier analysis.

First, normalise your data. An obvious approach might be rlog or vst.
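DESeq2's rlog and vst are designed for raw counts, which are not available here. As a stand-in under that assumption, a simple log2(TPM + 1) transform can be sketched as below; `log_normalise` is a hypothetical helper, and the pseudocount of 1 is an illustrative choice.

```python
import numpy as np

def log_normalise(tpm, pseudocount=1.0):
    """Log-transform a genes x tissues TPM matrix.

    Assumption: log2(TPM + pseudocount) replaces rlog/vst because only
    TPMs (not raw counts) are available.
    """
    tpm = np.asarray(tpm, dtype=float)
    if (tpm < 0).any():
        raise ValueError("TPM values must be non-negative")
    return np.log2(tpm + pseudocount)
```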

Then, for each gene, calculate the mean and standard deviation across all the tissues except your tissue of interest, and use these to calculate a Z-score for the expression of that gene in the tissue of interest. Convert this to a p-value using the normal distribution. You'll need to correct for multiple testing. Ideally you'd do some sort of empirical FDR, but I can't quite think how right now. You might get away with just doing a BH correction on the p-values.
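The outlier analysis described above can be sketched as follows, assuming a genes x tissues NumPy array that is already on a log scale (for example log2(TPM + 1)). Two choices are assumptions not stated in the answer: the p-value is one-sided (upper tail, since we look for genes higher in the tissue of interest), and genes with zero spread across the other tissues get NaN. The Benjamini-Hochberg step is implemented by hand; `outlier_pvalues` and `bh_adjust` are hypothetical names.

```python
import math
import numpy as np

def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values; NaNs are passed through."""
    p = np.asarray(p, dtype=float)
    q = np.full_like(p, np.nan)
    ok = ~np.isnan(p)
    pv = p[ok]
    n = pv.size
    if n == 0:
        return q
    order = np.argsort(pv)
    ranked = pv[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adj = np.empty(n)
    adj[order] = np.minimum(ranked, 1.0)
    q[ok] = adj
    return q

def outlier_pvalues(log_expr, target):
    """Z-score each gene in the target tissue against all other tissues.

    log_expr: genes x tissues array of log-scale expression.
    target: column index of the tissue of interest.
    Returns (z, p, q): Z-scores, one-sided upper-tail p-values and
    BH-adjusted p-values.
    """
    x = np.asarray(log_expr, dtype=float)
    others = np.delete(x, target, axis=1)      # every tissue except the target
    mu = others.mean(axis=1)
    sd = others.std(axis=1, ddof=1)            # sample standard deviation
    sd = np.where(sd > 0, sd, np.nan)          # flat genes -> NaN Z-score
    z = (x[:, target] - mu) / sd
    # Upper tail of the standard normal: P(Z >= z) = 0.5 * erfc(z / sqrt(2)).
    p = 0.5 * np.vectorize(math.erfc)(z / math.sqrt(2))
    return z, p, bh_adjust(p)
```

Genes with a small adjusted p-value and a large positive Z-score are candidate tissue-specific genes. With only 26 background tissues, the normal approximation is rough, so these numbers are best treated as a ranking rather than exact error rates.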

Michael Dondrup (Bergen, Norway) wrote (2.0 years ago):

Your first priority should be getting the raw data. You write that you got it from "a database". If the data extracted from that database are based on published data, it should also be possible to get the raw data, which will normally include replicates. For example, when retrieving summarized tissue expression from Expression Atlas, there is always a link to the original datasets and publications.
