Question: Using FPKM and TPM values for batch correction in single-cell RNA-seq
hkarakurt wrote (6 months ago):

Hello, we are trying to analyze several single-cell datasets from different sources, but we have a problem: one dataset is provided as TPM and another as FPKM. Batch correction is straightforward with raw counts (e.g. CCA in Seurat or MNN in scran), but we have no idea how to handle this case.

Do you think we can use the TPM and FPKM values for batch correction, given that they are already normalized? Another option would be to convert the values back to raw counts, but we do not know how to do that.

Thank you in advance.
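On the narrower issue of mixed units: within a single cell, FPKM and TPM differ only by a per-cell constant, so an FPKM matrix can be rescaled to TPM by dividing each column by its sum and multiplying by 10^6. The Python sketch below uses made-up toy numbers and a hypothetical helper name (`fpkm_to_tpm`), and checks the rescaling against TPM computed directly from counts. It assumes both datasets were quantified against the same set of genes; it only puts the two datasets in the same units and does not address whether TPM/FPKM are suitable for inter-sample comparison.

```python
import numpy as np

def fpkm_to_tpm(fpkm):
    """Rescale each column (cell) of a genes x cells FPKM matrix to sum to 1e6."""
    fpkm = np.asarray(fpkm, dtype=float)
    return fpkm / fpkm.sum(axis=0, keepdims=True) * 1e6

# Toy data (hypothetical): 3 genes x 2 cells, gene lengths in kb.
counts = np.array([[500.0, 50.0], [1500.0, 300.0], [3000.0, 150.0]])
lengths_kb = np.array([1.0, 2.0, 1.5])
mapped_reads = counts.sum(axis=0)  # assumes every mapped read is assigned to these genes

# FPKM: reads per kilobase of transcript per million mapped reads.
fpkm = counts / lengths_kb[:, None] / (mapped_reads / 1e6)

# TPM computed directly from counts, for comparison.
rpk = counts / lengths_kb[:, None]
tpm_direct = rpk / rpk.sum(axis=0) * 1e6

print(np.allclose(fpkm_to_tpm(fpkm), tpm_direct))
```

The rescaling works because FPKM is reads-per-kilobase multiplied by a per-cell constant (10^6 / mapped reads), and that constant cancels when each column is renormalized.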

ATpoint (Germany) wrote (6 months ago):

Neither is suited for differential (or any other inter-sample) comparison. Please use Google and the search function: FPKM/TPM as a normalization technique (and why it is a poor choice) has been discussed many times before. Also check previous threads on batch correction. You always want identical in silico processing of the data to avoid confounding effects, and you always want the data normalized together, not independently.
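A small Python sketch of the compositional problem, using made-up toy counts and gene lengths: TPM rescales every sample to the same total independently, so when one gene takes up a large share of the reads, every other gene's TPM drops even if its true expression is unchanged. Size factors estimated jointly across samples are much less affected. The sketch uses a DESeq-style median-of-ratios estimator as a stand-in for joint normalization (single-cell tools such as scran use a pooling-based estimator instead), and it assumes there are no zero counts.

```python
import numpy as np

def compute_tpm(counts, lengths_kb):
    """Per-sample TPM: reads per kilobase, then each column rescaled to sum to 1e6."""
    rpk = counts / lengths_kb[:, None]  # genes x samples
    return rpk / rpk.sum(axis=0) * 1e6

def median_of_ratios(counts):
    """DESeq-style size factors, estimated jointly from all samples.
    Assumes no zero counts (a real implementation must handle zeros)."""
    log_counts = np.log(counts)
    log_geo_means = log_counts.mean(axis=1, keepdims=True)  # per-gene reference
    return np.exp(np.median(log_counts - log_geo_means, axis=0))

# Hypothetical toy data: 5 genes x 2 samples (columns A, B).
# Gene 0 is strongly induced in B; genes 1-4 are unchanged in absolute terms,
# but at similar sequencing depth they receive half as many reads in B.
lengths_kb = np.array([1.0, 2.0, 1.5, 0.5, 3.0])
counts = np.array([
    [100.0, 850.0],
    [400.0, 200.0],
    [300.0, 150.0],
    [100.0, 50.0],
    [600.0, 300.0],
])

tpm_values = compute_tpm(counts, lengths_kb)
size_factors = median_of_ratios(counts)
normalized = counts / size_factors  # divides each column by its sample's size factor

print("TPM B/A, genes 1-4:", tpm_values[1:, 1] / tpm_values[1:, 0])
print("Size-factor-normalized B/A, genes 1-4:", normalized[1:, 1] / normalized[1:, 0])
```

In this toy case genes 1-4 come out at 0.36 of their sample-A TPM in sample B (apparently down almost 3-fold), while their jointly normalized values are unchanged (ratio of about 1.0).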


I am planning to look at the tximport package to convert the FPKM and TPM values to counts. I found the relevant function; I assume that is the one you meant.

(reply by hkarakurt, 6 months ago)

No, I did not. Sorry to say, but that does not make sense. tximport is meant to summarize transcript-level abundance estimates to the gene level while accounting for the different lengths of the transcripts, which influence the counts (longer transcripts => more reads). As I said, neither TPM nor FPKM is suited for inter-sample comparisons. Are these two datasets at least from the same study/lab, or are they completely different datasets?

(reply by ATpoint, 6 months ago)
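To illustrate the length effect with a Python sketch (tximport itself is an R/Bioconductor package; this is not its implementation): a hypothetical gene has a 1 kb and a 4 kb isoform, and samples A and B contain the same amount of its transcripts but switch isoform usage. Because the number of fragments grows with transcript length, the summed gene-level counts differ between the samples even though abundance does not. Dividing by an abundance-weighted average transcript length, similar in spirit to the gene-level length offset tximport computes, removes the difference. All numbers are made up.

```python
import numpy as np

# Hypothetical gene with two isoforms of different effective length, in kb.
iso_length_kb = np.array([1.0, 4.0])

# Relative abundance of each isoform in samples A and B (columns).
# Total abundance of the gene is the same; only isoform usage differs.
abundance = np.array([
    [0.9, 0.1],  # short isoform
    [0.1, 0.9],  # long isoform
])

# Expected reads are proportional to abundance x length;
# 1000 is an arbitrary scaling constant.
expected_counts = 1000 * abundance * iso_length_kb[:, None]

# Naive gene-level summary: sum transcript counts per sample.
gene_counts = expected_counts.sum(axis=0)

# Abundance-weighted average transcript length per sample.
avg_length_kb = (abundance * iso_length_kb[:, None]).sum(axis=0) / abundance.sum(axis=0)

length_corrected = gene_counts / avg_length_kb

print(gene_counts)
print(length_corrected)
```

Here the summed counts are 1300 in A and 3700 in B, while the length-corrected values are equal (1000 in each).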

They are completely different datasets from the same biological sample.

(reply by hkarakurt, 6 months ago)

Then what you are aiming for is probably impossible. See in particular C: Comparison between scRNA and bulk RNA, which covers the main arguments regardless of whether the data are bulk or single-cell, most importantly points 2 and 4.

(reply by ATpoint, 6 months ago)
shoujun.gu wrote (6 months ago):

From my experience:

  1. Do not use FPKM/TPM in any situation.
  2. Do not use CCA unless the datasets are technical replicates (or something similar).
  3. If the two datasets differ a lot and you still believe they share common cell populations, MNN may be helpful. But only use it for clustering.
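The core idea behind MNN correction (Haghverdi et al., 2018) is to find pairs of cells from two batches that are each other's nearest neighbours, treat each pair as the same cell state, and use the differences between paired cells to estimate the batch effect. The Python sketch below is heavily simplified and only illustrative: it uses plain Euclidean distance in a toy 2-D space, k=1, and applies a single averaged correction vector, whereas the published method cosine-normalizes log-expression values and smooths cell-specific correction vectors with a Gaussian kernel. The function name `mutual_nearest_pairs` is hypothetical.

```python
import numpy as np

def mutual_nearest_pairs(batch1, batch2, k=1):
    """Return (i, j) pairs where cell i of batch1 and cell j of batch2 are each
    among the other's k nearest neighbours (Euclidean distance)."""
    # Pairwise distances, shape (n_cells_batch1, n_cells_batch2).
    d = np.linalg.norm(batch1[:, None, :] - batch2[None, :, :], axis=2)
    nn_1to2 = np.argsort(d, axis=1)[:, :k]    # k closest batch2 cells per batch1 cell
    nn_2to1 = np.argsort(d, axis=0)[:k, :].T  # k closest batch1 cells per batch2 cell
    pairs = []
    for i in range(batch1.shape[0]):
        for j in nn_1to2[i]:
            if i in nn_2to1[j]:
                pairs.append((i, int(j)))
    return pairs

# Toy data (hypothetical): two cell populations per batch in a 2-D space,
# with batch 2 shifted by a constant batch effect.
batch1 = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
batch2 = batch1 + np.array([2.0, 0.0])

pairs = mutual_nearest_pairs(batch1, batch2, k=1)

# Simplification: one global correction vector (mean of paired differences)
# instead of the kernel-smoothed, cell-specific vectors of the real method.
correction = np.mean([batch1[i] - batch2[j] for i, j in pairs], axis=0)
corrected = batch2 + correction

print(pairs)
print(corrected)
```

Because the corrected values are a shifted version of the data rather than counts, they are typically used for clustering and visualization rather than for differential expression, which fits point 3 above.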

Thank you for your answer. My problem is that I do not have any raw count data: I am using public datasets that provide only FPKM/TPM values. I have used MNN before, but I have never been in a situation like this (only FPKM data available).

(reply by hkarakurt, 6 months ago)
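For reference, FPKM is defined as count × 10^9 / (effective length in bp × total mapped reads), and TPM is proportional to count / effective length, rescaled to sum to 10^6 per cell. Counts can therefore in principle be recovered only if the per-cell library sizes and the effective lengths used during quantification are known; TPM alone fixes counts only up to a per-cell scale factor. The Python sketch below (hypothetical helper names, made-up toy values for one cell) shows the round trip under those assumptions. Public tables rarely include effective lengths and are often filtered or rounded, so in practice such a reconstruction would be approximate at best, and re-quantifying the raw reads is the more reliable route.

```python
import numpy as np

def counts_from_fpkm(fpkm, eff_length_bp, total_mapped_reads):
    """Invert FPKM = count * 1e9 / (effective length in bp * total mapped reads)."""
    return fpkm * eff_length_bp * total_mapped_reads / 1e9

def counts_from_tpm(tpm, eff_length_bp, library_size):
    """Counts are proportional to TPM * effective length; rescale to the library size.
    Assumes library_size is the total count over the same genes used for TPM."""
    weighted = tpm * eff_length_bp
    return weighted / weighted.sum() * library_size

# Toy single cell (hypothetical values): 3 genes.
counts = np.array([500.0, 1500.0, 3000.0])
eff_length_bp = np.array([1000.0, 2000.0, 1500.0])
library_size = counts.sum()  # here all mapped reads are assigned to these genes

# Forward definitions, used only to generate FPKM/TPM for the round trip.
fpkm = counts * 1e9 / (eff_length_bp * library_size)
rpk = counts / (eff_length_bp / 1e3)
tpm = rpk / rpk.sum() * 1e6

print(counts_from_fpkm(fpkm, eff_length_bp, library_size))
print(counts_from_tpm(tpm, eff_length_bp, library_size))
```

Without `library_size`, `counts_from_tpm` could only return values proportional to the true counts, which is not enough for count-based methods that model sequencing depth.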

If it's a published paper, the raw data should have been deposited in SRA. If it comes from some database, you may not be able to get the raw counts. Of course, you can run any algorithm on any type of data; people don't use FPKM with MNN simply because they don't expect that analysis to give reliable results.

(reply by shoujun.gu, 6 months ago)