Question: Using FPKM and TPM values for batch correction for Single Cell RNA-Seq
hkarakurt wrote:

Hello, we are trying to analyze a set of single-cell data sets from different sources, but we have a problem. One of the data sets is in TPM and another is in FPKM format. It is easy to do batch correction with raw counts (with CCA in Seurat or MNN in Scater), but we have no idea how to deal with this problem.

Do you think we can use TPM and FPKM values for batch correction, since they are already normalized? Another option is to convert the values to raw counts, but we have no idea how to do that.

Thank you in advance.

modified 10 months ago by shoujun.gu • written 10 months ago by hkarakurt
ATpoint wrote:

Neither is suited for differential (or any other inter-sample) comparison. Please use Google and the search function. FPKM/TPM as a normalization technique (and why it is a poor choice) has been discussed many times before. Also, please check previous threads on batch correction. You always want identical in silico processing of the data to avoid confounding effects, and you always want the data normalized together, not independently.

modified 10 months ago • written 10 months ago by ATpoint
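
For reference, within a single sample FPKM and TPM differ only by a constant factor: TPM is FPKM rescaled so that the values sum to one million. The sketch below, in plain Python, shows that rescaling. The function name `fpkm_to_tpm` and the example values are hypothetical. It only puts two tables on the same unit; it does not address the inter-sample comparability problem described above.

```python
def fpkm_to_tpm(fpkm):
    """Rescale one cell's FPKM values so that they sum to one million (TPM)."""
    total = sum(fpkm)
    if total == 0:
        # An all-zero cell stays all-zero instead of dividing by zero.
        return [0.0 for _ in fpkm]
    return [value / total * 1e6 for value in fpkm]


# Hypothetical FPKM values for four genes in a single cell.
cell_fpkm = [10.0, 0.0, 30.0, 60.0]
cell_tpm = fpkm_to_tpm(cell_fpkm)
```

The conversion is applied per cell (per column of an expression matrix), because the scaling factor depends on that cell's own FPKM total.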

I am planning to check the tximport package to convert FPKM and TPM values to counts. I found the function. I think that is the one you mean.

written 10 months ago by hkarakurt

No, I did not. Sorry to say, but what you say does not make sense. tximport is meant to convert transcript abundance estimates to the gene level while correcting for the different lengths of the transcripts, which influence the abundances (longer transcripts => higher abundances). As I said, neither TPM nor FPKM is suited for inter-sample comparisons. Are these at least two datasets from the same study/lab, or two completely different datasets?

modified 10 months ago • written 10 months ago by ATpoint
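
The length effect mentioned above (longer transcripts => higher abundances) can be illustrated with a minimal Python sketch. It is not tximport's implementation; it only shows how dividing read counts by transcript length before rescaling, as in the definition of TPM, removes the length bias within one sample. The function name `counts_to_tpm` and the numbers are hypothetical.

```python
def counts_to_tpm(counts, lengths_kb):
    """Convert read counts to TPM using transcript lengths in kilobases."""
    # Reads per kilobase: removes the advantage of longer transcripts.
    rates = [count / length for count, length in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    # Rescale so that the values of this sample sum to one million.
    return [rate / total * 1e6 for rate in rates]


# Two hypothetical transcripts present at the same molar abundance:
# the 4 kb transcript yields four times as many reads as the 1 kb one.
counts = [100, 400]
lengths_kb = [1.0, 4.0]
tpm = counts_to_tpm(counts, lengths_kb)
```

After length correction both transcripts receive the same TPM, although their raw read counts differ fourfold.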

They are completely different data sets from the same biological sample.

written 10 months ago by hkarakurt

Then it is probably impossible to do what you are aiming for. See essentially: C: Comparison between scRNA and bulk RNA, which should cover the main arguments regardless of whether the dataset is bulk or single-cell. Most importantly, points 2 and 4.

modified 10 months ago • written 10 months ago by ATpoint
shoujun.gu wrote:

From my experience:

  1. Do not use FPKM/TPM in any situation.
  2. Do not use CCA, unless they are technical repeats (or something like this).
  3. If the two datasets differ a lot, and you still believe there are common populations within them, then MNN may be helpful. But only use it for clustering.
written 10 months ago by shoujun.gu

Thank you for your answer. My problem here is that I do not have any raw count data. I am using public data sets, and they provide only FPKM/TPM values. I was using MNN before, but I have never been in a situation like this (data available only as FPKM).

written 10 months ago by hkarakurt

If it's a published paper, the raw data should be deposited in SRA. But if it is from some database, you may not be able to get the raw counts. Of course, you can always use any algorithm on any type of data. People don't use FPKM for MNN simply because they don't think the analysis will give reliable results.

written 10 months ago by shoujun.gu
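
On converting TPM back to counts: a TPM table no longer carries the sequencing depth, so the original counts can only be recovered up to an unknown scale. The Python sketch below illustrates this under stated assumptions: transcript lengths are known, the protocol is full-length so that reads scale with length, and `assumed_depth` is a guess supplied by the user. The function name and all values are hypothetical.

```python
def tpm_to_expected_counts(tpm, lengths_kb, assumed_depth):
    """Back-calculate expected read counts from TPM for an assumed depth."""
    # Under a full-length protocol, reads are proportional to TPM x length.
    weights = [value * length for value, length in zip(tpm, lengths_kb)]
    total = sum(weights)
    if total == 0:
        return [0.0 for _ in weights]
    # The depth is not stored in the TPM values; it has to be assumed.
    return [assumed_depth * weight / total for weight in weights]


tpm = [250000.0, 250000.0, 500000.0]
lengths_kb = [1.0, 2.0, 3.5]

# The same TPM vector is consistent with very different libraries.
shallow = tpm_to_expected_counts(tpm, lengths_kb, assumed_depth=100)
deep = tpm_to_expected_counts(tpm, lengths_kb, assumed_depth=5000)
```

Both results are equally compatible with the TPM values, which is why re-quantifying from the deposited raw reads is preferable to back-calculating counts.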