Question: Integrate single-cell datasets (TPM and raw) to find gene markers between clusters
10 weeks ago, A. Domingues (Dresden, Germany) wrote:

Hi all,

I am trying to find cell markers that distinguish populations A and B using publicly available single-cell RNA-seq data. The snag is that these populations were identified in different studies, and the data are available as raw counts (10x) for one study and as TPMs (Smart-seq*) for the other.

Any suggestions on how to integrate these datasets so that I can perform DE downstream?

I was considering using Seurat with SCTransform. Any objections?

*I think. It is not clear from the paper's methods, but they sequenced the library with 75 bp paired-end reads.


> Any objections?

Yes. sctransform fits its model to raw UMI counts, not to TPM, so you probably cannot do what you plan. If the data are published, can't you download the raw reads and process them yourself? That is cumbersome, but forcing TPM and raw counts into one analysis is, in my opinion, not only inappropriate but also a waste of time: even if you technically get results out of it, they will not be reliable. Alternatively, email the authors and ask for a raw count matrix. With that you could integrate the datasets, but integration requires that at least some populations are shared between the studies so that anchors (or whatever your method uses) can be found. Integrating two completely different populations from different studies is probably not going to be reliable.

written 10 weeks ago by ATpoint
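
To make the objection above concrete, here is a minimal Python sketch (not from the thread, and not Seurat or sctransform code) of why TPM values cannot stand in for raw counts. The gene lengths, the count vectors, and the helper names `counts_to_tpm` and `looks_like_raw_counts` are all hypothetical. Two cells with the same composition but a tenfold difference in depth end up with identical TPM vectors, so the per-cell depth that count-based models such as sctransform use as a covariate is no longer recoverable.

```python
def counts_to_tpm(counts, lengths_kb):
    """Convert one cell's raw counts to TPM, given gene lengths in kb."""
    # Length-normalise first, then rescale so the cell sums to one million.
    rates = [c / length for c, length in zip(counts, lengths_kb)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def looks_like_raw_counts(values, tol=1e-9):
    """Heuristic check: raw counts are non-negative integers."""
    return all(v >= 0 and abs(v - round(v)) < tol for v in values)


# Hypothetical toy data: three genes, two cells with identical composition
# but a tenfold difference in sequencing depth.
lengths_kb = [1.0, 2.0, 4.0]
shallow_cell = [10, 20, 40]
deep_cell = [100, 200, 400]

tpm_shallow = counts_to_tpm(shallow_cell, lengths_kb)
tpm_deep = counts_to_tpm(deep_cell, lengths_kb)

print(sum(shallow_cell), sum(deep_cell))    # depths differ in the raw counts
print(tpm_shallow)
print(tpm_deep)                             # same values as tpm_shallow
print(looks_like_raw_counts(shallow_cell))  # integer-valued
print(looks_like_raw_counts(tpm_shallow))   # not integer-valued
```

A check like `looks_like_raw_counts` is only a sanity test before feeding a matrix to a count-based method. It cannot turn TPM back into counts, because the scaling factor was discarded when the TPM values were computed.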

Cheers @ATpoint. This is what I feared. I will have to go back to the drawing board.

written 10 weeks ago by A. Domingues

Just to add another note after doing some more research: Seurat doesn't recommend using SCTransform values for differential expression, so SCTransform is not even necessary for this.

written 10 weeks ago by A. Domingues

Yes, that is true. In the integration context, the reason to run SCTransform is to select features; it was not clear to me whether you wanted to integrate or not. DE would typically be done on the raw counts, run through an appropriate framework such as edgeR, but the batch-effect problem remains.

written 10 weeks ago by ATpoint
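
The batch-effect point can be illustrated with a small Python sketch (an illustration only, not edgeR code and not something posted in the thread). It assumes a simple additive design with an intercept, a population indicator, and a study indicator; the cell labels and the helpers `matrix_rank` and `design_matrix` are hypothetical. When population A occurs only in one study and population B only in the other, the population and study columns are identical, so the design matrix is rank-deficient and the two effects cannot be separated. Adding even one cell of population A from the second study restores full rank.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via exact Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column in every other row.
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def design_matrix(cells):
    """Columns: intercept, population is B, study is study2."""
    return [[1, int(pop == "B"), int(study == "study2")] for pop, study in cells]


# Hypothetical layout matching the question: A only in study1, B only in study2.
confounded = [("A", "study1"), ("A", "study1"), ("B", "study2"), ("B", "study2")]
# Same layout plus one shared population: an A cell that also appears in study2.
shared = confounded + [("A", "study2")]

rank_confounded = matrix_rank(design_matrix(confounded))
rank_shared = matrix_rank(design_matrix(shared))
print(rank_confounded, rank_shared)  # 2 3 (out of 3 columns)
```

In the confounded layout, any difference between A and B is indistinguishable from a difference between the studies, whichever DE framework is used. This is the same reason integration needs at least some populations shared between the studies.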