Integrate single-cell datasets (TPM and raw counts) to find gene markers between clusters
3.5 years ago
A. Domingues

Hi all,

I am trying to find cell markers that distinguish populations A and B using publicly available single-cell RNA-seq data. The snag is that these populations were identified in different studies, and the data are available as raw counts (10x) for one study and as TPMs (Smart-seq*) for the other.
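As background on why the two formats do not mix: TPM divides each gene's count by its length and then rescales every cell so that its values sum to one million, which discards sequencing depth. The sketch below uses the standard TPM formula on hypothetical toy numbers (three genes, made-up lengths, one cell sequenced ten times deeper than the other); it only illustrates the arithmetic and is not code from either study.

```python
import math


def tpm(counts, lengths_kb):
    """Transcripts per million for one cell.

    counts: raw read counts per gene
    lengths_kb: gene (or transcript) lengths in kilobases
    """
    # Step 1: length normalization, reads per kilobase
    rpk = [c / length for c, length in zip(counts, lengths_kb)]
    # Step 2: rescale so the cell's values sum to one million
    scale = sum(rpk) / 1e6
    return [r / scale for r in rpk]


# Hypothetical toy data: the second cell has exactly 10x the reads of the first
lengths_kb = [1.0, 2.0, 0.5]
shallow = [10, 40, 5]
deep = [100, 400, 50]

tpm_shallow = tpm(shallow, lengths_kb)
tpm_deep = tpm(deep, lengths_kb)
print(tpm_shallow)
print(tpm_deep)
# True: the tenfold difference in depth is gone after TPM scaling
print(all(math.isclose(a, b) for a, b in zip(tpm_shallow, tpm_deep)))
```

Because both cells end up with identical TPM vectors, a count-based model cannot recover how many reads (or UMIs) each cell actually had from TPM alone.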

Any suggestions on how to integrate these datasets so I can perform DE downstream?

I was considering using Seurat and SCTransform. Any objections?

*I think. It is not clear from the paper's methods, but they sequenced the libraries with 75 bp paired-end (75PE) reads.

single-cell SC Seurat
"Any objections?"

Yes: sctransform fits its model on raw UMI counts, not on TPM, so you probably cannot do what you plan. If the data are published, can't you download the raw data and process them yourself? That is cumbersome, but trying to force TPM and raw counts into one analysis is, in my opinion, not only inappropriate but also a waste of time, because the results will not be reliable even if you technically get output. Alternatively, email the authors and ask for a raw count matrix. With raw counts for both studies you could integrate them, but integration requires that at least some populations are shared between the studies so that anchors (or whatever your method uses) can be found. Integrating studies without any overlap, for example two completely different populations from two different studies, is unlikely to give reliable results.
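To illustrate why shared populations matter: Seurat's anchors are, roughly, mutual nearest neighbours (MNN) between datasets found in a shared reduced space, which Seurat then scores and filters. The sketch below shows only that MNN core on hypothetical 2-D coordinates; it is not Seurat's implementation, and the toy points and `k` values are assumptions chosen for clarity.

```python
import math


def knn(query, reference, k):
    """For each query point, the set of indices of its k nearest reference points."""
    result = []
    for q in query:
        order = sorted(range(len(reference)), key=lambda j: math.dist(q, reference[j]))
        result.append(set(order[:k]))
    return result


def mutual_nearest_neighbors(a, b, k):
    """Pairs (i, j) where b[j] is among a[i]'s k nearest neighbours in b
    and a[i] is among b[j]'s k nearest neighbours in a."""
    a_to_b = knn(a, b, k)
    b_to_a = knn(b, a, k)
    return sorted((i, j) for i in range(len(a)) for j in a_to_b[i] if i in b_to_a[j])


# Hypothetical 2-D embeddings (e.g. after dimensionality reduction).
# Study 1: a shared population near (0, 0) plus population A near (10, 0).
study1 = [(0.0, 0.0), (0.2, 0.1), (10.0, 0.0), (10.2, 0.3)]
# Study 2: the same shared population plus population B near (0, 10).
study2 = [(0.5, 0.0), (0.6, 0.2), (0.0, 10.0), (0.3, 10.2)]

# With a shared population, all pairs come from the shared cells
shared_pairs = mutual_nearest_neighbors(study1, study2, k=2)
print(shared_pairs)  # [(0, 0), (0, 1), (1, 0), (1, 1)]

# Without any shared population, MNN still returns a pair,
# matching a population-A cell to a population-B cell
spurious_pairs = mutual_nearest_neighbors(study1[2:], study2[2:], k=1)
print(spurious_pairs)  # [(1, 1)]
```

The second call shows the risk described above: the procedure has no notion of cell identity, so when two studies share nothing it still pairs the closest cells and would "align" populations that are biologically different.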

Cheers @ATpoint. This is what I feared. I will have to go back to the drawing board.

Just to add another note after some more research: Seurat does not recommend using SCTransform values for differential expression, so SCTransform is not even necessary for this.

Yes, that is true. The reason to run SCTransform in the integration context is to select features; it was not clear to me whether you wanted to integrate or not. DE would typically be done on the raw counts, which are then run through an appropriate framework such as edgeR, but the batch-effect problem remains.
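To see why the batch effect cannot simply be modelled away when each population comes from only one study, write down the design matrix for a model such as `~ study + population`. The sketch below uses hypothetical sample labels (for instance pseudobulk samples made by summing raw counts per sample and population, which is an assumption, not something from this thread) and computes the matrix rank exactly. The column names `study2` and `popB` are illustrative only.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def design(samples):
    """Columns for ~ study + population: intercept, study2 indicator, popB indicator."""
    return [[1, int(s == "study2"), int(p == "B")] for s, p in samples]


# Hypothetical samples as (study, population)
# Confounded: population A only in study 1, population B only in study 2
confounded = [("study1", "A"), ("study1", "A"), ("study2", "B"), ("study2", "B")]
# Balanced: both populations present in both studies
balanced = [("study1", "A"), ("study1", "B"), ("study2", "A"), ("study2", "B")]

print(matrix_rank(design(confounded)))  # 2: rank-deficient, effects inseparable
print(matrix_rank(design(balanced)))    # 3: full rank
```

In the confounded layout the `popB` column is identical to the `study2` column, so any difference between A and B is indistinguishable from a difference between the studies, whichever DE framework is used.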
