Question

scRNA-seq, Seurat: correlation analysis of two replicates

4.2 years ago

Gregor Rot

Hello,

I have two scRNA-seq datasets (replicate 1 and replicate 2).

I would like to compare how many genes were detected and the average expression of genes. In brief, I want an estimate of how similar (reproducible) the replicates are, in terms of how well they correlate. Any hints, ideas, or pointers to code snippets showing the best approach? (I am learning Seurat but am happy to check out other software, like Scanpy.)

Currently I am trying to normalize the data and plot the average gene expression of rep1 vs. rep2.

Thanks for any help, Gregor

Tags: scRNA, Seurat

Written 4.2 years ago by Gregor Rot; last updated 4.1 years ago by Friederike.

score 2 · Answer 1 · 2020-03-28

I believe you're describing a couple of different features that you'd like to compare in addition to plain ol' correlation. The numbers of detected reads etc. can be assessed with built-in Seurat functions, as shown in the basic processing vignette (also see the reply to "Where are QC metrics stored in Seurat?").
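For orientation: Seurat keeps the per-cell number of detected genes and total counts in the `nFeature_RNA` and `nCount_RNA` metadata columns. A minimal base-R sketch of the same metrics, computed directly from raw gene-by-cell count matrices and summarized per replicate, might look like this. The helper names and the toy matrices are hypothetical, not part of Seurat.

```r
# Per-cell QC metrics from a gene x cell count matrix, analogous to Seurat's
# nFeature_RNA (genes with non-zero counts) and nCount_RNA (total counts)
qc_per_cell <- function(counts) {
  data.frame(n_genes = colSums(counts > 0), n_counts = colSums(counts))
}

# Hypothetical helper: side-by-side summary of two replicates
qc_summary <- function(counts1, counts2) {
  q1 <- qc_per_cell(counts1)
  q2 <- qc_per_cell(counts2)
  data.frame(
    replicate              = c("rep1", "rep2"),
    cells                  = c(nrow(q1), nrow(q2)),
    median_genes_per_cell  = c(median(q1$n_genes), median(q2$n_genes)),
    median_counts_per_cell = c(median(q1$n_counts), median(q2$n_counts)),
    # genes detected in at least one cell of the replicate
    genes_detected         = c(sum(rowSums(counts1) > 0), sum(rowSums(counts2) > 0))
  )
}

# Tiny toy matrices (genes x cells), not real data
rep1_counts <- matrix(c(1, 0, 3,
                        0, 0, 2), nrow = 3)
rep2_counts <- matrix(c(2, 2, 2,
                        0, 1, 0,
                        5, 0, 0), nrow = 3)
qc_summary(rep1_counts, rep2_counts)
```

Medians are used because per-cell counts are usually skewed; plotting the two `qc_per_cell()` tables as overlaid histograms or violins gives the fuller picture.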

I personally find it relatively tedious to interact with Seurat objects, which is why I would highly recommend perusing the excellent guide to the scRNA-seq galaxy by the Bioconductor team. Their SingleCellExperiment object follows the well-established logic of the SummarizedExperiment object class, which makes it fairly straightforward to extract the QC metrics you are interested in and make customized plots.

score 1 · Answer 2 · 2020-03-28

This doesn't exactly follow your approach, but you can check how similar the two replicates are by doing a merged analysis:

https://davetang.org/muse/2018/01/24/merging-two-10x-single-cell-datasets/

This way, you are not performing any batch correction, just checking whether the two replicates are similar. In the tutorial's example, the PBMC4k and PBMC8k samples are very close, even though they were (I am assuming) two different sequencing experiments.

Even if there are differences in sequencing depth or in the number of cells, this will give you an idea of whether the two replicates actually behave like replicates.

If your two samples suffer from a batch effect, you might not see them overlapping. (If they are cell lines, a batch effect is less likely unless something went wrong.)
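To put a number on "overlapping", one option (an ad hoc check, not a built-in Seurat function) is to cluster the merged data and look at the replicate composition of each cluster. A sketch, assuming two hypothetical per-cell vectors `cluster` and `replicate`; in Seurat these could be taken from `Idents(obj)` and the `orig.ident` metadata column of the merged object.

```r
# Hypothetical helper: replicate composition of each cluster in a merged analysis.
# With well-mixed replicates, each cluster's row should roughly match the overall
# replicate proportions; a cluster dominated by one replicate suggests a batch
# effect (or biology genuinely present in only one sample).
replicate_mixing <- function(cluster, replicate) {
  tab <- table(cluster = cluster, replicate = replicate)
  frac <- prop.table(tab, margin = 1)                 # row-wise fractions per cluster
  overall <- as.vector(prop.table(table(replicate)))  # expected fractions
  # ad hoc score: largest absolute deviation from the overall fractions
  max_dev <- apply(abs(sweep(frac, 2, overall, "-")), 1, max)
  list(fraction = frac, max_deviation = max_dev)
}

# Toy labels (not real data): clusters A and B are mixed, C is rep1-only
cluster   <- rep(c("A", "B", "C"), each = 4)
replicate <- c("rep1", "rep2", "rep1", "rep2",
               "rep1", "rep2", "rep1", "rep2",
               "rep1", "rep1", "rep1", "rep1")
res <- replicate_mixing(cluster, replicate)
round(res$max_deviation, 3)
```

Comparing against the overall proportions, rather than 50/50, keeps the score meaningful when the replicates contain different numbers of cells.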

score 0 · Answer 3 · 2020-03-28

Hi Gregor, apologies for the delay,

I do not believe there is anything built into Seurat for this. You could try an 'old friend' from the car package, though, i.e., scatterplotMatrix(). It performs a pairwise correlation / regression between all columns in your input data:

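library(car)  # provides scatterplotMatrix()

# x: numeric matrix or data frame, one row per gene and one column per replicate
scatterplotMatrix(x,
    regLine  = list(method = lm, lty = 1, lwd = 2, col = "red2"),  # linear fit
    diagonal = list(method = "density"),                          # density on diagonal
    ellipse  = list(levels = c(0.5, 0.95), robust = TRUE),        # data ellipses
    pch = '.',
    col = 'black')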

In this example, based on the parameters that I have chosen, a linear regression is fit to the data (red line), with the lower and upper 5% confidence intervals (dashed black lines).
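The snippet assumes `x` already holds one column per replicate. A minimal sketch of one way to build it from two raw gene-by-cell count matrices, following the normalize-then-average idea from the question. The helper names and the simulated input are hypothetical. The normalization only approximates Seurat's default `LogNormalize` (counts divided by each cell's total, multiplied by 10,000, then `log1p`), and it assumes empty cells were already filtered out. Seurat's own `AverageExpression()` is an alternative; check its documentation for how it handles log-scaled data.

```r
# Hypothetical helper: log-normalize a gene x cell count matrix in the spirit of
# Seurat's default LogNormalize (divide by total counts per cell, multiply by
# 10,000, then natural log via log1p). Assumes no cell has zero total counts.
log_normalize <- function(counts, scale_factor = 1e4) {
  log1p(sweep(counts, 2, colSums(counts), "/") * scale_factor)
}

# Hypothetical helper: genes x replicates matrix of per-gene mean expression.
# Each replicate is normalized on all of its genes, then restricted to shared genes.
average_expression <- function(counts1, counts2) {
  genes <- intersect(rownames(counts1), rownames(counts2))
  cbind(
    rep1 = rowMeans(log_normalize(counts1))[genes],
    rep2 = rowMeans(log_normalize(counts2))[genes]
  )
}

# Toy data: simulated Poisson counts sharing per-gene means (not real data)
set.seed(1)
n_genes <- 200
mu <- rexp(n_genes, rate = 0.5)
simulate_counts <- function(n_cells) {
  m <- matrix(rpois(n_genes * n_cells, lambda = rep(mu, n_cells)), nrow = n_genes)
  dimnames(m) <- list(paste0("gene", seq_len(n_genes)),
                      paste0("cell", seq_len(n_cells)))
  m
}
x <- average_expression(simulate_counts(300), simulate_counts(250))
head(x)
```

The resulting two-column matrix `x` can be passed to `scatterplotMatrix()` or to `cor()`.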

Warning: don't run this on a data-matrix of many samples with 1000s of genes - it will crash your computer.
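If the full scatterplot matrix is too heavy, the numbers behind it can be computed directly in base R without drawing every point. A minimal sketch, assuming `x` is a numeric genes-by-samples matrix such as per-gene average expression (simulated here, not real data); `replicate_correlation()` is a hypothetical helper name.

```r
# Hypothetical helper: correlation summaries for a genes x samples matrix
replicate_correlation <- function(x) {
  x <- as.matrix(x)
  list(
    pearson   = cor(x, method = "pearson"),   # all pairwise Pearson correlations
    spearman  = cor(x, method = "spearman"),  # rank-based, less driven by outliers
    # R^2 of a simple linear regression of the 2nd column on the 1st
    r_squared = summary(lm(x[, 2] ~ x[, 1]))$r.squared
  )
}

# Simulated stand-in for per-gene average expression (not real data)
set.seed(42)
n <- 5000
rep1 <- rexp(n)
x <- cbind(rep1 = rep1, rep2 = rep1 + rnorm(n, sd = 0.1))
res <- replicate_correlation(x)
round(res$pearson, 3)
res$r_squared
```

For a simple linear regression with an intercept, R² equals the squared Pearson correlation, so the two summaries carry the same information. For a visual check with thousands of genes, `smoothScatter(x[, 1], x[, 2])` from the base graphics package draws a density-shaded scatterplot instead of individual points.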

Kevin