Question: the analysis of multiple samples of 10X scRNA-seq
4 months ago by
Bogdan950
Palo Alto, CA, USA
Bogdan950 wrote:

Dear all, greetings

I'd like to ask for a piece of advice, please: we have 3 scRNA-seq samples that were sequenced at different depths (200 million, 800 million and 900 million reads), and consequently we see:

-- distinct numbers of cells, and

-- (on average) distinct numbers of genes per cell, depending on the sample

Would integrating these samples with cellranger aggr be a good approach (it also normalizes the samples), followed by a standard analysis of the aggregated samples with Seurat or the simpleSingleCell pipeline?

thank you very much,

-- bogdan

scrnaseq scrna-seq • 688 views
4 months ago by
geek_y10k
Barcelona
geek_y10k wrote:

I would:

Option 1:

1. Quantify genes independently for each sample.
2. Normalize to 10,000 reads per cell (the default in most scRNA-seq analyses), or use SCTransform.
3. Transform the matrix with the square root (instead of log2(counts + 1)).
4. Merge the three matrices (cbind).
5. Remove genes that are expressed in fewer than 1, 2 or 5% of the cells.
6. Use ComBat to remove batch effects (here, three batches).
7. Import the matrix into Seurat and skip its normalisation step.
8. Run PCA, UMAP, clustering, etc.

I am fairly sure the cells will then cluster by cell type rather than by sample.
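The normalisation, square-root, merge and gene-filter steps of Option 1 can be sketched in plain Python on toy data. This is only an illustration under stated assumptions, not the answerer's code: the function names (`normalize_sqrt`, `merge`, `filter_genes`), the toy count matrices and the 5% threshold are all made up for the example, and the ComBat and Seurat steps are not shown.

```python
import math

def normalize_sqrt(counts, scale=10_000):
    """Scale each cell (column) to `scale` total counts, then take the square root.

    `counts` is a genes x cells list of lists holding raw counts.
    """
    n_cells = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    return [
        [math.sqrt(row[j] / totals[j] * scale) if totals[j] else 0.0
         for j in range(n_cells)]
        for row in counts
    ]

def merge(*matrices):
    """Column-bind matrices that share the same gene order (like cbind in R)."""
    return [sum((m[g] for m in matrices), []) for g in range(len(matrices[0]))]

def filter_genes(matrix, min_fraction=0.05):
    """Return indices of genes detected (value > 0) in at least `min_fraction` of cells."""
    n_cells = len(matrix[0])
    return [g for g, row in enumerate(matrix)
            if sum(v > 0 for v in row) / n_cells >= min_fraction]

# Hypothetical toy data: 3 genes x 2 cells per sample; sample_b is sequenced 4x deeper.
sample_a = [[10, 0], [30, 20], [0, 0]]
sample_b = [[40, 0], [120, 80], [0, 0]]

merged = merge(normalize_sqrt(sample_a), normalize_sqrt(sample_b))
kept = filter_genes(merged, min_fraction=0.05)  # gene 2 is never detected and is dropped
```

In this toy case the two samples differ only in depth, so per-cell scaling makes their columns identical. Real samples also differ in composition and batch, which is why the pipeline continues with ComBat.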

If you want to see gene expression changes across clusters, I would add an imputation step. The pipeline would then be:

1. Quantify genes independently for each sample.
2. Normalize to 10,000 reads per cell (the default in most scRNA-seq analyses).
3. Transform the matrix with the square root (instead of log2(counts + 1)), or use SCTransform.
4. Merge the three matrices.
5. Remove genes that are expressed in fewer than 1, 2 or 5% of the cells.
6. Use ComBat to remove batch effects (here, three batches).
7. Impute gene expression (for example with MAGIC).
8. Import the matrix into Seurat and skip its normalisation step.
9. Run PCA, UMAP, clustering, etc.

Then take the average gene expression for each cluster and calculate a cluster-specificity score (the Tau score, for example). Keep the genes with a Tau score above 0.5 or 0.3, and perform k-means clustering of the averaged gene expression across clusters to pick markers.
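As a rough illustration of the specificity score, the Tau index for a gene with per-cluster mean expression x_1..x_n is sum(1 - x_i / max(x)) / (n - 1), so it is 0 for uniform expression and 1 for a gene expressed in a single cluster. The sketch below assumes non-negative cluster means; the gene names and values are hypothetical, and the k-means step is not shown.

```python
def tau_score(cluster_means):
    """Tau specificity index: 0 = uniform across clusters, 1 = specific to one cluster."""
    n = len(cluster_means)
    peak = max(cluster_means)
    if n < 2 or peak <= 0:
        return 0.0  # undefined for a single cluster or an unexpressed gene
    return sum(1 - x / peak for x in cluster_means) / (n - 1)

# Hypothetical per-cluster mean expression for three genes across four clusters.
cluster_means = {
    "marker_like": [8.0, 0.0, 0.0, 0.0],
    "housekeeping_like": [5.0, 5.0, 5.0, 5.0],
    "intermediate": [4.0, 2.0, 2.0, 0.0],
}

scores = {gene: tau_score(vals) for gene, vals in cluster_means.items()}

# Keep genes above the 0.5 cut-off mentioned in the answer.
candidates = [gene for gene, s in scores.items() if s > 0.5]
```

The cut-off (0.5 or 0.3) trades the number of candidate markers against their specificity, so it is worth checking how many genes survive at each value.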

Option 2: Use the Seurat (v3) CCA-based integration to integrate the datasets. It is straightforward. It can be run with SCTransform instead of library-size normalisation, which seems to be better for scRNA-seq data, but it depends on the end goal.

Option 3: If you want to use Seurat's default differential analysis, start with the raw counts but use the kNN graph from the analysis above, then proceed with the typical marker analysis or differential gene expression analysis.

It's all custom analysis, but it works pretty well and it's fun.


Why do people still, in 2019, normalize purely by counts / sequencing depth? Check any benchmarking paper on the issue: they all show that naive by-count methods perform poorly. Depth scaling is not sufficient to correct for changes in library composition, in either bulk or scRNA-seq. Check sophisticated single-cell normalization approaches such as SCnorm.
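The composition concern can be illustrated with a small worked example. It is a toy sketch, not a benchmark: the two cells, the gene names and the counts are hypothetical, and `counts_per_10k` is a name made up for the illustration.

```python
def counts_per_10k(cell):
    """Scale a cell's counts so that they sum to 10,000 (total-count normalisation)."""
    total = sum(cell.values())
    return {gene: count / total * 10_000 for gene, count in cell.items()}

# Two hypothetical cells with identical raw counts for geneA and geneB;
# cell_2 additionally expresses geneC very highly.
cell_1 = {"geneA": 50, "geneB": 50, "geneC": 0}
cell_2 = {"geneA": 50, "geneB": 50, "geneC": 900}

norm_1 = counts_per_10k(cell_1)
norm_2 = counts_per_10k(cell_2)

# geneA has the same raw count in both cells, yet after scaling it appears
# roughly 10-fold lower in cell_2 because geneC takes up most of the library.
fold = norm_1["geneA"] / norm_2["geneA"]
```

Whether an effect of this size matters for clustering, as opposed to differential expression, is exactly what the thread goes on to debate.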

modified 4 months ago • written 4 months ago by ATpoint29k

Yes, I agree. But seq-depth normalisation is not terrible: it performs decently, and whether it is adequate depends on the end goal. Here it helps to get the cell-type-specific clusters.

Option 2, the Seurat CCA integration, can use SCTransform instead of sequencing-depth normalisation. I updated my answer to highlight that point.

modified 4 months ago • written 4 months ago by geek_y10k

"seq-depth normalisation is not terrible and performs decently"

Sorry, but I disagree. Show me one benchmark paper that demonstrates this beyond anecdotal evidence. To the best of my knowledge (and I would be happy if you proved me wrong) there is none, be it bulk or single-cell.

written 4 months ago by ATpoint29k

You can add your own answer with what you think is best for the question. I said it doesn't perform terribly. I did not say it's the best.

Here is a plot that I generated using total-count normalisation; the cells cluster by cell type. Of course, normalisation methods developed especially for scRNA-seq might perform best and give very illuminating insights.

[Plot not preserved: cells clustered by cell type after total-count normalisation]

And see https://www.sciencedirect.com/science/article/pii/S2405471219300808#fig3

modified 4 months ago • written 4 months ago by geek_y10k

Please take no offense. I am not trying to call you out, but simply aim to offer different opinions on suggested answers.

written 4 months ago by ATpoint29k

thanks a lot for the very detailed suggestions !

written 4 months ago by Bogdan950