SMARTseq2 scRNAseq and gene length normalization
2.1 years ago

Hi !

I'm wondering whether it is actually correct to compare expression levels between genes in scRNAseq data produced with the SMARTseq2 full-length protocol. When you use UMIs (as in the 10x Genomics pipeline), you can truly estimate the number of mRNA molecules for each gene. But with SMARTseq2 you only get reads (no UMIs), so the probability of getting many reads for a given gene is higher for very long genes than for very short ones, exactly as in bulk RNAseq. So you should normalize by gene length. However, several normalisation methods for scRNAseq (such as scran/scater) don't normalize by gene length. Why is that? Does this preclude comparing expression levels between genes within a given dataset?

Thanks for your help !

Differential Normalization Expression SMARTseq2 • 988 views
2.1 years ago
dsull ★ 5.8k

In theory, you should correct Smart-seq2 data by dividing the transcript counts by the effective length; otherwise longer transcripts will get higher counts than shorter transcripts for the same level of expression. UMIs (in theory) take care of this problem, so this procedure should not be used on UMI-based data.
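To make the length correction concrete, here is a minimal sketch in plain Python of a TPM-style calculation: divide each count by the effective length, then rescale so the values sum to one million. The function name `tpm` and the toy numbers are illustrative assumptions, not the code of any particular tool.

```python
def tpm(counts, eff_lengths):
    """Length-normalize read counts into TPM-style values.

    counts and eff_lengths are parallel lists, one entry per transcript.
    (Hypothetical helper written for illustration only.)
    """
    if len(counts) != len(eff_lengths):
        raise ValueError("counts and eff_lengths must have the same length")
    # Reads per base of effective length: removes the length advantage.
    rates = [c / l for c, l in zip(counts, eff_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(counts)
    # Rescale so the values sum to one million within the cell.
    return [r / total * 1e6 for r in rates]


# Toy data (made up): two transcripts with the same molecule abundance,
# the second 10x longer, so it collects 10x the reads.
counts = [100, 1000]
eff_lengths = [500.0, 5000.0]
values = tpm(counts, eff_lengths)
```

In this toy case the raw counts differ tenfold, but the length-normalized values come out equal, which is the behaviour you want when comparing different genes within a cell.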

Many scRNAseq tools are designed with 10X-type (UMI-based) data in mind, which is why they do not normalize by gene length.

The default method for running kallisto on Smart-seq2 data normalizes by dividing each transcript's count by its effective length.

(Note: I say "in theory" because real data is messy and the "best" way to normalize hasn't been figured out yet; it's still an active area of research. E.g. I've seen some UMI-based datasets exhibit length bias even though UMIs, in theory, are supposed to correct for that.)
